Data Science, Artificial Intelligence and Machine Learning: understanding the difference

September 20, 2019

“It should be noted that the automobile was also invented by pedestrians. But, somehow, motorists quickly forgot about this.” – Ilya Ilf and Evgeny Petrov, The Golden Calf

Why it is not as simple as it seems

Data Science, Artificial Intelligence and Machine Learning are terms that are frequently used and often used interchangeably (as I did in my recent post). Yet there are clear-cut differences between them, which are useful to understand when applying for jobs or participating in an intelligent discussion. I start by pointing out several common misconceptions, and then provide a bit of the historical and theoretical background that my own view is based upon.

Let’s not be literal

Saying that Data Science is “a science about collecting and analyzing data” at worst borders on nonsense, while at best it provides a definition too general to be of any practical use. Moreover, it is easy to convince oneself that such a definition is wrong: Indeed, a career scientist with several decades of experience in analyzing, e.g., biological or chemical data is not a data scientist. A quick look at job announcements will show that a “Data scientist” is often required to have very specific software skills that our scientist probably does not possess. At the same time, these job announcements might be very light on the skills that real scientists consider necessary for working with and analyzing data.

When your only tool is a hammer, every problem looks like a nail

Another common trap is looking only at the aspect of each of these fields that one is most familiar with, e.g., the algorithms used therein. Some of these algorithms are frequently associated with Data Science (e.g., various regression and classification methods), others fall in the domain of Artificial Intelligence (e.g., reinforcement learning), and the more complicated ones are classified as Machine Learning (which is certainly true for neural networks, but not necessarily so for ANOVA or linear regression). Depending on your experience you may be tempted to say that one field is merely a subset of another or discuss the overlap between the two, whereas in reality you will be talking only about one aspect of the whole.

Coming from a different background, I could say that all three are just a modern variation of good old statistics, but this would be doing an injustice to the mathematical and computational developments of the last few decades.

Brevity is the sister of talent

Finally, there are plenty of long-winded answers to this rather simple question. Indeed, some people believe that giving lots of “simple” examples makes things easier. The problem is that no one listens to such explanations. Once you have said “for example…”, you have lost your audience (such as a recruiter or an HR person). The answer I provide below may not be the only one, nor the most generally accepted one, but it is an operational answer, and I will try to summarize it in a few short phrases.

REMARK: I will further occasionally refer to Data Science, Artificial Intelligence and Machine Learning by their initials: DS, AI and ML.

Collecting and analyzing data over centuries

Collecting and analyzing data is probably as old as human civilization, certainly preceding the emergence of DS, AI and ML as we know them.

Year’s length

A good example of the progress in data collection is the estimate of the length of a year over the centuries. That seasons repeat themselves after approximately twelve moon cycles and that one moon is about 29 days was probably noted by the early humans and possibly also by the Neanderthals. Such an estimate, however, errs by two to three weeks every year, so it was a great improvement when, in the 3rd millennium BCE, the Babylonian astronomers estimated the year’s length to be 360 days. My guess is that in the absence of a natural phenomenon marking a new year (such as sunrise for the new day or new moon for the month) they had to spend a few decades making daily records of seasonal phenomena and then calculating the average. Subsequent generations of Babylonian priests probably continued making records in order to reduce the estimation error.
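My guessed-at Babylonian procedure, averaging many imprecise records, is the simplest form of parameter estimation. A toy sketch in Python (the noise level and the number of records are, of course, invented for illustration):

```python
import random

random.seed(42)
TRUE_YEAR = 365.2422  # days, the modern value

# Simulate decades of records: each observed interval between, say, two
# successive spring floods is off by up to ten days, because the seasonal
# marker itself (first blossom, river flood...) is imprecise.
observations = [TRUE_YEAR + random.uniform(-10, 10) for _ in range(50)]

# Averaging the records shrinks the error roughly as 1/sqrt(N).
estimate = sum(observations) / len(observations)
print(f"estimated year length: {estimate:.2f} days")
```

A single record can be off by more than a week, yet the average of fifty lands within a day or so of the true value.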

The ancient Greeks and Romans improved the estimate by correlating the seasonal phenomena with the much more regular cycles of the stars and planets, producing the estimate of 365.25 days, which was codified in the Julian Calendar that the Orthodox Church still continues to use (this is why the Russian “October Revolution” took place in November). It took another millennium and a half until observations with high-precision navigation instruments ushered in another reform, producing the Gregorian Calendar, in which the leap year is skipped in century years not divisible by four hundred.

While we are still talking about stars, let us mention a well-known data science project carried out by Tycho Brahe and his assistant Johannes Kepler: Brahe spent a good three decades making precise observations of the planets. His data were later analyzed by Kepler, resulting in Kepler’s laws of planetary motion.

Back to the grammar school

Today, learning about parts of speech and conjugations is a matter for elementary school or a beginners’ language course for foreigners. Yet once upon a time these things were unknown. The notions of grammar known to speakers of the major European languages come from observations made over centuries by the ancient Greeks and Romans, who took great interest in speaking and writing beautiful prose and poetry.

A more well-defined project was carried out between the 6th and 10th centuries by a group of Jewish scholars known as the Masoretes of Tiberias. They analyzed a well-defined corpus of texts written in Hebrew (notably the Hebrew Bible) and produced a complete set of rules governing the grammar and the pronunciation of Hebrew texts. Their data were so convincing that they even dared to point out the places in the Bible where incorrect grammatical forms were used, although they understandably abstained from correcting the holy text. A similar (and much earlier) example of such work was carried out by the Indian grammarian Pāṇini in the 6th century BCE to codify the grammar of Sanskrit.

As a particular example of a pre-NLP (natural language processing) project, let us mention the work of Jacob Grimm (the elder of the Brothers Grimm), who collected and analyzed data from a few dozen Germanic dialects and came up with “Grimm’s law”, describing how the pronunciation of various sounds evolves over time. This work served as a basis for historical linguistics.

Statistics: science about collecting and analyzing data

The real boom of data analysis happened in the 19th and 20th centuries as a result of the technological revolution and the emergence of sciences such as physics, chemistry, evolutionary biology, etc. Any scientific field prides itself on being grounded in collecting and analyzing data (although this claim is somewhat strained when it comes to the social sciences and computer science). With time, studying and using the scientific methods of analysis became a science in itself, known under the name of *statistics*.

Data collection in statistics is known as *experiment design* or *study design*. The goal of the study design is not simply obtaining and recording data, but doing so in an unbiased manner, i.e., in a way that permits reliable analysis and conclusions.

Data analysis is the mathematically and computationally heavy part. The collected data can be used for *parameter estimation*, such as in our earlier example about estimating the length of a year on the basis of astronomical and seasonal observations. Another frequent application is *hypothesis testing*, where the data is used to confirm or disprove a guess, such as, e.g., a grammatical rule.
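To make the two notions concrete, here is a minimal sketch: estimating a coin’s bias and testing whether the coin is fair. The numbers are invented, and a normal approximation stands in for a proper binomial test:

```python
import math

# Suppose 1000 tosses of a coin produced 540 heads (made-up numbers).
n, heads = 1000, 540

# Parameter estimation: the maximum-likelihood estimate of p(heads).
p_hat = heads / n

# Hypothesis testing: is the coin fair (p = 0.5)? Under the null
# hypothesis the number of heads is approximately normal with mean n/2
# and standard deviation sqrt(n)/2.
z = (heads - n * 0.5) / (math.sqrt(n) / 2)
# Two-sided p-value from the normal tail (erfc gives the tail mass).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"estimate p = {p_hat}, z = {z:.2f}, p-value = {p_value:.4f}")
```

With these numbers the p-value comes out around 0.01, so at the usual 5% level we would reject the hypothesis that the coin is fair. Whether 5% is the right threshold is exactly the kind of decision that remains with the human.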

Note that the hypothesis itself, and the model whose parameter is being estimated, are generated by the humans who perform the data collection and analysis. Likewise, it is up to humans to decide whether the hypothesis was confirmed, whether the parameter estimation was satisfactory, how the two can be used in the future, and what could be done about bad results. We will refer to this as making *decisions* based on the results of the analysis.

How is data science different from statistics?

The information age has brought two important developments:

First of all, it has made large (or even “huge”) quantities of data readily available for analysis: partly because data collection has become easier, partly because new interesting datasets have arisen, such as records of internet traffic, purchases, financial transactions, etc., and partly because data are now recorded in computer-friendly form (rather than in paper logbooks) and are therefore easily accessible for analysis.

The second important development is the increase in pure computational power, which made it possible to use methods of statistical analysis that were previously considered impractical (as well as to create new, computationally heavy methods).

Heavy reliance on the storage capacity and computational power of modern computers is what distinguishes Data Science from conventional statistics. This explains why scientists with experience in collecting and analyzing data often do not make the cut as data scientists: they lack training in computer science.

Artificial intelligence

Data Science is a quantitative improvement of the two areas of statistical analysis that had already become somewhat routine and technical: data collection and the calculations related to parameter estimation or hypothesis testing. It does not touch the areas that are essentially human activities: formulating the hypothesis to test (or the model whose parameters are estimated) and making decisions based on the results of the analysis.

Mimicking the human capacity for hypothesizing and decision-making is what we call Artificial Intelligence. Note that AI is not necessarily built atop Data Science; in fact, it can rely on very little data and little analysis. For example, I have a Roomba 700 robotic vacuum cleaner: the one that rolls along the floor and changes direction every time it hits an obstacle. It is even more sophisticated than that: it adjusts the time of vacuuming to the dirtiness of the floor, and if some spot seems particularly dusty, it will spend more time cleaning that particular spot. Clearly it makes decisions and acts upon them. However, these decisions are based on rather simple (in terms of the quantity of data) inputs from its sensors. Nor does it learn the layout of my apartment, which could help it in the next cleaning session.
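Such decision logic can be caricatured in a few lines of Python; the sensor inputs and the dirt threshold below are my own invention, not Roomba’s actual firmware:

```python
def next_action(bumper_hit: bool, dirt_level: float) -> str:
    """A caricature of a purely reactive vacuum-cleaner policy:
    no map, no memory, just fixed rules over the current sensor inputs."""
    DIRT_THRESHOLD = 0.7  # invented threshold on a 0..1 dirt-sensor reading
    if bumper_hit:
        return "turn"         # obstacle: change direction
    if dirt_level > DIRT_THRESHOLD:
        return "spot-clean"   # unusually dirty: linger on this spot
    return "forward"          # otherwise keep rolling

print(next_action(True, 0.1))   # -> turn
print(next_action(False, 0.9))  # -> spot-clean
print(next_action(False, 0.2))  # -> forward
```

The point is that decision-making, and hence “intelligence” in this minimal sense, requires neither large data nor statistical analysis.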

Self-driving cars and even language processing systems are sometimes not much more sophisticated than that, although they do give laymen the impression of sophistication. For example, one of the earliest language processing systems was built simply to paraphrase the sentences entered by the user. Yet many test users thought it capable of understanding them better than their friends or relatives. For example:

* Computer: How are you?

* User: I am sad.

* Computer: Why are you sad?

* User: My cat died.

* Computer: How did it die?

(Indeed, this is the strategy often used by psychologists to make people open up.)
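A toy version of such a paraphrasing system fits in a dozen lines of Python; the patterns below are my own invention, not the script of the original system:

```python
import re

# Each rule: a regex over the user's input and a response template.
RULES = [
    (re.compile(r"\bi am (.+)", re.I), "Why are you {0}?"),
    (re.compile(r"\bmy (.+) died\b", re.I), "How did your {0} die?"),
]

def respond(sentence: str) -> str:
    """Paraphrase the user's sentence using the first matching rule."""
    sentence = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        m = pattern.search(sentence)
        if m:
            return template.format(*m.groups())
    return "Please tell me more."  # fallback when nothing matches

print(respond("I am sad."))    # -> Why are you sad?
print(respond("My cat died.")) # -> How did your cat die?
```

No understanding is involved anywhere: the “intelligence” is entirely in the eye of the user.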

How does machine learning fit in?

The algorithms used for data analysis in Data Science, together with the decision-making algorithms used in AI, form what is called Machine Learning. Strictly speaking, the word “machine” implies computationally heavy techniques; however, basic techniques such as linear regression or PCA are often included in Machine Learning as well. This is partly motivated by the mathematical convenience of treating the modern methods together with those of good old classical statistics.
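To illustrate just how basic these “basic techniques” are, here is ordinary least squares for a straight line, written out with the textbook closed-form formulas (the data points are invented):

```python
# Ordinary least squares for y = slope * x + intercept.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]   # roughly y = 2x, with some noise

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Closed-form estimates: slope = cov(x, y) / var(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Two sums and a division: nothing a 19th-century statistician could not do by hand, yet it routinely appears in machine learning curricula.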

The short answer

I have tried to present above a coherent view of the interrelations between Data Science, Artificial Intelligence, Machine Learning and traditional Statistics. These interrelations are summarized below, as well as in the following figure.

* Statistics is the discipline of collecting and analyzing data. It leaves it to humans to make hypotheses and judge the results.

* Statistics becomes Data Science when it deals with huge quantities of data and computationally intensive estimation/testing procedures.

* Artificial Intelligence makes judgements/decisions on the basis of the results of a statistical analysis.

* Machine Learning is the collection of computational procedures used for data analysis in Data Science and for making decisions in the context of Artificial Intelligence.

[Figure: a Yin-Yang diagram of the interrelations]

Stepping on the same rake

September 18, 2019

Наступить на те же грабли [nastoupit’ na tie je grabli]

Stepping on the same rake.

Meaning: repeating the same stupid mistake.

The expression is based on a ridiculous situation: when you step on the teeth of a rake, its handle hits you in the forehead. It is all the more ridiculous if you do it again, despite the experience.

Nizhny Novgorod French

September 3, 2019
The Russian expressions of the day:
Нижегородский французский [niejegorodskiï frantsouzskiï]
Nizhny Novgorod French

and

Смесь французского с нижегородским [smies’ frantsouzskogo s nijegorodskim]
A mixture of French and Nizhny Novgorod speech

The adjective “nijegorodais” means coming from the city of Nizhny Novgorod in Russia (like me). Both expressions, which denote an incomprehensible mixture of languages, are based on the notion that French is a delicate and light language, whereas the dialect of Nizhny Novgorod is heavy and noisy (notably, it uses many wide-open vowels such as “O” and “A”).

Many faces of machine learning

July 24, 2019

When it comes to machine learning (which I designate below by its initials, ML), there are two kinds of people: the experts in ML and those who don’t understand what all the buzz is about. It took me a while (and a few books) to wrap my mind around it: with 20+ years of scientific research behind me, I am not embarrassed to look stupid by asking simple questions. Yet the problem is that “the experts” rarely bother to answer such questions, partially because sometimes they do not really understand the question.

Thus, while still a novice to the field, I dare to offer a piece of wisdom: ML is more than one field. Or, perhaps, it is a field with many subfields. Therefore, if “the experts” give contradictory answers, it is because they are usually experts in only one of the subfields, with little idea about the other subfields, or even unaware that these other subfields exist.

Machine learning as a branch of mathematics

For a mathematician (or a theoretical computer scientist), machine learning is a branch of mathematics. This is where the philosophical questions are asked: What does it mean “to learn”? How can a computer learn? Can computers become smarter than humans? Etc. In short, this is the part of ML that trickles down into popular culture (of course, it arrives there stripped of the equations and the theorems).

However, the field is not as hot as it is cracked up to be. Indeed, most of the questions were posed already in the early era of computers, and many basic mathematical results were obtained in the 80s or even earlier (mathematicians will correct me).

My first encounter with ML happened in this realm, when I went to a colloquium given by Vladimir Vapnik, who was visiting the Max Planck Institute for Intelligent Systems, next door to the institute where I worked at the time. It was his affiliation as “the principal research scientist at Facebook” that attracted me: I expected a kind of popular presentation on the subject, especially because the colloquium was open not only to specialists. Instead I was confronted with rows of handwritten equations, starting from “the basic” model that one probably learns during the first year of a PhD in a computer science department. Only later did I learn that Vapnik was one of the big names in the field (after reading this book).

Machine learning as statistics

Statisticians and physicists might see machine learning as a bunch of advanced statistical models, no more than that. I cannot really speak for the statisticians, but physicists seem to be the folks most dismissive of the whole field. Therefore, let me spend a few minutes discussing ML from a physicist’s point of view.

In physics we’ve been doing it for decades

Physics differs from other sciences in that it excels at wrapping massive amounts of statistical data into precise mathematical models. Thus, physics embodies the very ideal of science: a science that analyzes experimental data and develops models capable of predicting the results of future experiments.

In this context, some of the statements made by professional data scientists are nothing short of an offense to physicists. E.g., this article (in French) claims that “a statistician wants to construct a model that establishes a relation between a variable and a result. A data scientist wants more: to predict. Data scientists build, on the basis of data, models that can predict the future as precisely as possible.” You may be right to suggest that a person who says that has no real scientific training or experience… the bad news is that this person is one of “the experts” who do the hiring for data science.

On the general level, data science differs from physics in three key aspects:

  • It deals with any kind of data, whereas physicists focus mainly on physical matter: particles, stars, etc. While physicists are aware that their models and modeling skills can be applied elsewhere, this is not necessarily the case for hiring managers.
  • Data science deals with much greater amounts of data than what comes out of a typical physics experiment (unless you are working at CERN). Besides, physicists long ago split into experimentalists and theoreticians, with the theoreticians dangerously losing touch with reality. Moreover, physics models have been so precise that physicists stopped bothering themselves with statistics. Indeed, statistics is neglected to an absurd degree: it is not even taught in physics programs beyond the rudimentary basics. Trained physicists often have no idea about statistical testing, likelihoods and estimators, and their worldview is heavily frequentist. This is in sharp contrast to physicists’ generally superb background in anything related to probability, which may include multiple courses in probability theory, stochastic calculus, the theory of random processes, statistical physics, quantum physics, quantum statistical physics, etc.
  • Data science is essentially computer-based, whereas physicists still essentially make do with pen and paper. Of course, the level of computer literacy among physicists is quite high, but the computer mainly serves to plot the graphs for the very last formula, to solve an intractable equation or, at most, to model the behavior of a system based on known equations.

Remark: I switched here to using the term “data science” rather than “machine learning”. I am aware that data science is not just about machine learning; however, this is the field where machine learning is used the most.

It is just a bunch of statistical models

Really, what is the big deal about linear (or even nonlinear) regression and principal component analysis? These are among the staples of a typical machine learning book or course, yet in physics they are the stuff of an undergraduate curriculum: well-established mathematical techniques that are hardly new to anyone. Is there more to ML than a few new fitting methods?

As an example, let us consider nonlinear regression using neural nets. For a physicist, nonlinear regression means fitting the parameters of an already known curve composed of well-defined mathematical functions. Neural nets allow one to fit any curve, including one that may be impossible to decompose into (reasonably relevant) elementary and special functions, or that may simply be too difficult to guess. And, unlike orthogonal expansions (such as Fourier series), neural nets do it with a finite number of parameters.
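A minimal illustration, assuming nothing beyond the standard library: a one-hidden-layer tanh network fitted by plain gradient descent to y = |x|, a curve with a kink that one would not normally expand in elementary functions. The network size, learning rate and iteration count are arbitrary choices for the sketch:

```python
import math
import random

random.seed(0)

# A curve with a kink, awkward for a fixed parametric formula:
xs = [i / 10 - 1.0 for i in range(21)]   # 21 points in [-1, 1]
ys = [abs(x) for x in xs]                # target: y = |x|

# One hidden layer of tanh units:
#   y(x) = sum_j v[j] * tanh(w[j] * x + b[j]) + c
H = 8
w = [random.uniform(-1, 1) for _ in range(H)]
b = [random.uniform(-1, 1) for _ in range(H)]
v = [random.uniform(-1, 1) for _ in range(H)]
c = 0.0

def predict(x):
    return sum(v[j] * math.tanh(w[j] * x + b[j]) for j in range(H)) + c

def mse():
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

loss_before = mse()
lr = 0.02
for _ in range(2000):                    # plain stochastic gradient descent
    for x, y in zip(xs, ys):
        err = predict(x) - y
        for j in range(H):
            h = math.tanh(w[j] * x + b[j])
            dh = 1.0 - h * h             # derivative of tanh
            vj = v[j]                    # use the pre-update value
            v[j] -= lr * err * h
            w[j] -= lr * err * vj * dh * x
            b[j] -= lr * err * vj * dh
        c -= lr * err
loss_after = mse()
print(f"mse before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

The seventeen parameters (8 weights, 8 biases, plus the output layer) are not guessed from the shape of the curve; the descent finds them, kink and all.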

The following two sections deal with the questions that immediately pop up in the head of a physicist upon reading the previous paragraph.

Fitting an elephant

“With enough parameters I could fit an elephant!” This is a physicist’s way of referring to the problem of overfitting. Indeed, the reliability of the predictions made by practitioners of ML is probably the weakest point of the field. A couple of years ago I attended the Basel Computational Biology Conference, where Prof. Christian Lovis, a distinguished scientist in the field of medical informatics, spent an hour of his plenary talk doing stand-up comedy, citing multiple examples of ridiculous results produced by sophisticated ML and artificial intelligence algorithms. His point was: the problem is not the algorithms, but how we use them. Here is a more recent well-publicized example: the problem again lies not with the algorithm, but with the data it was trained upon.

At the origin of the issue is the fact that many people in the field lack background in probability and statistics. Though machine learning books routinely stress the importance of statistical testing, a few paragraphs here and there are unlikely to replace the training and the intuition that come with years of doing real research. (Indeed, on occasion “an expert” in ML might be somebody with a BS in computer science and “18 months of experience as a data scientist”.)

In other words: yes, with that many parameters one needs a lot of data and a good understanding of the statistical aspects of ML. This seems to provide an excellent entry point for physicists.

Machine learning as a black box

Another obvious complaint stems from the fact that we have no idea what the many parameters inferred by an ML algorithm correspond to. In other words, we do not understand what the algorithm is doing, even though it gives us correct predictions.

Here we must ask what it means to understand something. (After all, we have already asked what it means to learn.) If “understanding” means being able to decompose the answer into smaller components digestible by the human mind, then, indeed, ML means doing things without really understanding them. In fact, this is precisely the point: to extend the range of usable models beyond those that we can easily manipulate in our head or with pen and paper.

I can see how many people, especially those trained as physicists, will have a hard time parting with “understanding”. From a practical point of view, it has been the definitive method for verifying the correctness of results for generations of physicists (rather than statistical testing). Moreover, “understanding” nature is often treated as a worthy intellectual pursuit, justifying work that is of no direct use to society. So I leave it at that.

Inserting things “by hand”

Another thing that may irritate physicists is the artificial devices used to ensure the convergence of algorithms, such as “regularization”. In physics we do not insert things “by hand” in order to fix the solution, for nature already provides an appropriate cutoff: level width/lifetime, lattice constant, speed of light, etc. And even where a natural cutoff seems to be absent, one should look for a deeper meaning: this is how the infrared and ultraviolet catastrophes in quantum field theory produced the renormalization group.
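For concreteness, here is the simplest example of regularization I can think of: one-dimensional ridge regression, where the penalty weight lambda plays the role of the hand-inserted cutoff (the data are invented):

```python
# One-dimensional ridge regression: minimize
#     sum_i (y_i - s * x_i)^2 + lam * s^2.
# The penalty lam * s^2 is the "hand-inserted" cutoff: it tames the slope
# when the data alone do not pin it down reliably.
xs = [0.1, 0.2, 0.3]   # a few tightly clustered points
ys = [1.0, 3.0, 2.0]   # noisy responses: the slope is poorly determined

def ridge_slope(lam):
    # Closed-form minimizer: s = sum(x*y) / (sum(x^2) + lam)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

for lam in (0.0, 0.1, 1.0):
    print(f"lambda = {lam}: slope = {ridge_slope(lam):.2f}")
```

As lambda grows, the fitted slope shrinks monotonically toward zero; the choice of lambda is exactly the kind of cutoff that, as noted below, can itself be tested and corrected statistically.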

This problem is related to both discussed previously: as with overfitting, the cutoffs can be tested and corrected using appropriate statistical procedures. Whether we will be able to “understand” where these cutoffs come from is an open question.

Machine learning as software development

The practitioners of ML are usually people with a background in applied computer science. If I have said many harsh words about them above, I sincerely hope for their forgiveness. There is a good reason why they dominate the field, for one needs to know how to program and be able to understand and create algorithms. The lack of proper training in math, statistics and the analysis of real data is likely a problem, which most practitioners probably overcome with years of experience and via interaction with more scientifically minded colleagues.

Machine learning as a skill

Finally, there exist many ready-to-use ML software packages: sklearn, PyTorch, TensorFlow, WEKA and so on, and so forth. Nowadays you need not have a degree in the exact sciences or computer science to do ML, and this opens ample opportunities for people coming from various backgrounds. This is especially true if they do not intend to make a career in data science, but rather use ML in their work, just as one uses a telephone or an Excel spreadsheet. ML is becoming a skill, just like computer literacy.

This perhaps makes things tricky when you are being interviewed for a job, as you need to guess what kind and level of ML skills the potential employer expects, while risking making the interviewer or yourself look ignorant by posing a precise question.

Remarks

Some of the ideas expressed here were mentioned in my old post “Has physics become a ‘classical’ science?”

A big fish in a shallow pond

June 30, 2019

The Russian expression of the day:
Большая рыба в мелком пруду. [bolchaïa ryba v melkom proudou]
A big fish in a shallow pond.

A person whose claims to importance rest on their superiority over the rather limited circle of people around them, while in reality this person is not that exceptional. One can say this, for example, when the mayor of a small town behaves as if he were the President of the Republic.

We harness for a long time

June 24, 2019

The Russian expression of the day:
Мы долго запрягаем, да быстро ездим.
[My dolgo zapriagaïem, da bystro ïezdim]
We harness for a long time, but we ride fast.

A long preparation is necessary to ensure good results. This expression often serves as an excuse for procrastination.

When you still walked upright under the table

June 14, 2019

The Russian expression of the day:
Когда ты пешком под стол ходил. [kogda ty piechkom pod stol hodil]
When you still walked upright under the table.

That is: when you were still a small child, who does not need to bend down to walk under a dining table. This formula is used to emphasize one’s experience. For example: “I was already a team leader back when you still walked upright under the table.”

With seven nannies the child loses an eye

June 11, 2019

The Russian expression of the day:
У семи нянек дитя без глаза. [u semi nianek ditia bez glaza]
With seven nannies the child loses an eye.

Disastrous results produced by too large a team, even though the task is quite simple.

He who is destined to hang will not drown

June 6, 2019

The Russian expression of the day:
Кому суждено быть повешенным, тот не утонет.
[komou soujdieno byt’ poviechennym tot ne outonet]
He who is destined to hang will not drown.

One cannot escape one’s fate. Sometimes used in the sense that one need not worry too much about people who deserve an ignoble end.

The miser pays twice

June 2, 2019
The Russian expression of the day:
Скупой платит дважды [skoupoï platite dvajdy]
The miser pays twice.

Someone who always hunts for bargains often ends up paying more.

Unlike French, which has 17 vowels (13 oral and up to 4 nasal ones in some dialects), Russian has only six. The only one that is difficult for a French speaker to pronounce is ы [y]: it resembles the vowel и [i] pronounced through the nose.