29 January 2016

Days of Wine and Roses

Since defining Data Science has become at least as lucrative a job as doing Data Science, I suppose I should do my part. Let's start with science, from the Wiki of course.
To be termed scientific, a method of inquiry is commonly based on empirical or measurable evidence subject to specific principles of reasoning. The Oxford English Dictionary defines the scientific method as "a method or procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses."

A bit of history. Back in the mid-60s, before I ever got involved, there were IBM and the 7 dwarves. If one designed, built, or sold computers, one was almost always an EE, preferably from a top-tier engineering school. Likewise, if one programmed such machines. Programming was mostly in each machine's assembler. Among real science majors, the hierarchy started:
1 - math
2 - physics
3 - electrical engineering
X - everything else in any order, since they don't matter to the first 3

This became a problem, since "doing computers" was the au courant field, much as the web and big data and data science and such are today. The problem was that very few could survive an EE curriculum. The demand for some degree that was "computers" was high, but the number who qualified for the then-appropriate degrees was small. So, just as sub-prime and Alt-A mortgages were invented to fulfill the demand for MBS, so was the computer science degree. Not smart enough for EE? Want to do computers? OK, you can write programs, just sign here.

The gag continues. We've seen NoSQL created just because the average kiddie koder can't grok simple set theory, so toss out ACID and DRI and such and return to the thrilling days of yesteryear with COBOL and VSAM, only now it's Java/PHP/C# and XML. Or whatever file type is hot.

Data science is quite the same. Your average quant wannabe can't grok stat or OR or maths, so let's create a new discipline that's lite on the tech, but heavy on the buzzwords. Thus, data science. Many of the skills attributed to data science used to be carried out by admin assistants: data cleaning, data entry, and other drudge tasks. Now, these are the critical skills of the data scientist.

The nature of science is to discover previously unknown aspects of God's world. Humans don't create scientific artifacts, we find them lying around, often hidden under millennia of ignorance. Priestley didn't invent oxygen, just found it lying around, after some experimental effort to isolate it. And so on for the rest of the periodic table. Einstein found relativity by asking a really simple question: what happens if this tram departs from the clock tower as fast as light? Nothing more than that. The resulting maths are, to be fair, a tad intimidating. And he had help with that bit.

What science, then, is there in data science? What previously unknown aspect of God's world has been found by the efforts of data science? None that have crossed my path. Nor will there be.

Old wine in new bottles. Or, as the child observed, he has no clothes.
