15 April 2016

Cog in a Gear

This endeavor started out both as a reaction to my being rejected in my efforts to build Organic Normal Form™ databases at my then immediately former employer, and to the fact that Fabian Pascal had gone missing from the InterTubes. I believed then, as now, that Codd had defined the sole data model. Previous, and au courant, data structures are merely that; they're not models of data. Moreover, the NoSql and xml and sundry flatfile offerings were just re-hashes of pre-Codd engines. In particular, the fascination with hierarchy as implemented in xml is just IMS without an engine. And IMS, with its horrid control structures and coding black holes, was precisely the motivation for devising the Relational Model. Alas, Armonk wasn't pleased, since the RM arrived only a couple of years after IMS's release, and allowed one of the IMS crowd to manage SQL development. Dr. Codd was right, damn it, and somebody should stand up and say so!

Over time, my interests wandered back to where I had spent my initial career: stats and quant. Not least because it was clear early on that The Great Recession had been caused by incompetent and corrupt quants. Some insist, still, that the quants were only following orders, and thus not to blame. The fact is, though, that Countrywide's quants had a major role in devising the toxic mortgages that came to be securitized. Within the last week, both Wells and Goldman have admitted fudging the disclosures regarding those securities (though some underwriters, née quants, created the ratings), so one can argue that it wasn't just the quants.

Then along came Watson, and what IBM now calls Cognitive Analytics. It seems, at this juncture, to be a branding effort by IBM, although other sites do appear in search. The Wiki doesn't yet have a page on that term. Go to it. (Oddly, the term was trademarked, but is now listed as abandoned?)

What's of interest is that Date's statement is more true than ever:
I believe quite strongly that, if you think about the issue at the appropriate level of abstraction, you're inexorably led to the position that databases must be relational.
-- Chris Date/2009

There's some controversy, still, about the notion of relations in data. The RM, on the one hand, makes relations the province of the data designer, to be specified a priori: an order line must have a resolved foreign key to an order table. However, if normality/orthogonality is followed from the start, changes to schema are transparent to existing data and code. The RM/RDBMS is the only data store that provides that flexibility.

The quant (Big Data maven), on the other hand, proposes to discover, in a probabilistic manner, correlations in the data. Whether those correlations are, de jure, relations is the crux. I say not, but then I've always been a tad rebellious (but a Blue Yankee). All of the other data structures (whether network, or graph, or ...) follow from hierarchy: the structure specifies the one optimal access path through the data. Get the structure wrong, and you're up a creek. The ignorant complain that the RM is confining, not realizing (or refusing to admit) that in the RM, connections among "records" are expressed in data, which is fungible, while in hierarchy the connections are explicit in the tree (not fungible without some off-line effort), which the designer has pre-specified.
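The access-path point can be made concrete with a toy sketch (the customers, order numbers, and nesting are made up for illustration):

```python
# In a hierarchy, the designer pre-specifies the one path: here, orders
# are reachable only through their owning customer.
tree = {"alice": [101, 102], "bob": [103]}

# Going with the grain (customer -> orders) is a direct lookup...
print(tree["alice"])  # [101, 102]

# ...but the reverse question means walking the whole tree, because the
# connection lives in the structure, not in the data.
owner = next(cust for cust, order_ids in tree.items() if 103 in order_ids)
print(owner)  # bob

# Expressed relationally, the same connection is just values in rows,
# and either direction is the same symmetric predicate over data.
orders = [(101, "alice"), (102, "alice"), (103, "bob")]
print([c for o, c in orders if o == 103])      # ['bob']
print([o for o, c in orders if c == "alice"])  # [101, 102]
```

Changing the question in the tree means restructuring the tree; changing it in the relation means writing a different predicate over the same rows.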

Which brings us to Watson and Cognitive Analytics. What's the goal? Near as I can tell, it's needle hunting in haystacks higher than Everest. Those had better be very gold needles. After all, one benefit of the RM is to be able to infer new facts from the specified relations based on data.

Somewhat ironic, at least to me, are the following sentences in the Wiki article:
Are intelligent machines dangerous? How can we ensure that machines behave ethically and that they are used ethically?

As if humans routinely behave ethically! With the rise and vengeance of the 1%, ethics don't matter much. After all, it's just business.

The notion and practice of artificial intelligence, computerized division, is generally credited to McCarthy and LISP. Prolog came a bit later. Neither has made much of a dent in IT, although Watson is reported to use some bits of Prolog. Will Watson itself make a dent? IBM is said to be betting on it. That link, if you don't go there, is from two years ago.

Watson's only real value-add is the ability to observe, then discard, 99.9967853% of the text it "sees" in very short time spans. In other words, "Jeopardy!"-style infotainment and outright entertainment. In a clinical context, that amounts to ER settings, where diagnostic time does matter. The machine is far too expensive to install in hospitals, so the cloud version shared among hundreds of ERs might help. Humans, experts in their fields, know not to bother with that 99.9967853% in the first place. Watson is House on steroids, which is ironic, I suppose, since the whole point of "House" the TeeVee show was that House the character was entirely drug-addled. For day-to-day research, not so much. Collaboration with real humans is more important.

Finally, IBM is co-opting its own history. Thomas Watson, Sr. invented IBM's signature meme, "THINK", while at NCR. Externally, that was supposed to tell potential clients that IBMers used their heads for something besides a hat rack. Internally, the purpose was to motivate IBMers to devise ever more clever and lucrative ways to separate clients from their money. Now the meme is "outthink". We'll see. Maybe Watson will succeed in being a very nice suit of the Emperor's New Clothes?
