10 September 2024

Codd's Revenge

Some years ago, this report ran in the NYT, relating the not so spectacular life of IBM Watson. I find it a cautionary tale for those who cleave to the notion that there's some 'post relational' data world. They ain't no such a thing.

This was predicted some time ago
This goldrush is being driven by the hauntingly accurate results AI has delivered in fields like image, audio and video recognition. Yet, at the end of the day, these algorithms are merely correlation machines, sifting through vast piles of numbers to record subtle correlations among inputs without any high order understanding that would allow them to divine causative relationships. In the end, we are building our AI revolution on a correlation house of cards.
It's not widely discussed, but IBM built DB2 on top of mainframe VSAM way back when. Codd was driven to define the RM in the face of IBM's then-major database product, IMS, which was/is the hierarchical database. It was defined as a way around the network database. If one wished to, one could have built DB2-lite into any COBOL application, since key-indexed files were/are a part of VSAM, and of even earlier machines. Codd, essentially, did that. And Watson and AI and what-have-you continue to do that; XML nonsense being an exception, being just a poor man's IMS. For those with faulty memories, Watson (Jeopardy! version) emerged in 2011. That makes it a decade old, a lifetime or two in IT land.

So, what happened?
The company's top management, current and former IBM insiders noted, was dominated until recently by executives with backgrounds in services and sales rather than technology product experts.
"until recently" is gilding the lily just a tad, implying that the problem was specific to Watson. It's not. IBM from Watson, Sr. on down was/is a sales effort; the science and engineering bits are tolerated as a begrudged expense. Remember, the IBM/PC, which did jerk IT around for some decades, was built almost wholly from bought-in parts; the Suits thought so little of it.
The Watson they built was a room-size supercomputer with thousands of processors running millions of lines of code. Its storage disks were filled with digitized reference works, Wikipedia entries and electronic books. Computing intelligence is a brute force affair, and the hulking machine required 85,000 watts of power. The human brain, by contrast, runs on the equivalent of 20 watts. [my emphasis]
What Watson is: a relational database on super-duper steroids. Or, at least, it ought to be. Imagine if IBM had designed the thing to sequentially search all of that text. Of course not. Indexes up the wazoo. The only real question: is Watson a structural relational database or a correlation engine? Not everyone acknowledges that these are two distinct ways of looking for 'intelligence'. The relational database is grounded in relations, of course. But a relation, in Codd's terms, is not the PK/FK 'relation' at all but rather the connection of attributes to the defining identity of the entity. IOW, the standalone table. Again, indexed files existed in VSAM very early on. Nearly everyone considers the PK/FK 'relation' the raison d'etre of the RDBMS, in contrast to the hierarchical file systems which preceded it. Both systems are grounded in 'relations' specified by the Designer.

Or is Watson a correlation machine, continually calculating R among gazillions of data points?

The former is structural, dictated by the Designer, while the latter is explorational, hidden in the data.
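
For those who want the distinction in something more concrete than words, here is a minimal sketch in Python. Nothing in it claims to be how Watson works; the patient table, its column names and the five rows of numbers are all made up for illustration. The first function is the structural case: the Designer declares a key and the attributes that hang off it, and a lookup goes straight to the row through the key. The second is the exploratory case: nobody declared any connection between age and blood pressure, so it computes Pearson's R over the very same rows to see whether one is hiding in the data.

```python
import math
import sqlite3

# Hypothetical table and made-up rows, purely for illustration.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE patient ("
    "patient_id INTEGER PRIMARY KEY, age INTEGER, systolic INTEGER)"
)
db.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, 34, 118), (2, 51, 131), (3, 67, 142), (4, 45, 125), (5, 72, 150)],
)


def lookup(patient_id):
    """Structural: attributes are tied to the key by the Designer.

    The primary-key lookup goes through the key rather than scanning
    every row. Returns (age, systolic), or None if the key is absent.
    """
    return db.execute(
        "SELECT age, systolic FROM patient WHERE patient_id = ?",
        (patient_id,),
    ).fetchone()


def pearson_r():
    """Exploratory: Pearson's R between two columns nobody linked by design."""
    pairs = db.execute("SELECT age, systolic FROM patient").fetchall()
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Calling lookup(3) hands back exactly what the schema says belongs to patient 3, no more and no less. Calling pearson_r() hands back a number that says age and systolic pressure move together in this toy data, and says nothing at all about why.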

In the end, so far, Watson was supposed to be the Health Guru to smarten up doctors. Not so much:
Now IBM is paring back Watson Health and reviewing the future of the business. One option being explored, according to a report in The Wall Street Journal, is to sell off Watson Health.
This essay has sat in the queue since 2021. Not much has changed, except for an increase in the AI hype and the creation of the WatsonX brand as some vehicle to recoup all that moolah. They'll end up throwing good money after bad. It will still end badly.
