Dr. Codd Was Right: Big Dig, Big Data, Big Deal?

Colombia, Venezuela, Greenland, Canada then the rest!!

The world is not linear.
Smart people vote with their feet. [you betcha]
-- Donald Hughes McElhone, Ph.D.[statistics, Iowa State]/1975

Over the past several years, global reinsurance companies have had what the researchers call a "climate epiphany" and have roughly doubled the rates they charge home insurance providers.
-- Claire Brown, et al/2025 [no NCAR?, and NOAA? de Nile is just a river]

Better A.I. would remember what it learns, just as humans do, and squeeze more work from each watt. Tech companies spend billions of dollars running large language models that don't learn while they run.
-- Carl Benedikt Frey/2025 [sounds like AI/LLM backed by some RDBMS which stores the learning? sound familiar?]

There's not a single example of things working out for the appeaser.
-- Nicolle Wallace/2024 [like this? the next extortion is on the way]

He was awarded a Bronze Star and a Combat Infantryman Badge (CIB), having served in civil-affairs operations and as an adviser in Afghanistan. But let's be clear: Hegseth was not a front-line leader of combat troops under sustained fire. He did not command an infantry company in protracted combat or lead exhausted soldiers through night patrols and firefights.
-- Dick Dowdell/2025 [once again, with anger: drugstore truck driving man]

Effective with the 2026 mid-term elections, military proctors will be stationed at every shithole Blue city polling place, demanding to see a current, valid United States of Alabama passport. No passport, no vote. I am the dictator.
-- Donald J. Trump by Executive Order, this day 7 December 2025 [let's not wait that long]

I have had to explain and re-explain and re-explain and re-explain, you know, how relational databases work, what is an eigenvector, what is dimensionality reduction.

-- Christopher Wylie/2018

I believe quite strongly that, if you think about the issue at the appropriate level of abstraction, you're inexorably led to the position that databases must be relational.

-- Chris Date/2009

This Week's thought

Multi-omics plus ML can generate targets and hypotheses faster, but it does not magic away causal biology, tolerability, endpoints, or the sheer ugliness of neurodegeneration.
-- shakeel hoosdally/2025 [another AI flameout. not gonna be the last]

See you next week in a brand new show [©Heckle and Jeckle]

Therefore:

In a time of SSD, multi-core/processor, two terabyte memory and Optane App Direct Mode (RIP) machines, there is no reason not to build from BCNF data. Time to do what Dr. Codd demonstrated. Technology has finally caught up with the maths.

09 January 2014

Big Dig, Big Data, Big Deal?

Among the largest old city rehab efforts in the history of the country was The Big Dig in Boston. It finally finished, late and over budget. But it includes one of the prettiest bridges on this side of The Pond. Why is it that any European country manages to do civil engineering with greater beauty in its homeliest structures than the USofA does in its best? Why is it that virtually every "innovation" in automobiles since Henry Ford was created by some European company? Just asking.

Recently, this endeavor mused on the Death of Big Data. Or, perhaps, high morbidity. Watson has been getting ink lately on blogs, so it's no surprise that IBM would take the opportunity to discuss the machine. What I can't tell is whether Watson is sui generis, or a model shippable in quantity. From the wiki description, it's built from off-the-shelf parts. Except, of course, for the software. What's even more interesting: Watson doesn't make the Top 500 list of supercomputers, and appears to be I/O bound by *hard drives*:

According to John Rennie, Watson can process 500 gigabytes, the equivalent of a million books, per second. IBM's master inventor and senior consultant Tony Pearson estimated Watson's hardware cost at about $3 million. Its performance stands at 80 TeraFLOPs which is unfortunately not enough to place it at Top 500 Supercomputers list. According to Rennie, the content was stored in Watson's RAM for the game because data stored on hard drives are too slow to access.

I guess these smart folks never heard of SSD!!!

What's even more interesting: some of that software is Prolog:

We required a language in which we could conveniently express pattern matching rules over the parse trees and other annotations (such as named entity recognition results), and a technology that could execute these rules very efficiently. We found that Prolog was the ideal choice for the language due to its simplicity and expressiveness.

Some background, some my own, on Prolog.
- it was created within a couple of years of Codd's relational model paper (1970 for the paper, 1972 for Prolog)
- it uses what amounts to a normalized, in-memory database. most Prologs refer to this data as the "database".
- while at OMS in the early 90s, a couple of my colleagues attempted to build an AI sub-system for the main product, medical pre-qualification, in its database/4GL (Progress). never got very far, if only because Progress has never been particularly relational or normal in application
- while at CSC I had to endure a Prolog mutant called GraphTalk, which CSC had bought up a few years before from France. of course. my colleagues at CSC turned this mutant into COBOL/VSAM coding. Yum.
- one of the current uses of Watson is in medical diagnosis. hmm. twenty years too soon was I.
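For the C-inculcated, here is a minimal sketch of the "normalized, in-memory database" point, in Python rather than Prolog. It is an illustration under stated assumptions, not how Watson or any real Prolog engine works: the `parent` relation, the `grandparent` rule, and the `query` helper are all hypothetical names made up for this example. Facts are just tuples in a set, and a rule is just a join.

```python
# Facts held as a relation (a set of tuples), much as a Prolog "database"
# holds clauses. All names and data here are hypothetical illustrations.
parent = {
    ("alice", "bob"),
    ("bob", "carol"),
    ("bob", "dave"),
}


def grandparent(parent_rel):
    """Prolog-style rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

    Relationally, this is a self-join of the relation on the shared Y column.
    """
    return {
        (x, z)
        for (x, y1) in parent_rel
        for (y2, z) in parent_rel
        if y1 == y2
    }


def query(relation, pattern):
    """Match a tuple pattern against a relation.

    None in the pattern stands in for an unbound Prolog variable.
    Returns the matching rows as a sorted list.
    """
    return sorted(
        row
        for row in relation
        if all(p is None or p == v for p, v in zip(pattern, row))
    )


if __name__ == "__main__":
    print(grandparent(parent))
    print(query(parent, ("bob", None)))
```

A real Prolog adds unification and backtracking on top of this, which the sketch does not attempt. The point is only that the rule reads like a relational join, which is roughly why the two camps sound so much alike.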

There's still a commercial version of Prolog/datastore called Amzi! (yes, the ! is part of the name, just as in Yahoo!). And guess what? Its major market is business rule and decision support implementations. As it happens, Prolog syntax/semantics is more alien to C-inculcated coders than even R or SQL. But, according to its zealots, Prolog systems are orders of magnitude more compact than imperative (e.g., Java/C/FORTRAN) equivalents. Kind of like what relational zealots say about RM/SQL databases versus flat files.

So, today's Times has a puff piece from IBM on the use and future of Watson. As others have concluded, but with suspicion, IBM sees Watson as central to its commercial success.

IBM's elevation of Watson is the biggest illustration yet of the technology industry's faith that so-called Big Data holds promise for the economy -- and the failure so far to meet that promise.

Big Data is just descriptive statistics, since one has all the numbers. Look at any Baby Stat book, and an early chapter (and likely the shortest in the book) will cover all one needs to know about descriptive statistics. I know, I know. Big Data is really about correlation and finding the correct distribution. Mostly, for commercial uses, it's about finding a few golden correlation needles in a haystack of choices by millions of people. So you can spit more enticing ads at a few of them. I wonder how many of these Big Data projects were ever subjected, a priori that is, to a rigorous cost/benefit analysis? While Watson is a multi-million dollar machine, most Big Data can be handled using R or PL/R on a pumped-up Dell. So, in such cases, one need only find a few silver needles.
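To make the Baby Stat point concrete, here is a small sketch using only the Python standard library (R would do it in fewer lines). The `describe` and `pearson` functions and the sample numbers are hypothetical, invented for illustration. Note the use of the population standard deviation: if one really has all the numbers, there is no sample to infer from.

```python
from statistics import fmean, median, pstdev


def describe(values):
    """Descriptive statistics for a full population of numbers."""
    return {
        "n": len(values),
        "mean": fmean(values),
        "median": median(values),
        "pstdev": pstdev(values),  # population, not sample, deviation
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical data: ads shown and purchases made, per customer.
    ads_shown = [1, 2, 3, 4]
    purchases = [2, 4, 6, 8]
    print(describe(ads_shown))
    print(pearson(ads_shown, purchases))
```

Hunting for needles amounts to running something like `pearson` over a great many column pairs, which is also how one finds plenty of needles that aren't there.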

The apostates are beginning to crawl out:

Likewise, IBM will have to sharpen its focus and what it delivers, said Henry D. Morris, an analyst at the consulting company IDC. "Big Data by itself isn't value, it has to deliver recommendations about what to do," he said. "They have to show people not just analysis, but action. They understand that there are challenges ahead."

By the way, I'd love to get invited to the Watson party (fat chance, of course): the staff will be located in the East Village. If you have to ask where the East Village is, you're so uncool.
