22 July 2010

Bill & Ted Return Home to Find BI Corpses

The purpose of this simple test of SSD versus HDD on the basic operation of a BCNF datastore, the inner join, was to show that joins are, effectively, costless. Some object to the word "costless", asserting that joins always have some cost. But all queries have cost, of course. The root question is whether joins cost more or less than the flatfile image alternative. For real applications, I believe the answer is: less.
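For concreteness, here is a minimal sketch of the shape of that test; the table and column names are mine, not the ones from the actual benchmark. Two tables in BCNF, and the inner join the flatfile crowd calls too expensive:

```sql
-- Hypothetical BCNF pair (illustrative names, not the actual test schema).
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    region      VARCHAR(50)  NOT NULL
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  DATE          NOT NULL,
    amount      DECIMAL(10,2) NOT NULL
);

-- The join under discussion. On HDD this generates the random seeks
-- that gave joins their bad name; on SSD those seeks are nearly free.
SELECT o.order_id, o.order_date, c.name, c.region
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id;
```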

There is also the byte bloat aspect. In my vision of the not-too-distant future, the BI vendors are in bad shape. The entire point of BI schemas, and the code to process same, was that joins were "too expensive" for operational datastores. The reaction was to extract operational data, explode it, then scan it against odd indexes. Odd from the point of view of operational data, that is. But now that SSDs handle joins with aplomb, is there much point to all of that effort? You need additional servers, software to do the ETL exercise (even if you're smart and just use SQL), the BI software to create all those not-really-SQL-schemas-but-just-as-arcane (if not more so; see Business Objects) data definitions, and separate "experts" in the specific BI product you've bought into.
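To make "explode" concrete, a sketch using the same hypothetical tables as above: the flatfile image the ETL step materializes. Every customer column repeats on every one of that customer's order rows, which is exactly the byte bloat in question:

```sql
-- The ETL "explode": materialize the join once, redundantly.
-- Customer attributes now repeat on every one of that customer's orders.
CREATE TABLE orders_flat AS
SELECT o.order_id,
       o.order_date,
       o.amount,
       c.customer_id,
       c.name,
       c.region
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id;
```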

What if someone showed you how to do all that BI reporting from a normalized, compact schema (a sketch follows below), just like the normalized, compact schema you've now got for your OLTP operational application on SSD? Why spend all that effort to, quite literally, duplicate it all over again? Aside: there is software marketed by storage vendors, and some sold separately, which does nothing but remove redundant data before writing to storage (de-duplication, in the trade); I find that one of the funniest things in life. Rube Goldberg would be so proud.
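Back to the reporting question. Here is a sketch of the typical BI rollup run straight against the normalized tables above, with no ETL and no second copy of the data. (DATE_TRUNC is PostgreSQL syntax; other engines spell it differently.)

```sql
-- Monthly revenue by region, straight off the operational schema.
SELECT c.region,
       DATE_TRUNC('month', o.order_date) AS month,
       SUM(o.amount)                     AS revenue
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id
GROUP BY c.region, DATE_TRUNC('month', o.order_date)
ORDER BY c.region, month;
```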

Yesterday I read an article about the disputed origin of the term "disruptive technology". The point of that part of the article was that what gets labeled "disruptive" mostly isn't; it's mostly marketing hype. Well, SSD/multi machines aren't just hype. The retail SSD was invented to simplify laptops, mostly by reducing power draw and heat. Thus we have the 2.5" form factor persisting where it makes little sense: the server and the power user's desktop. Once the laggards in the CTO/CIO corner offices finally get beaten up enough by their worker bees (you do know that leaders of techies mostly follow, yes?) to use SSDs to change structure, not merely as faster spinning rust, the change will be rapid. Why? Because the cost advantage of going to BCNF in an RDBMS is so massive, as a TCO calculation, that first adopters get most of the gravy.

Remember the 1960s? Well, remember what you read ABOUT the 1960s? There was a gradual shift of wifey from the kitchen to, initially, professional occupations. Families with a working wifey early on made out like bandits, since the economy overall was structured around the single-income household. As time went on, the economy shifted (through inflation caused by all those working wifeys) to requiring a two-income household just to maintain the previous standard of living.

And so it will go with SSD. There is a significant difference, one hopes, between this transition and the one from tape to disc. The early disc subsystems were explicitly called Random Access Storage, but COBOL was the lingua franca of the day, and it had already accumulated millions, if not billions, of lines of sequential (tape paradigm) processing code and an established development paradigm. So disc ended up being just a more convenient, and a bit faster, tape. Today is different. Today we have the SQL (almost Relational) database, which can exploit SSD in a way that the byte-bloat flatfile paradigm can't. It's a good time to be young and disruptive. Cool.
