07 June 2009

But, I can see Russia from Alaska

As Sarah said, you can see Russia from Alaska. With regard to SSD and relational databases, it is the view, in the relational sense, that is going to motivate another major change in how we build database applications.

During this downtime, I have been keeping track of the producer of SSD, STEC (both its name and stock symbol; it started out as Simple Technology), and they continue to announce agreements with large systems sellers, HP being the newest. A notion has begun to bubble in my brain. Said notion runs along these lines. The major sellers of hardware have gotten the SSD religion. Absolute capacity of SSD will not, in a reasonable timeframe, catch up with HDD. This puts the hardware folk in the following situation: they have these machines which can process data at unheard-of speeds, but not the massive amounts of data that are the apple of the eye of flat file KiddieKoders. What to do, what to do?

The fact that STEC and Intel are making SLC drives available, with STEC targeting big server and near-mainframe machines, and the machine makers taking them up on the deal, means that there really is a paradigm shift in progress. Aside: Fusion-io is targeting server-quality drives on PCIe cards. That puzzled me for a bit, but the answer seems clear now. Such a drive is aimed squarely at Google, et al. The reason: the Google approach is one network, many machines. A disk farm is not their answer. Fusion-io can sell a ton of SSD to them. Alas, Fusion-io is private, so no money for the retail investor. Sigh.

The paradigm shift is motivated by the TCO equation as much as by blazing speed. SSD gulps a fraction of the power of HDD, generates a fraction of the heat, and thereby demands a fraction of the cooling. So, total power required per gigabyte is through the floor compared to HDD. There is, further, a reduced need for RAID-10, since the SSD takes care of data access speed in and of itself. So, one can replace dozens of HDD with a single SSD, which then sips just a bit of juice from the wall plug. The savings, for large data centres, could be so large that, rather than just waiting to build "green fields" projects on SSD/multi machines as they arise in normal development, existing machines should simply be scrapped and replaced. Ah, bliss.

But now for the really significant paradigm shift. My suspicion is that Larry Ellison, not Armonk, has already figured this out. You read it here first, folks. There was a reason he wanted Sun; they have been doing business with STEC for a while.

The premise: SSD/multi machines excel at running high-NF databases. The SSD speeds up flat file applications somewhat, but that is not the Big Win. The Big Win is to excise terabytes from the data model. That's where the money is to be made. The SSD/multi machine with RDBMS is the answer. But to get there, fully, requires a paradigm shift in database software. The first database vendor to get there wins The Whole Market. You read that here first, too, folks.
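To make "excise terabytes" concrete, here is a toy sketch, using Python's stdlib sqlite3 (the table and column names are illustrative, not from any real system): a flat-file design repeats the same customer attributes on every order row, while a normalized design stores them once and joins on a small key. Scale the repeated text up to billions of rows and the excised redundancy is measured in terabytes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- flat: customer name and city duplicated on every order row
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        cust_name TEXT, cust_city TEXT, amount REAL);

    -- normalized: the duplication is excised into one customer row
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id INTEGER REFERENCES customer, amount REAL);
""")

# 1,000 orders for one customer, stored both ways
for i in range(1000):
    con.execute("INSERT INTO orders_flat VALUES (?,?,?,?)",
                (i, "Acme Widgets Inc.", "Poughkeepsie", 9.99))
con.execute("INSERT INTO customer VALUES (1, 'Acme Widgets Inc.', 'Poughkeepsie')")
con.executemany("INSERT INTO orders VALUES (?, 1, 9.99)",
                [(i,) for i in range(1000)])

# Count the bytes of customer text each design carries
flat_bytes = sum(len(r[0]) + len(r[1]) for r in
                 con.execute("SELECT cust_name, cust_city FROM orders_flat"))
norm_bytes = sum(len(r[0]) + len(r[1]) for r in
                 con.execute("SELECT name, city FROM customer"))
print(flat_bytes, norm_bytes)  # the repeated text shrinks to a single copy
```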

There have been two historical knocks on 4/5NF data models. The first is that joins are too slow. SSD, by itself, slays that. With the SSD/multi machine, joins are trivial and fast. These models also allow the excision of terabytes of data. The second is that vendors have, as yet, not been able (or, more likely, willing) to implement Codd's 6th rule on view updating. Basically, only views defined as a projection of a single table are updatable in current products. This is material in the following way. In order to reap the maximum benefit of the SSD/multi machine database application, one needs the ability to both read and write the data in its logical structure, which includes the joins. In the near term, stored procedures suffice; send the joined row to the SP, which figures out the pieces and the logical order to update the base tables.
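The near-term workaround described above can be sketched like so, again with stdlib sqlite3 (which has no stored procedures, so a plain Python function stands in for the SP; the schema is illustrative). Application code hands over one logical, joined row; the routine figures out the pieces and writes each one to its base table inside a single transaction.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id INTEGER REFERENCES customer,
                         amount REAL);
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO orders VALUES (100, 1, 9.99);
""")

def update_joined_row(con, order_id, cust_id, name, amount):
    """Accept the logical (joined) row; route each piece to its base table."""
    with con:  # one transaction, so the logical row stays consistent
        con.execute("UPDATE customer SET name = ? WHERE cust_id = ?",
                    (name, cust_id))
        con.execute("UPDATE orders SET amount = ? WHERE order_id = ?",
                    (amount, order_id))

update_joined_row(con, 100, 1, "Acme Widgets", 19.99)
row = con.execute("""SELECT c.name, o.amount
                     FROM orders o JOIN customer c USING (cust_id)
                     WHERE o.order_id = 100""").fetchone()
print(row)  # ('Acme Widgets', 19.99)
```

The cost of this approach is that the decomposition logic lives in hand-written procedures, one per logical row shape, rather than in the engine.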

But what happens if a database engine can update joined views? Then it's all just SQL from the application code's point of view. Less code, fewer bugs, faster execution. What's not to love? The pressure on database vendors to implement true view updating will increase as developers absorb the importance of the SSD/multi machines, and managers absorb the magnitude of the cost savings available from using such machines to the fullest.
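What that would feel like can be approximated today with an INSTEAD OF trigger, which SQLite (among others) supports on views: the application issues a plain UPDATE against the joined view and the trigger routes the pieces to the base tables. This is a sketch with illustrative names, not Codd's Rule 6 implemented in the engine; the routing is still spelled out by hand in the trigger body.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id INTEGER REFERENCES customer,
                         amount REAL);
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO orders VALUES (100, 1, 9.99);

    -- the logical structure, joins included
    CREATE VIEW order_v AS
        SELECT o.order_id, o.cust_id, c.name, o.amount
        FROM orders o JOIN customer c USING (cust_id);

    -- the trigger makes the joined view writable
    CREATE TRIGGER order_v_upd INSTEAD OF UPDATE ON order_v
    BEGIN
        UPDATE customer SET name = NEW.name WHERE cust_id = NEW.cust_id;
        UPDATE orders SET amount = NEW.amount WHERE order_id = NEW.order_id;
    END;
""")

# From the application's point of view, it is all just SQL:
con.execute("UPDATE order_v SET name = 'Acme Widgets', amount = 19.99 "
            "WHERE order_id = 100")
row = con.execute("SELECT name, amount FROM order_v "
                  "WHERE order_id = 100").fetchone()
print(row)  # ('Acme Widgets', 19.99)
```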

Oracle will be the first to get there, just because IBM seems not to care about the relational aspects of its databases. The mainframe (z/OS) version is just a COBOL file handler, and the LUW version is drinking at the XML trough. Microsoft doesn't play with the big boys, despite protestations to the contrary. With the Sun acquisition, Oracle has the vertical to displace historical IBM mainframes. Oracle has never run well on IBM mainframe OSs. Oracle now has the opportunity to poach z/OS applications at will. It will be interesting.


PaulM said...

Nice article. I am surprised no one has commented already.
It has always been a physical implementation issue, i.e., people bag relational databases and the relational model is thrown out as well.

Robert Young said...

Well, I'm not as famous as Date or CELKO, although I had a Usenet chat with him about SSD and databases. But there is a comment a couple of posts back that is amusing.