18 March 2009

The Next Revolution, Part One

It all started with a transcript of an interview with Linus Torvalds, in which he remarked that solid state disks would change the way file systems worked once they were relieved of delays inherent in rotating storage. Well, if file systems change, why not databases? Which got me to musing. I do that from time to time.

A bit later, I ran across a thread on comp.databases.theory, and posited this notion. CELKO chimed in that his then-forthcoming book ("Thinking in Sets", and worth having) had a section discussing solid state disks.

So, two Well Known People and solid state disks. There just might be something there, there.

Here's the notion. Databases started out with what became the Network model about 1960 (yes, that long ago). IBM wasn't thrilled (they hadn't worked out the Network model), so they invented the Hierarchical model (as IMS). Dr. Codd was at IBM and wasn't thrilled with IMS, so he wrote up the relational model in a 1969 internal paper, which was published publicly in 1970. IBM was not happy. Here was one of their own taking potshots, and getting bullseyes, at what was already a cash cow.

What became Oracle Corp saw the opportunity, and they took it. They released the first "commercial" relational database in 1979. From there, the players we all know and love began their struggle over what *is* a Relational Database Management System, what *is* the proper language to be supported internally, and even whether an RDBMS should be about relational data.

The most persistent struggle over these 30 years (leaving aside for the moment all that is XML) has been over how relational a production database should be. Since IBM's core (from the point of view of database technology) was then, and is now, COBOL programs running on mainframes, their users remain wedded to the notion that the performance of the database is always detrimental to the performance of the whole application. The reason for this sideways thinking derives from the way COBOL coding practices have remained solidly in the data "models" of the 1970s, since, to all intents and purposes, so has the language.

Rather than refactor the data integrity bits out of the code (which likely is still architecturally identical to whatever the initial design was 20 or 30 years ago) and put them in the database with the data, all that is typically done is to move the data from the existing datastore (IMS or VSAM) into DB2/Oracle/whatever. The result is that the database is asked only to retrieve data in disconnected lumps. Send the Order header. Process that. Send the Order lines. Process them. Send the Inventory lines. Process them. And so on.
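For the code-minded, here is a minimal sketch of the difference, using Python and an in-memory SQLite database purely for convenience. The table and column names (order_header, order_line, inventory) and both function names are my own invention for illustration, not anyone's production schema. The first function pulls the data in lumps, the way the legacy code does; the second lets the database assemble the answer in one join. The last function shows an integrity rule living in the database rather than in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: the integrity rules live with the data.
conn.executescript("""
CREATE TABLE inventory (
    item_id INTEGER PRIMARY KEY,
    on_hand INTEGER NOT NULL CHECK (on_hand >= 0)
);
CREATE TABLE order_header (
    order_id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL
);
CREATE TABLE order_line (
    order_id INTEGER NOT NULL REFERENCES order_header(order_id),
    item_id  INTEGER NOT NULL REFERENCES inventory(item_id),
    qty      INTEGER NOT NULL CHECK (qty > 0),
    PRIMARY KEY (order_id, item_id)
);
INSERT INTO inventory VALUES (1, 10), (2, 0);
INSERT INTO order_header VALUES (100, 'Acme');
INSERT INTO order_line VALUES (100, 1, 3), (100, 2, 1);
""")


def fetch_in_lumps(conn, order_id):
    """Legacy style: header, then lines, then one inventory lookup per line."""
    header = conn.execute(
        "SELECT customer FROM order_header WHERE order_id = ?", (order_id,)
    ).fetchone()
    lines = conn.execute(
        "SELECT item_id, qty FROM order_line WHERE order_id = ? ORDER BY item_id",
        (order_id,),
    ).fetchall()
    result = []
    for item_id, qty in lines:
        on_hand = conn.execute(
            "SELECT on_hand FROM inventory WHERE item_id = ?", (item_id,)
        ).fetchone()[0]
        result.append((header[0], item_id, qty, on_hand))
    return result


def fetch_joined(conn, order_id):
    """Relational style: one statement, and the database does the matching."""
    return conn.execute(
        """
        SELECT h.customer, l.item_id, l.qty, i.on_hand
        FROM order_header h
        JOIN order_line l ON l.order_id = h.order_id
        JOIN inventory i ON i.item_id = l.item_id
        WHERE h.order_id = ?
        ORDER BY l.item_id
        """,
        (order_id,),
    ).fetchall()


def rejects_orphan_line(conn):
    """Try to insert a line for an order that does not exist."""
    try:
        conn.execute("INSERT INTO order_line VALUES (999, 1, 1)")
    except sqlite3.IntegrityError:
        conn.rollback()
        return True
    return False
```

Both fetch functions return the same rows. The lumpy one just makes a round trip per lump to get there, and needs its own code to keep the lumps consistent.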

Such "legacy" coded applications can be morphed into using the RDBMS to its fullest capabilities, but that is too often dismissed as too expensive, on the strength of back-of-the-bar-napkin jottings. Adding another sand brick to the sand castle is seen as better. From a quarterly-bonus-for-the-boss point of view, that may be. But the tide is coming in, and it is called solid state disks.

What is coming: a faster, cheaper, easily maintained application based on the multi-processor solid state disk machine. Building and running these applications will be so economical that even the back-of-the-bar-napkin jottings will succumb.
