Dr. Codd Was Right: January 2011

31 January 2011

(Not So Simple) Simon Met a Pieman

Oh my. Another emperor's new clothes apostate. Another O'Reilly cite; they're coming in waves lately. Who knew? Anyway, they've just published an interview with Simon Riggs, who heads up 2ndQuadrant, a PostgreSQL consultancy. I run hot and cold on PG, mostly because I still believe in the locker paradigm, on balance, and that means DB2.

Here's the quote:

How come CouchDB, FluidDB, MongoDB etc. are so much in the news at the moment?

New things are sexy. Nobody wants to hear that the way your Dad used to manage data is still the right way, in most cases. Especially when it turns out that what you just suggested isn't new at all, and that your Granddad used to manage data that way too before he gave it up.

Ouch!! I swear I haven't done a Vulcan mind meld with the guy; never met him. But he's got that sharp tongue I do appreciate.

Game, Set, Match (not the other thread)

Big Data is among the juvenile memes out there (juvenile both in the meme's age and in the maturity of its proponents). I've been reading Janert's book off and on for the last couple of months, motivated by a scan in my local B&N. That he has a, shall we say, guarded view of what good Big Data is led me on. My interest in math stats goes back to before my dive into RDBMS, and, given the absolute choice to do it all over again, I'd have stayed with stats, if for no other reason than that experience is far more respected there than in databases (hissssss).

Imagine my surprise, then, to find an article at the O'Reilly site that promotes set-oriented semantics. There was a time when I worshiped O'Reilly books. Not so much since he/they went down the meme-pimping road; web 2.0 being the most obvious. Perhaps because web 2.0 has faded, he/they have invented another: data science. And, as usual, it's code/language oriented for the most part; the irony of this approach appears utterly lost on the group assembled. Except for this article. There are actually two: the one on the site page references the one linked here, which goes the full monty. The site page article is a tad more than an executive summary.

I'm not going to sprinkle quotes, since I'd end up reprinting pretty much the whole thing. Suffice to say, just go read it.

24 January 2011

Passion Fruit

As I've mentioned a number of times, while I've spent the last decade mostly in DB2/LUW, I've done some work with SQL Server, and hang out at the Simple-Talk site. They have an editorial every now and again, which generates comments on RDBMS generally (usually) rather than SQL Server specific bells and whistles. The current one is "Passion" (yeah, that's right), and I was moved to comment. Shown here for your amusement.

I've been hanging around here for a few years, and I sure do view databases (even SQL Server) as a "calling" which is sorta, kinda like a "passion". All of the places I've managed to find employment have been in spite of this calling, alas. Perhaps it's different on the other side of the Pond, or with SQL Server centric venues (my main database has been DB2/LUW for some while), but "passion" isn't what's rewarded.

Certainly, the advent of multi-core/processor SSD machines will have a profound impact on how RDBMS applications are built. The transition just isn't happening fast enough to suit me.

The corporate bent of DB2 does seem to make it more hidebound than other databases; which is such a shame, in that the LUW version is so much better than any of the other databases that run on those platforms. IBM is truly squandering a major opportunity.

14 January 2011

Trains, and Planes, and Automobiles

Well, Bob and Oracle and IBM. Cringely is at it again, and he provoked me into a reply to his drivel. It was interesting enough, to me at least, to warrant repetition here. You're welcome. (I've cleaned up a couple of typos here; rather early for my eyes to focus.)

@Bob:
IBM's competitive advantage over HP and Sun was IBM had a services business.

Well, no, not exactly. As I have predicted since Oracle bought Sun (a prediction Bob decided not to include in his 2011 list), the advantage IBM has is all those z/Series (or whatever it's called today) Fortune X00 clients. Lose the z/Series box, and services disappear.

Larry has lusted for those clients for a long time. There's a reason Relational Software (not even its first name) had the first "commercial" RDBMS, stealing a march on Armonk. Armonk hated Codd, since he stuck a fork in IMS shortly after its birth. Larry wanted an alternative to the mainframe app world. He had only software, but did OK.

But the only way to get all those big, juicy Fortune X00 z/Series clients away from IBM is to have a full stack. He has to find a way to convert 20, and 30, and 40 year old COBOL code to something modern. With the Sun hardware and Oracle he now has a shot at it. It won't happen tomorrow, but I'm wagering that there'll be concrete progress this year.

As to overplaying his hand with the *nix client base? It wouldn't be the first time he's done that, too. I don't see one strategy precluding the other tactic.

07 January 2011

Cringely Hates Me, He Really Hates Me

Bob didn't include either of my predictions in his list. I feel so violated. But, if you look at the 10 in toto (here they be, if you don't tarry), the pattern I've been predicting emerges. A world of centralized data, and pretty pixelated clients. And, maybe, not browser driven.

Could it be that Bob doesn't really hate me? I suspect his mission was to wrap it all up in Apple; he's been fixated on Steve for some time now.

06 January 2011

Pundit For a Second Day

Regular readers know that I submitted a prediction to the Real Cringely. He's only up to 6 so far, so I don't know whether he's deigned to accept it. When he made the announcement, he only said that any outside predictions would be credited, not that the author(s) would get advance notification of "winning".

While we wait, and given the recent spate of SSD and processor news, a new idea has been worming around my cerebrum. This post is just to establish originality, not the full blown patent. I could well abandon the idea. It's just one of those thought experiments.

My submitted prediction was that Oracle would leapfrog the other RDBMS vendors by pushing the BCNF approach in order to win over the IBM/COBOL mainframe application crowd to the Oracle/SSD machines. The magic bullet approach.
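
Since BCNF keeps coming up here, it's worth pinning down what the test actually is: a relation is in BCNF when, for every non-trivial functional dependency X → Y, X is a superkey. What follows is a minimal Python sketch, not anything Oracle or any other vendor ships; the function names and the employee/department schema are hypothetical, and it assumes the listed dependencies only mention the relation's own attributes.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, pull in the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a superkey of schema."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, never a violation
        if not schema <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


if __name__ == "__main__":
    # Hypothetical schema: employee -> department -> department head.
    schema = frozenset({"emp_id", "dept", "dept_head"})
    fds = [
        (frozenset({"emp_id"}), frozenset({"dept"})),
        (frozenset({"dept"}), frozenset({"dept_head"})),
    ]
    print(bcnf_violations(schema, fds))  # dept -> dept_head is the violation
```

Checking only the listed dependencies is enough when the question is whether a single relation is in BCNF with respect to the dependencies given on it; checking the pieces of a decomposition needs the full closure of the dependency set, which this sketch doesn't compute.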

But what if a vendor, maybe Oracle, maybe not, took the vision a step further? What direction would such a step take? How about writing the *engine itself* for multi-core/processor + SSD machines? How would such an engine differ from today's versions? Well, at minimum it would have an optimizer and execution unit that are parallelized. Some (most?) engines are parallel only to the extent of being able to execute multiple queries at once, generally one per thread. What is not common is parallel execution of a single query into the datastore.
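
The difference between running many queries at once and running one query in parallel is easiest to see in miniature. The Python sketch below splits a single aggregate query (an average over the rows matching a predicate) into partitions, scans them on separate worker threads, and merges the partial results. The table layout and function names are hypothetical, and in CPython the global interpreter lock keeps threads from actually running this CPU-bound work in parallel, so treat it as a picture of the partition/scan/merge structure, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate, column):
    """Scan one partition: filter rows and return a partial (sum, count)."""
    total = 0
    count = 0
    for row in rows:
        if predicate(row):
            total += row[column]
            count += 1
    return total, count


def parallel_avg(rows, predicate, column, workers=4):
    """One query, many workers: AVG(column) over rows matching predicate."""
    # Split the table into roughly equal contiguous partitions (ceiling division).
    size = max(1, -(-len(rows) // workers))
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: scan_partition(p, predicate, column), partitions))
    # Merge step: combine the partial aggregates into the final answer.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    # Hypothetical table: odd ids are "west", even ids are "east".
    table = [{"region": "east" if i % 2 == 0 else "west", "amount": i} for i in range(1, 101)]
    print(parallel_avg(table, lambda r: r["region"] == "west", "amount"))  # → 50.0
```

Each worker returns a (sum, count) pair rather than its own average: averages of unequal partitions can't simply be averaged, so the partial results have to be something that merges correctly. That decomposability is what a parallel execution unit needs from each operator.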

With rising processor/core/thread counts, blasting a query through many threads begins to make sense, if the datastore can respond fast enough. The existing model of concurrency, whether the locker model or MVCC, seeks to minimize the time that a given row is locked. Row level locking developed to make this possible, and MVCC developed to make it "irrelevant". Both approaches are based on the concept of a conflict serializable schedule (Weikum & Vossen, pg. 92 et seq). In practice, the engine does what the COBOL/VSAM coder used to do: iterate over a bunch of "records" doing stuff one at a time. The RDBMS presents to the client an "all at once" facade, as Dr. Codd demanded, but there's just a really fast squirrel spinning the cage.
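
The conflict-serializability idea the engine relies on has a standard mechanical test: build a precedence graph with an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj (same item, different transactions, at least one write), and the schedule is conflict serializable exactly when that graph has no cycle. A minimal Python sketch, assuming a schedule is written as a list of (transaction, operation, item) triples; the encoding and names are made up for illustration, not taken from Weikum & Vossen.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """schedule: list of (txn, op, item) in execution order, op in {'r', 'w'}."""
    edges = defaultdict(set)
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges[t1].add(t2)
    return edges


def is_conflict_serializable(schedule):
    """True when the precedence graph is acyclic (depth-first cycle check)."""
    edges = precedence_graph(schedule)
    txns = {t for t, _, _ in schedule}
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {t: WHITE for t in txns}

    def has_cycle(node):
        color[node] = GRAY
        for nxt in edges[node]:
            if color[nxt] == GRAY:
                return True
            if color[nxt] == WHITE and has_cycle(nxt):
                return True
        color[node] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in txns)


if __name__ == "__main__":
    # Lost update: T1 reads x, T2 writes x, then T1 writes x.
    lost_update = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
    serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
    print(is_conflict_serializable(lost_update), is_conflict_serializable(serial))  # → False True
```

Both the locker model and MVCC are, in effect, ways of only letting through schedules that would pass this test, without building the graph explicitly.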

But matrix operations are what Dr. Codd really meant; one can view the relational algebra as essentially linear algebra for a limited domain. What if an RDBMS vendor looked at current, and near future, machines rather than past machines? Current databases are still based on uniprocessor, slow disk technology. The emerging machines have been obvious for five years, anyway. How far along could Oracle, for example, be toward an engine that *demands* X number of cores and Y amount of SSD primary storage? Could you build a database engine from scratch in five years? Abso-freaking-lutely. OS/360 was written in less time, and in assembler. So, yes, this could be the year. My guess: Microsoft.
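
One concrete way to read the linear-algebra remark: a binary relation R(A, B) can be encoded as a 0/1 matrix, and joining R(A, B) with S(B, C) on B and projecting onto (A, C) is then a Boolean matrix product (OR of ANDs in place of sum of products). The Python sketch below checks that correspondence on a tiny hypothetical employee/department/site example. It covers only binary relations; n-ary relations would need higher-order arrays, and it illustrates the analogy rather than how any engine is built.

```python
def to_matrix(pairs, row_domain, col_domain):
    """Encode a binary relation as a 0/1 matrix over the given value domains."""
    r_index = {v: i for i, v in enumerate(row_domain)}
    c_index = {v: j for j, v in enumerate(col_domain)}
    m = [[0] * len(col_domain) for _ in row_domain]
    for a, b in pairs:
        m[r_index[a]][c_index[b]] = 1
    return m


def bool_matmul(x, y):
    """Boolean product: cell (i, j) is 1 if some k has x[i][k] and y[k][j]."""
    inner = len(y)
    cols = len(y[0]) if y else 0
    return [
        [int(any(x[i][k] and y[k][j] for k in range(inner))) for j in range(cols)]
        for i in range(len(x))
    ]


def from_matrix(m, row_domain, col_domain):
    """Decode a 0/1 matrix back into a set of (row value, column value) pairs."""
    return {
        (row_domain[i], col_domain[j])
        for i in range(len(row_domain))
        for j in range(len(col_domain))
        if m[i][j]
    }


if __name__ == "__main__":
    emps, depts, sites = ["ann", "bob"], ["sales", "ops"], ["nyc", "sf"]
    works_in = {("ann", "sales"), ("bob", "ops")}                # R(emp, dept)
    located = {("sales", "nyc"), ("ops", "sf"), ("ops", "nyc")}  # S(dept, site)

    # Relational route: join on dept, then project onto (emp, site).
    joined = {(e, s) for e, d in works_in for d2, s in located if d == d2}

    # Matrix route: Boolean product of the two relation matrices.
    product = bool_matmul(to_matrix(works_in, emps, depts), to_matrix(located, depts, sites))
    assert from_matrix(product, emps, sites) == joined
    print(sorted(joined))  # → [('ann', 'nyc'), ('bob', 'nyc'), ('bob', 'sf')]
```

Every output cell of the product can be computed independently of the others, which is the sort of work that spreads naturally across many cores.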

Losing My Religion

I don't know what OCZ is supposed to be an acronym for (don't end sentences preposition with), but 'Oly Crap Zoroaster will do. They're at CES, and here's some info from our friends at AnandTech. Note that the thing screams even with compressed data. Hmmm.

Later in the article is the toss-away:
"On the other end of the spectrum, OCZ presented an even bigger (physically) drive: the IBIS XL. Now this isn't going to be productized, but it's simply something to test the waters with. The IBIS XL fits into a standard 5.25" drive bay and starts at 4TB."

As I've mentioned a few times, Zsolt at storagesearch makes the case that sooner, not later, SSD will even take over the petabyte world of storage. I've always thought that a bit daft. Now, I'm not so sure it's daft.

Gimme a B! Gimme a C! Gimme an N! Gimme an F! What's it spell??? The future. (This is where I toss the cute blonde cheerleader over my shoulder.)

05 January 2011

Is That Oz Up Ahead?

Is that an Emerald City over the horizon? Are we almost to Oz? Yes, yes we are.

The last week has brought a host of news, which taken together, indicate that the near future is nearly here. If I keep stepping half-way to the wall, do I ever get to the wall? Yes, yes you do, to within any delta you wish to name.
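
The arithmetic behind the half-way-to-the-wall line, for a wall at distance d: after n half-steps the remaining gap is d/2^n, so for any delta you care to name, you're within it once n exceeds log₂(d/δ).

```latex
\text{gap after } n \text{ steps} = d - \sum_{k=1}^{n} \frac{d}{2^{k}} = \frac{d}{2^{n}},
\qquad \frac{d}{2^{n}} < \delta \iff n > \log_2\!\frac{d}{\delta}.
```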

First, there was the Windows on ARM announcement.

Next, we have two reports from CES, courtesy AnandTech: the next Tegra, and its successor.

The future is clearer: it will be a pixelated VT-220 connected to a wireless network and thence to a relational database stored in BCNF on SSD. Such applications (not ported, in the common way, from COBOL) will run rings around all that legacy file based stuff. With sufficient bandwidth, and a persistent connection (your phone is, right?), it's back to the future. No, I don't believe that the phone/web (or web/phone if you prefer) paradigm is the winner. The flexibility and accuracy of the persistent connection paradigm will win.

For those who think that the web is really great progress over what came before, you need to know the simple history. I'll start with the consolidated mainframe world of the late-60's. There were mainframes and terminals (3270 is the archetype), over a disconnected connection. There were local edit programs written into the 3270. The transfer was block mode, meaning that all keyboard activity was between the user's fingers and the local edit program. Hitting Send (what we now call Carriage Return/Enter) sent off the edited screen to the mainframe app code.

I've just described a browser/html application.

Then along came Unix and the VT-100 (there were later, more capable VT's, the VT-220 in particular). Like the 3270, it was connected by a wire, but this wire was always on. Ah. As a result, database engines and application code residing on the server saw each keystroke on the VT-220. In fact, it was common to have 4GL's resident with the RDBMS in what was often referred to as "client/server in a box": the database engine was the server in its patch of memory, and each client connection got its own patch of memory for the application code. It was blindingly efficient, and it allowed the database and the application code to edit/check *character by character* as the user progressed; if that sounds a bit like AJAX, well, yes it is. Not quite as cheap in memory, but this was the early 90's and later, when memory began to get cheap. Supporting 1,000's of terminals (or PC's running a terminal emulator) was not uncommon.
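
The trade-off between the two terminal styles comes down to round trips versus how soon the host can object. The Python sketch below is a toy model, not a description of the actual 3270 data stream or the VT-220 protocol; the function names and the digits-only field rule are hypothetical. The block-mode session ships the whole field once, on Send; the character-mode session sends every keystroke and lets the host reject the first bad one.

```python
def block_mode(keystrokes, is_valid):
    """3270-style: edits stay local; the host sees the field once, on Send."""
    field = "".join(keystrokes)
    round_trips = 1
    return is_valid(field), round_trips


def character_mode(keystrokes, is_valid):
    """VT-220-style: each keystroke reaches the host, which checks it immediately."""
    field = ""
    round_trips = 0
    for key in keystrokes:
        round_trips += 1
        if not is_valid(field + key):
            return False, round_trips  # rejected at this very keystroke
        field += key
    return True, round_trips


if __name__ == "__main__":
    def digits_only(s):
        return s.isdigit()

    typed = "12a45"  # the user fat-fingers the third character
    print(block_mode(typed, digits_only))      # → (False, 1): found only after Send
    print(character_mode(typed, digits_only))  # → (False, 3): caught at the 'a'
```

The per-keystroke check only works for rules that can be judged on a partial field, as digits-only can; anything that depends on the finished value still has to wait for the end.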

Then, along came the www, and young-uns thought this was new and fabulous. Negatory.

With the increasing density of chips we see an artefact of Moore's Law not often remarked (except by me): look at a block diagram of recent multicore chips. Most of the real estate is dedicated to various caches. Oddly, it seems, rather than using all those transistors to execute native instructions in hardware, the trend has been toward emulating, for example, X86 instructions in a "hidden" RISC machine. Not that this is new; the 360/30 was widely believed to use a PDP-11 to run the instruction set. IBM did acknowledge that most of the 360 series emulated the instruction set; only the top end machines executed in hardware. The upshot is simple: few, if any, normal client machines have need for the compute power on tap.
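
To make the "hidden RISC machine" idea concrete: a decoder can crack a complex instruction, such as an add that reads one operand from memory, into simpler internal micro-operations that a small core executes. The Python toy below only illustrates that cracking step; the instruction names, registers, and the two-micro-op split are invented, and real x86 decoders are far more involved.

```python
def crack(instr):
    """Translate one hypothetical CISC-style instruction into RISC-like micro-ops."""
    op, dst, src = instr
    if op == "ADD_MEM":  # dst register += memory[src]
        return [("LOAD", "tmp", src), ("ADD", dst, "tmp")]
    if op == "ADD_REG":  # dst register += src register
        return [("ADD", dst, src)]
    raise ValueError(f"unknown instruction: {op}")


def run(program, regs, mem):
    """Execute a program by running the micro-ops each instruction cracks into."""
    for instr in program:
        for uop, a, b in crack(instr):
            if uop == "LOAD":
                regs[a] = mem[b]
            elif uop == "ADD":
                regs[a] += regs[b]
    return regs


if __name__ == "__main__":
    regs = {"r1": 1, "r2": 2, "tmp": 0}
    mem = {100: 40}
    program = [("ADD_MEM", "r1", 100), ("ADD_REG", "r1", "r2")]
    print(run(program, regs, mem)["r1"])  # → 43
```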

So, we see the rise of ARM and MIPS (you did buy ARMH when I told you to, yes?) running minimalist machines at low power and low cost. With a persistent connection to a persistent datastore, let's dance like it's 1995. And we will. You can't fool Mother Nature, and She's saying: keep the data and its management in one secure place, and let the kiddies make pretty, pretty pictures on their phones and pads. Just don't let them mess with the data.

Just stay away from the poppies. You heard me. Don't go into the field.
