Dr. Codd Was Right: May 2010

25 May 2010

Violin's Flash Sonata

As regular readers are aware, since the inception of this endeavor, I've held to the position that the maximum benefit to databased applications comes from using SSD storage as the sole storage medium (not necessarily for backup). And you will have noted that I've been less than thrilled over the last half year or so to see both storage vendors (the EMC's of this world) and SSD "enterprise" vendors (the STEC's) segue into "tier-0" storage mode.

Even if it worked, tier-0 doesn't make much sense to me. If you want to cache HDD, just use gobs of DRAM (or battery-backed SRAM if you want real speed and persistence). With DRAM, you'll get consistent IOPS on read and write without all the hassle. And so on. What the tier-0 folks haven't addressed (that I've seen, anyway) is the question: what is the point in having two persistent datastores on-line? Just use DRAM cache.

Now, Violin has thrown down another gauntlet (here is Violin's release). We don't need no stinkin' tier-0.

The money quote from the Register article:
"The challenge to HDD array vendors who currently see flash as just a tier-0 data container and not a container for all primary data is getting stronger and stronger on an almost daily basis."

This is an effort I can get behind. Violin doesn't refer to the device as an SSD, or array of same. It's just a persistent datastore implemented in flash. I won't be getting one to test BCNF databases; a tad outside my price range. But the point of view is refreshing. The various analysts quoted both in The Register and the Violin PR miss the database structure implication; I'll be emailing them. I'll let you know what, if anything, I get back.

20 May 2010

It's Alive!!!!

Today's news brings the assertion that Big Business now embraces Creativity.

Here are the four bullet points (nicely bolded in the original):

Needed: Creative Disruption
Disrupt the Status Quo
Disrupt Existing Business Models
Disrupt Organizational Paralysis

I admit that these kinds of disruptions are my stock in trade; that was true before the advent of flash SSD, since I was a proponent of SSD when the devices were only (realistically) DRAM parts. But, as I've said all along, the Real Point of SSD in relational database development isn't the SSD per se, but rather the design freedom it bestows. The significant cost and efficiency gains, all the Good Things from soup to nuts, accrue because the SSD supports BCNF databases as alternatives to the (un-)de-normalized legacy Frankensteins.

Taking each bullet in turn.

BCNF databases represent Creative Disruption since they implement a set-based paradigm which is anathema to COBOL/Java Row By Agonizing Row coders. Nothing is more disruptive to a coder than to find that only 1/10 as much code is needed to accomplish a goal. Can you say: You're Redundant?

BCNF databases Disrupt the Status Quo since they actually are relational datastores as opposed to the flat file messes that continue to be dumped into database engines. Can you say: toss out the status quo and do it right?

BCNF databases Disrupt Existing Business Models since they represent a smaller footprint (by as much as an order of magnitude) datastore. Both vendors of hardware and software (application vendors, primarily) will find their sphincters tightening. Rather than thousands of disk drives, it's tens or hundreds. Rather than millions of lines of code, it's thousands. Rather than lots of client side code, it's just a sprinkle.

BCNF databases Disrupt Organizational Paralysis since they can be designed and implemented much faster with fewer bugs, although they do demand more thought and time in design. But that's to be expected. Years ago I spent some time with W. Edwards Deming, he of statistical quality control (I was doing stat and OR at the time), who hammered his particular nail: think first and do it right the first time. BCNF databases allow those who understand them, and those managers who are willing to take advantage of the situation, to get the train moving.
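To make the "flat file dumped into a database engine" contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (order_flat, customer, customer_order) are hypothetical, invented for illustration only. In the flat table, customer_id determines customer_city but is not a key, which is exactly what BCNF forbids; the decomposed pair stores each fact once.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat design: customer_id -> customer_city, yet customer_id is not a key.
    CREATE TABLE order_flat (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL,
        customer_city TEXT    NOT NULL,
        amount        INTEGER NOT NULL
    );
    -- BCNF design: every determinant is a key of its own table.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        city        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      INTEGER NOT NULL
    );
""")

# Three orders for customer 1 (Boston), two for customer 2 (Denver).
orders = [(1, 1, "Boston", 10), (2, 1, "Boston", 20), (3, 1, "Boston", 30),
          (4, 2, "Denver", 40), (5, 2, "Denver", 50)]
conn.executemany("INSERT INTO order_flat VALUES (?, ?, ?, ?)", orders)
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Boston"), (2, "Denver")])
conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?)",
                 [(o, c, amt) for (o, c, _city, amt) in orders])

# Customer 1 moves: count how many rows each design has to rewrite.
flat_rows_touched = conn.execute(
    "UPDATE order_flat SET customer_city = 'Austin' WHERE customer_id = 1"
).rowcount
bcnf_rows_touched = conn.execute(
    "UPDATE customer SET city = 'Austin' WHERE customer_id = 1"
).rowcount
conn.commit()

print("flat rows rewritten:", flat_rows_touched)
print("BCNF rows rewritten:", bcnf_rows_touched)
```

The gap between the two counts grows with the number of orders per customer, which is the smaller-footprint argument in miniature. It says nothing about any particular engine or storage device.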

But we'll see. There have been stories about such a transition every now and again going back to the late 1960's. Smart work would be the future. Instead, we've had scheming. We'll see.

18 May 2010

May the Force be with Us???

SandForce recently announced a sloganeering campaign. They're attempting to be the Enterprise SSD Supplier of record. STEC has, sort of, been the de facto Enterprise SSD supplier. Is it a good thing, from the perspective of this endeavor, for SandForce to be successful?

I think not.

My reasoning is grounded in the nature of the SandForce controller implementation. In order to reduce write amplification, the SF controllers muck with the raw data, compressing (at least) the incoming byte stream. AnandTech has tested the SF-1200 controller (the SF-1500 Enterprise controller is not materially different in approach) in various SSDs, and found that, on randomized data, performance drops markedly. This shouldn't be too surprising; the lack of patterns in the data stream means that the secret sauce in the controller's algorithms can't be slathered on. Since industrial strength databases routinely compress indexes, and some offer the (security driven) option to compress data, I expect that SF based drives, should SandForce succeed in its plan of hegemony, will not serve the database world well.
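The general effect is easy to see with any compressor. The following is a rough sketch using Python's zlib as a stand-in; it is emphatically not SandForce's algorithm, which is proprietary, and the block size and sample row are arbitrary assumptions. It only shows why a data stream without patterns leaves a compressing controller nothing to work with.

```python
import os
import zlib

def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (lower means more savings)."""
    return len(zlib.compress(payload)) / len(payload)

BLOCK = 64 * 1024                      # arbitrary block size for the sketch
pattern = b"ACCOUNT_ROW_0001;"         # made-up, highly repetitive "row"

# Patterned stream: the same bytes over and over.
patterned = pattern * (BLOCK // len(pattern))
# Randomized stream: what already-compressed or encrypted pages look like.
randomized = os.urandom(BLOCK)

patterned_ratio = compression_ratio(patterned)
randomized_ratio = compression_ratio(randomized)

print(f"patterned:  {patterned_ratio:.3f}")
print(f"randomized: {randomized_ratio:.3f}")
```

The patterned stream shrinks to a small fraction of its size, while the randomized one does not shrink at all. A database engine that has already compressed its pages hands the drive something much closer to the second case.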

The type of drive most amenable to RDBMS is the "fat" drive described here. The STEC drives, as it happens, are regular. Fat drives have the advantage of updating the on-drive DRAM cache before writing to flash. Now for the "sets are the way to think" plug. Updates which are, in application code, Row By Agonizing Row inevitably run afoul of SSD write requirements. Unless the controller (or the database engine) is smart enough to *fragment* table data, RBAR code will issue write requests in a loop, one row at a time, against the same block. This is not a Good Thing. Fat drives, with suitably contiguous table data, will fare much better in set thinking applications.
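For anyone who hasn't seen the two styles side by side, here is a minimal sketch with Python's sqlite3 module. The account table and the helper names (make_db, rbar_update, set_update) are hypothetical, made up for this illustration. It counts statements sent to the engine, not physical flash writes, which depend on the engine and the controller.

```python
import sqlite3

def make_db(n_rows: int) -> sqlite3.Connection:
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " region TEXT NOT NULL,"
        " balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO account (id, region, balance) VALUES (?, ?, ?)",
        [(i, "east" if i % 2 else "west", 100) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn

def rbar_update(conn: sqlite3.Connection) -> int:
    """Row By Agonizing Row: fetch the keys, then one UPDATE per row."""
    ids = [row[0] for row in
           conn.execute("SELECT id FROM account WHERE region = 'east'")]
    statements = 0
    for account_id in ids:
        conn.execute(
            "UPDATE account SET balance = balance + 10 WHERE id = ?",
            (account_id,),
        )
        statements += 1
    conn.commit()
    return statements

def set_update(conn: sqlite3.Connection) -> int:
    """Set-based: one UPDATE; the engine decides how to batch the writes."""
    conn.execute(
        "UPDATE account SET balance = balance + 10 WHERE region = 'east'"
    )
    conn.commit()
    return 1

def total_balance(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

rbar_conn, set_conn = make_db(1000), make_db(1000)
rbar_statements = rbar_update(rbar_conn)
set_statements = set_update(set_conn)

print("RBAR statements:", rbar_statements)
print("set statements: ", set_statements)
print("same result:", total_balance(rbar_conn) == total_balance(set_conn))
```

Both versions leave the table in the same state. The difference is that the set-based statement gives the engine the whole job at once, so it has the chance to touch each block once rather than once per row.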

So may the Force of BCNF data be with us. Just not in SandForce drives. Not, at least, without demonstrating that the SF secret sauce is compatible with the engine in use.

13 May 2010

A Better Mousetrap

Here (scroll to 6 May) is an assessment of STEC in the SSD space. Not surprising, to me at least, is that Zsolt reaches the same conclusion I have: SSD vendors have to make the case for the device. Just saying that "it's faster" will never cut it unless (very low probability) and until NAND, or its successor, reaches physical density commensurate with rotating rust.

As this endeavor postulates, BCNF relational databases for transactional systems on pure SSD storage are the sweet spot for vendors. The "Tier-0" approach au courant with storage vendors only consumes a pittance of drives. I'm getting closer to volunteering to head STEC's marketing department.

08 May 2010

iPad, There for I Yam What I Yam

The Apple vs. Flash situation has led to many threads on many sites asserting many points of view as to What It All Means. I follow Seeking Alpha, and came across this one. That led me to concoct a reply, which is below, although I'd recommend reading the whole thread, as there are some insights in the comments.

[A commenter] got close to the root issue: the iPad (generically) means a complete semantic shift for applications, which shift came (but was largely ignored) with the first GUI. Those GUI's re-implemented the Menu Interface of VT-100/*nix/database applications (think, Progress, Unify, etc.). It was just pixels rather than characters. Most GUI users who use applications deeply switch to Hot Key navigation anyway, skipping the mouse.

What a true GUI demands is that all input is Pickable. This means that No Keyboard is the rule. This, in turn, requires a complete re-thinking of data semantics. The application data must be sliced and diced into bite-sized pieces which can be iconized and presented to the user. The user must not be required to *produce input*, only choose input. This is a major shift. Given that GUI developers didn't do much to the semantics of applications during The Era of the Mouse, they've got a lot of territory to cross. The alternative is, as some have suggested, that legacy applications (and it's worth remembering that lots of Fortune X00 applications remain COBOL/VSAM mainframe ports, some with data transferred to RDBMS) will continue in the Corporation, just because "it still works". Whether anyone in the CTO/CIO offices will have the gonads to start over is a question whose answer is, almost certainly, NO.

The relevance to this endeavor is as I have said many times before: the database controls the data, while the "terminal" just paints a pretty picture and takes input. An iPad-centric application must have easily pickable data, not dependent on keyboard typing (which is the way mouse-centric GUI's should have been defined in the first place). This sounds, to me at least, like a perfect fit for a BCNF database. Nicely sliced and diced into bite-sized morsels. I find it supremely ironic that a GUI device should be the final impetus for intelligent data design.

04 May 2010

IBM Builds a Sand Castle

IBM is wriggling its little toesies in the Sand(Force). Here's an article and another, fresh off the presses, about IBM using SandForce controllers. Neither says which, if any, vendor supplied the SSD's. It could be that IBM built them in-house from parts. This one indicates that's just what they did.

What I like about the stories is that IBM made the announcement TPC-C benched. Not that a TPC-C database is necessarily fully normalized, but it brings the point home: SSD is about relational databases. Very cool. And about bloody time.
