30 June 2011

I'm Just Going Through a Bad Phase

"Phasers on stun", said Captain Kirk. Well he said that just about every episode. Today ComputerWorld reported on IBM's PCM flash replacement. For background on PCM, see WikiPedia, and the article has a link to a much earlier one on PCM.

From the Wikipedia piece:
"PRAM devices also degrade with use, for different reasons than Flash, but degrade much more slowly. A PRAM device may endure around 100 million write cycles."

In the past, I've written about Unity Semiconductor, which has had a flash replacement in development for some years; I found them when I first began looking into SSDs. One way or another, we'll soon have a solid state datastore that is effectively infinite in write capability, just like HDD.

Once again, Oz looks larger and brighter. Be still my heart.

28 June 2011

A Bump in the Yellow Brick Road

No, I'm not renouncing. Events of the last year or so have caused me to ruminate on this journey down the Yellow Brick Road. Some of those events:

Consumer/prosumer SSDs persist in being built without power-loss capacitors to protect cached data. The industry is, perhaps, more divided now than at any earlier time: consumer devices use barely tractable MLC flash (~3,000 write cycles), and SandForce continues to gain traction on the consumer side. Given the finite write endurance of flash, a busy SSD will die in the foreseeable future. An HDD, on the other hand, might well continue to function for the better part of a decade; in any event, the HDD doesn't have a defined drop-dead time.
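
For a sense of scale, here is a back-of-the-envelope sketch in Scala. Only the ~3,000-cycle MLC figure comes from above; the drive size, daily write volume, and write-amplification factor are illustrative assumptions, so treat the result as an order-of-magnitude guess rather than a measurement.

    object FlashLifetime {
      def main(args: Array[String]): Unit = {
        val capacityGB         = 120.0   // assumed consumer drive size
        val peCycles           = 3000.0  // MLC program/erase endurance cited above
        val hostWritesGBPerDay = 200.0   // assumed write-heavy database load
        val writeAmplification = 5.0     // assumed controller overhead on random writes

        val totalNandWritesGB = capacityGB * peCycles          // lifetime NAND budget
        val nandGBPerDay      = hostWritesGBPerDay * writeAmplification
        val lifetimeDays      = totalNandWritesGB / nandGBPerDay

        printf("Wear-out horizon: %.0f days (about %.1f years)%n",
               lifetimeDays, lifetimeDays / 365)
      }
    }

Swap in your own workload numbers; the point is that an SSD's end of life is computable in a way an HDD's is not.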

Capacity remains under a TByte for the vast majority of parts. This is important because:
most folk continue to view the SSD as just a faster HDD. That view doesn't matter much outside the RDBMS arena, but it is critical to getting the most bang for the buck there. For RDBMS installs where (re)normalizing is ignored, the move from HDD to SSD is expensive, so it is often attempted with consumer-level drives. In the HDD world that isn't unusual; over there, most drives serve as both consumer and enterprise parts.

Small-scale databases (web and SMB verticals, for instance), often on MySQL or Postgres, just won't be safe enough on consumer drives. The various threads on postgresql-performance make the case, much as I'd wish the truth were otherwise. What's particularly odd is that both vendors and most consumers appear to be OK with catastrophic loss of data as a fact of normal life. Very odd.

The physics of writing to an SSD versus an HDD is just plain different. SSD controllers spew the bits all over the flash, and the erase process can hardly be considered atomic. The majority of SSD controllers also use RAM caching to reduce write amplification, which is an additional fault point. HDD-based engines, industrial-strength ones like DB2, can guarantee that only the open transaction(s) will be hosed on a failure. SSD-based storage just can't make that guarantee unless persistent power is available.

The failure of developers, at least those who publish, to lobby for (re)normalization as part and parcel of the transition from HDD to SSD is regrettable.

Is there still a Yellow Brick Road leading to Oz? I still believe so, but Oz looks to be more a Potemkin village than a New World. Only shops with the fortitude to make a full transition using enterprise quality SSDs will actually get there. One can eliminate 99.44% of web sites and SMB verticals; they're just content to be penny wise and pound foolish. Oh well.

14 June 2011

Sprechen Sie Deutsch? Habla Español?

Artima has been quiet of late; not many new articles or comments. Could it be that coders are satiated? Then I went over today, and Bruce Eckel has praise for Scala. Last I read, he and Bruce Tate (not related, so far as I know) had gotten Python fever. Well, Eckel has been infected for some time, which is why I never expected to get a Scala piece from him.

Smitten, I ordered "Programming in Scala"; finally bit the bullet. Scala is one of the few languages not living on a database engine that still intrigues me. Yes, I've done Prolog, Erlang, and Haskell, to name some. And, yes, Prolog is the closest thing to an RM language out there, but it's just not used enough. That, and the syntax is just plain wacky (not as extreme as Lisp, but that's to be expected). Scala 2.8 is the current version, and it is covered in this second edition.

But, back to Eckel's article. What struck me, yet again, is the emphasis on iteration in his discussion. Yet another language creating yet another syntax to loop. Why are we still doing this in high-level languages, which are supposed to be not-assemblers? For some code, the kind that doesn't deal with real data in databases, it could be argued that application code needs to loop. But even then, I don't quite buy it. I spent/wasted some years with Progress/4GL, a database engine plus application language. It had flippant support for SQL in the engine, but 99.44% of coders used the bundled 4GL. And how did this language deal with table data? You guessed it: FOR EACH. Now, this was promoted as a *4GL*, not COBOL. Fact was, it was effectively COBOL. We've been saying to each other for at least three decades that the future is now, and the future is declarative coding, yet we keep focusing on application-level iteration. The datastore should do that. It's written in the lowest-level language, typically naked C (with, I suspect, the performance bottlenecks written in assembler for each supported OS).
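
To make the contrast concrete, here is a sketch in Scala over plain JDBC; the connection string, table, and column names are invented for illustration. The first half is FOR EACH dressed up in modern clothes: every row dragged to the client, touched, and sent back. The second half hands the whole operation to the engine as one declarative statement.

    import java.sql.DriverManager

    object IterateVsDeclare {
      def main(args: Array[String]): Unit = {
        // Connection details are placeholders.
        val conn = DriverManager.getConnection(
          "jdbc:postgresql://localhost/shop", "app", "secret")
        try {
          // FOR EACH in disguise: fetch every row, change it, send it back.
          val select = conn.prepareStatement(
            "SELECT order_id, amount FROM orders WHERE status = 'open'")
          val update = conn.prepareStatement(
            "UPDATE orders SET amount = ? WHERE order_id = ?")
          val rs = select.executeQuery()
          while (rs.next()) {
            val bumped = rs.getBigDecimal("amount")
              .multiply(new java.math.BigDecimal("1.05"))
            update.setBigDecimal(1, bumped)
            update.setInt(2, rs.getInt("order_id"))
            update.executeUpdate()          // one round trip per row
          }

          // Declarative: one statement, zero rows shipped, the engine loops.
          val stmt = conn.createStatement()
          stmt.executeUpdate(
            "UPDATE orders SET amount = amount * 1.05 WHERE status = 'open'")
        } finally conn.close()
      }
    }

The engine, written in that low-level C, will do the looping far more efficiently than any application-side for-comprehension.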

Let the datastore be the datastore!! (Yes, that does remind you of some political hackery from the 1980's.)

Declarative development is exemplified by the RM and RDBMS. Why the refusal?

We continue to see, even on database support/discussion sites, web/client app code that ships whole result sets to the client. Why? If your intent is to make changes to multiple rows in a specific fashion, that's a stored procedure. Do it all on the engine; that's what it's good at. Don't ship data off the server just so you can iterate (using the very special syntax of your fave language) over thousands or millions of rows. This is folly. Such discussions always then devolve into arguments about transactions locking rows, and so forth. Yikes!
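
As a sketch of "do it all on the engine": the procedure reprice_open_orders and its parameters are hypothetical, as are the connection details. The point is that the client ships two parameters, not two million rows.

    import java.sql.DriverManager

    object CallTheEngine {
      def main(args: Array[String]): Unit = {
        // Connection details and the server-side procedure are placeholders.
        val conn = DriverManager.getConnection(
          "jdbc:postgresql://localhost/shop", "app", "secret")
        try {
          // Assumed procedure reprice_open_orders(region, pct) does the
          // multi-row change inside one transaction, entirely on the engine.
          val call = conn.prepareCall("{ call reprice_open_orders(?, ?) }")
          call.setString(1, "EMEA")
          call.setBigDecimal(2, new java.math.BigDecimal("0.05"))
          call.execute()
        } finally conn.close()
      }
    }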

While it isn't quite what one might expect given its title, everybody should read Celko's "Thinking in Sets". It's not a treatise on set theory in the datastore, but it is still useful in providing examples where table data makes more sense (even by a performance metric) than code.
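
In that spirit (auxiliary lookup tables are a recurring theme of the book), here is a tiny, hypothetical Scala sketch of table data over code; the tiers and discounts are made up. In a database the Map would be a two-column lookup table joined in the query.

    object TableOverCode {
      // Code version: the business rule is buried in branches.
      def discountByCode(tier: String): Double =
        if (tier == "gold") 0.15
        else if (tier == "silver") 0.10
        else if (tier == "bronze") 0.05
        else 0.0

      // Table version: the same rule expressed as data that can be
      // queried, joined, and changed without redeploying code.
      val discountTable: Map[String, Double] =
        Map("gold" -> 0.15, "silver" -> 0.10, "bronze" -> 0.05)

      def discountByTable(tier: String): Double =
        discountTable.getOrElse(tier, 0.0)

      def main(args: Array[String]): Unit = {
        println(discountByCode("silver"))   // 0.10
        println(discountByTable("silver"))  // 0.10
      }
    }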

09 June 2011

My Security Blanket

Regular readers may note that Linus Torvalds has pride of place in the quotation section of this endeavor. He's recently been interviewed by some e-zine I've not heard of.

Here's a quote:
I'm also a huge fan of SSDs, and the huge reduction in latency of storage technologies has some big impacts on OS performance. A lot of people end up spending a lot of time waiting for that traditional rotational media.

Pretty much what he said back in 2007. Let's get a move on.

07 June 2011

Fiddlers Three

In the event that you haven't been following the news, Fusion-io is in the midst of an IPO. It turns out that Violin Memory is closing in on doing so, too. Since I began digging into the SSD story some years ago, Violin has been kind of (but not quite) stealthy: in part because it is still private, so it doesn't end up in the various "investor" discussion forums, and in part because it focuses on the True Enterprise SSD. Fusion-io and OCZ, in particular, are public and consumer focused.

The article mentions Oracle, which fits with my vision of where SSD databases are going. Eventually even Larry will figure out that the bang for the buck (his, not necessarily his customers') lies in Being Normal. For Larry, that's saying a bit.