29 September 2009

Alice, This isn't Kansas

"Alice in Wonderland" gives us a wonderful quote, from Humpty Dumpty: When I use a word it means just what I choose it to mean - neither more nor less.

What that has to do with today's musing is this: I still would rather do databases for money than trade biotech stocks. So, I keep a baleful eye on various listings, looking for anything interesting. I generally only bother with titles such as Database Architect, Database Designer, and DBA (the catchall title, which sometimes means more than doing backups).

Here's a position with the title Database Architect, for Nokia (Boston area office). I'll leave the curious to venture there at their peril. Now, I would expect (devilish imp that I am) that a company of Nokia's size and sophistication would have a clue about data and databases. I eagerly surfed to the listing, only to find (among other silliness) the following:

Domain expert in MySQL design and development

Familiarity with Hadoop/Pig/HBase and MapReduce/Sawzall/Bigtable desirable


I'll send along a resume just for shits and grins, but for pity's sake. This isn't a Database Architect. It's an application coder. Gawd darn it.

26 September 2009

We've Seen This Movie Before

Many of us have seen this movie before, if you're of a certain age (or had instructors who are). It's a cross between "The Return of Frankenstein" and "Groundhog Day". The theme has arisen with some frequency in the last few weeks, on Artima in particular.

The movie's script goes like this: it's all well and fine for you to talk about SSD changing the rules, but we've still got to write our applications for the normal HDD environment; not all clients will have gotten the SSD revelation. I knew this sounded familiar, but it took me a while to put my finger on it.

In the 1960's, and more so the 1970's, the magnetic disk subsystem (IBM terminology then) began to proliferate. But it didn't instantly replace the 9 track tape drive. In the IBM world, COBOL (along with some Assembler and that mutant PL/1) was the language of choice, and had been in use for a decade by the time the 370 took over. The I/O calls had been written for sequential access (the 3-tape sort/merge update was the paradigm) from time immemorial.

The result was that COBOL continued to be written to a sequential access method, even though random access was the whole point of the disk drive. Files on disk were imaged as if they were on tape. The reason was simply convenience for COBOL maintenance coders. Even new applications tended to do the same; inertia is a powerful force.
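
For those who never lived it, here's a minimal sketch of the two access patterns, in Python rather than COBOL; the record layouts and names are invented for illustration. The tape-style merge reads and rewrites every master record whether it changed or not, while the disk-style version goes straight to the records the transactions touch.

def tape_style_update(master, transactions):
    """Sequential access: both inputs sorted by key, merged front to back.
    Every master record is read and rewritten, changed or not."""
    out, i = [], 0
    for rec in master:
        while i < len(transactions) and transactions[i]["key"] < rec["key"]:
            i += 1
        if i < len(transactions) and transactions[i]["key"] == rec["key"]:
            rec = {**rec, **transactions[i]}
        out.append(rec)
    return out

def disk_style_update(master_by_key, transactions):
    """Random access: an index takes you straight to the affected records."""
    for t in transactions:
        master_by_key[t["key"]].update(t)

master = [{"key": 1, "bal": 100}, {"key": 2, "bal": 50}]
print(tape_style_update(master, [{"key": 2, "bal": 75}]))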

Hardware paradigm shifts often are captive of software inertia, and SSD is not the only one now. The multi-core/processor machine presents problems to coders of greater magnitude than SSD does. Here's Spolsky's recent rumination. The money quote:

Sure, there's nothing officially wrong with trying to write multithreaded code in C++ on Windows using COM. But it's prone to disastrous bugs, the kind of bugs that only happen under very specific timing scenarios, because our brains are not, honestly, good enough to write this kind of code.


Whether multi-core/processor machines will ever be useful to application programs, by which I mean guys writing the 1,023,484th General Ledger for Retail Sales, remains up in the air. The guys writing operating systems and database engines will have far fewer issues; they've been in the multi-threaded world for decades. We should let them go ahead and do their thing. Let the database engine do all that heavy lifting, and leave us mere mortals to decide which widget to use and what the schema should look like.
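
To make "let the engine do the heavy lifting" concrete, here's a hedged sketch; sqlite3 stands in for whatever engine you actually run, and the table and data are invented. The point is that one declarative statement replaces the row-dragging (and any temptation to thread it) in the application.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [("east", 10.0), ("east", 5.0), ("west", 7.5)])

# The tempting way: drag every row into the application and summarize it
# yourself, perhaps even across your own thread pool.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sale"):
    totals[region] = totals.get(region, 0.0) + amount

# The better way: one declarative statement; the engine parallelizes (or not)
# as it sees fit, and only the answer comes back.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sale GROUP BY region"))
print(totals)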

On the other hand, making the transition to SSD storage and BCNF schemas will require slapping down hidebound application coders who wish to remain in the far past. I see a future where applications which have limped along, structurally unchanged since the 70's, are finally replaced with small, highly normalized databases. It will be just too cheap not to. A system based on a few dozen SSDs will replace those geriatric pigs with their thousands (or more) of HDDs. The ongoing cost difference (TCO, as they say) will easily be greater than the amortization of build costs.
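
As a sketch of what "small, highly normalized" means against the geriatric flat-file style, here's an invented order schema, again using sqlite3 purely for illustration. The wide record repeats customer data on every order line; the BCNF-ish decomposition stores each fact once and lets joins (cheap when the store is random-access silicon rather than spinning rust) do the assembly.

import sqlite3
db = sqlite3.connect(":memory:")

# The geriatric pig: one wide record, customer data repeated on every order line.
db.execute("""CREATE TABLE order_flat (
    order_no INTEGER, line_no INTEGER,
    cust_no INTEGER, cust_name TEXT, cust_addr TEXT,
    item_no INTEGER, item_desc TEXT, qty INTEGER)""")

# A BCNF-ish decomposition: each fact stored once; joins do the assembly.
db.executescript("""
CREATE TABLE customer   (cust_no  INTEGER PRIMARY KEY, name TEXT, addr TEXT);
CREATE TABLE item       (item_no  INTEGER PRIMARY KEY, descr TEXT);
CREATE TABLE orders     (order_no INTEGER PRIMARY KEY,
                         cust_no  INTEGER REFERENCES customer);
CREATE TABLE order_line (order_no INTEGER REFERENCES orders,
                         line_no  INTEGER,
                         item_no  INTEGER REFERENCES item,
                         qty      INTEGER,
                         PRIMARY KEY (order_no, line_no));
""")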

Those geriatric pigs which were built somewhat more recently, around stored procedures rather than application SQL, will have a better chance of survival. All these codebases will need is the schema refactored and the stored procs updated. The client application code wouldn't change; the proc still returns the same bloated data, alas.
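
A sketch of that survival path, with the caveat that sqlite3 has no stored procedures, so a view stands in for the proc here; in Oracle or SQL Server it would be an actual procedure returning the same result set. The schema underneath gets refactored, the "proc" keeps handing back the old row shape, and the client never knows.

import sqlite3
db = sqlite3.connect(":memory:")
db.executescript("""
-- The refactored (normalized) schema underneath.
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders   (order_no INTEGER PRIMARY KEY,
                       cust_no INTEGER REFERENCES customer, total REAL);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO orders   VALUES (100, 1, 42.0);

-- The 'proc': hands back the same bloated row shape the old flat file did,
-- assembled on the fly from the refactored tables.
CREATE VIEW get_order AS
  SELECT o.order_no, o.total, c.cust_no, c.name AS cust_name
  FROM orders o JOIN customer c ON c.cust_no = o.cust_no;
""")

# Client code keeps issuing the same call it always did.
print(db.execute("SELECT * FROM get_order WHERE order_no = 100").fetchall())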

If you go to the concession stand, I'd like a large popcorn and Dr. Pepper. Thanks.

21 September 2009

Ionesco: "Exit the King"

I will happily admit that the machinations going on around SSD are currently my main fascination; this is the enabling technology for true RDBMS designs. So, watching the money grubbers is certainly amusing.

Last week (and continuing, although to a lesser degree, today) STEC's shares dropped like a rock. Today they released a White Paper defending their part. It's actually got some useful information.

But, you've got to wonder. The PR and the posting of its existence on the Yahoo message board (and probably all the others; it's also on Schwab) are geared to the Joe SixPack plunger, not the enterprise CIO. Sometimes one has to just wonder what goes on in some people's heads.

19 September 2009

Color Me Sad

Were I a real reporter, I might have been there for the announcement, but alas no. Turns out that FlashFire is not an SSD array. Here's Larry (Computerworld is where I found this quote; for whatever reason it's not in their PR):

"We have a huge, fast flash cache built into our storage servers," Ellison said. "These are not flash disks -- make no mistake, these are not flash disks. This is a smart memory hierarchy made up of DRAM in our database servers and flash in our storage servers, with very sophisticated algorithms. This is a very smart memory hierarchy where the Oracle software manages that memory extremely efficiently, much faster than flash disk."

So, it seems that FlashFire is not a flash disk subsystem. Oh well.


Update:

I knew I had read that FlashFire was specific to an SSD implementation; I know, Professor, I should keep better notes. But Burleson is one of the veterans of SSD for databases, Oracle in his case. I'll be adding a few Good Stuff links, and his main site will be one of them. So, here's his take on FlashFire (which also conflicts with what Larry said).

18 September 2009

I Can do a Full Gainer

I'm going to bite the bullet, so to speak, and get an SSD for this machine and do some personal experimenting. I've been putting it off for a while due to: the time devoted to getting work (which is way too time consuming), the time needed to be a stock tycoon, and my less than stellar view of what's been available. What I want is a device that's under a grand, has enough capacity to simulate a real-world (commercial) system, and is a no-brainer to install on Ubuntu. I've long since lost interest in doing hardware installs just for laughs.

I think I've found what I want. It's the Fusion-io ioXtreme. It was reported to be shipping in July, but looking at the Fusion-io site, they're in "sign up to be the first on your block to own one" mode. Sigh. But the price, $895 for 80 gig in a PCIe card, is in the proper ballpark, and it won't be rattling around in the machine. More than the X-25M, about the same as the X-25E, in $/gig anyway.

While I was wandering around the Fusion-io site, I came across this from July. Now, it doesn't read as though they went ahead and normalized the data, just moved it to the SSD. (It's a TPC-H database and likely some form of star/snowflake.) I can live with that; folks are still taking baby steps along the Yellow Brick Road. It does demonstrate, if one accepts the validity of TPC benchmarks, that SSD can save money while being faster and less filling.

17 September 2009

Blood in the Street

There be carnage out there. I've been keeping a periodic eye on the SSD stocks (and the increasing number of privates), with STEC being the "acknowledged leader" in enterprise drives. So their PR always says. It is true that STEC was, if not the first, certainly early and often qualified. Their list includes EMC, IBM, Sun, Compellent, and HP.

It's been a week since I looked at the stock (I spend much of my stock time tracking biotech; more money, more faster), and what to my wondering eyes should appear but a true crash. Last I looked the shares were over $40. Today they closed at $31.53. Trust me, stock promotion is not a factor in this endeavor, but it is undeniable that the current state of SSD in the enterprise is because of STEC's efforts to make itself rich, which it has.

The management of the company has been singing the "ain't nobody can do what we do" song for the last couple years, and in the last 12 months has signed up with the aforementioned companies. All the while deflecting questions about the likelihood of other suppliers of SSD. I never bought it. This site has been tracking the SSD world for more than a decade; since before flash was even used. Spending some time there makes it clear that STEC isn't the only game in town, and never was.

So, what happened? Turns out that Pliant Technology released its version of enterprise SSD a few days ago, which prompted some of the analysts to reduce their opinions of STEC.

The reason all this matters is that having multiple credible sources of enterprise SSD (what that term means is still open to discussion) is better for real relational database implementations. Which is what this endeavor is really all about. The SSD aspect is merely the implementation detail that makes it all possible.

What's bad for the Wall Street casino players is actually good for folks who are working at building useful things, and not merely engaging in zero sum games with each other.

11 September 2009

Larry Finally Speaks, and.... I'm Right

For all of you out there who've been saying that Larry wants Sun for Java or Solaris or MySQL, here's what he said yesterday in the Wall Street Journal:

We're in it to win it. IBM, we're looking forward to competing with you in the hardware business.
-- Larry Ellison

How dare you all doubt me. I've been doing this for a long time. He wants to kill Armonk's mainframe business. He always has. (See 28 August for the most recent discussion.)


Update (15 September):

OK, so today Oracle/Sun announced the new Oracle Database Machine. Oracle had previously been building the Exadata machine on HP hardware. No longer. Of particular interest to the readers of this endeavor is this:

The Sun Oracle Database Machine also includes Sun's new FlashFire technology to cache 'hot' data for dramatically improved transaction response times and throughput.


So, what is FlashFire? According to Sun, it's their implementation of SSD flash cache and system software for same.

From the PR:

You get ten times faster I/O response time and use ten times fewer disks for business applications from Oracle as well as third-party providers.


Larry's not interested in hardware. Nope. Armonk, you've got a problem.

03 September 2009

Persistent Myth: Bandwidth is Infinite

There exists, still, the myth of infinite bandwidth. The myth exists in support of the notion that "web" applications can and should be just like desktop applications. But there is a problem: what is a desktop application? In the beginning, 1981, the IBM PC provided a standalone little computer, which was expected to be programmed just like the 370, only for smaller problems related to the work of the individual.

That fairy tale came to an end with Lotus 1-2-3, which turned the PC into a toaster: an appliance which did some computing (itself done by programs written by professional assembly language programmers) upon some data entered, or made available, by the individual. Then came typing programs, later renamed word processing. The toaster syndrome was in full swing.

Then came NetWare, and its ilk, to lead us to a kind of client/server environment. This is what "desktop application" really means these days: a local PC connected to a semi-local big computer. The VT-100 connected to a *nix database machine is the precursor to that.

AJAX, and so on, are attempts to take the 3270 behaviour of the web and turn it into the VT-100, albeit with pixels and graphics. In order to do that, the link to the outside world has to behave like fast RS-232.

So, today The New York Times runs this story. Infinite bandwidth, my eye. A bloody phone brings the net to its knees. When will people learn what your Mama told you: "what kind of world would we have if everybody behaved like you?" Nothing is infinite, stupidity likely excepted.


Update II:

There is the olde canard about tapes in a station wagon. Here is a new and even more amusing example.


Update:

In response to some questions from readers elsewhere, I'm led to pontificate further.

I left out the obvious point (to me, anyway). There are two expenses in getting an image on the screen: computation and transfer of the image. With a 1982 desktop, what could be computed was memory mapped to the screen, so transfer was instantaneous (mostly).

With local networks and VT-100 to RS-232 to database, the screen is still a memory map in the server; all that goes over the wire is the characters in the screen image.

With GUI-ed screens in a local network, it's still manageable with Ethernet on wire.

With GUI-ed screens in the cell tower, not so much. Given that HTTP is about lots of request/response between the client (iPhone) and the server (Google machine, or whatever), the "virtual wire" gets overloaded. And will always be.
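
Some back-of-envelope arithmetic makes the point; the page weight and request count below are assumptions for illustration, not measurements of any particular site or phone.

# Back-of-envelope only; the GUI figures are assumed, not measured.
vt100_screen = 80 * 24            # one full character screen: 1,920 bytes, one round trip
gui_page     = 500 * 1024         # an assumed "rich" page: markup, CSS, scripts, images
round_trips  = 40                 # assumed separate HTTP requests to paint it

print(f"VT-100 repaint: {vt100_screen:,} bytes, 1 round trip")
print(f"GUI page:       {gui_page:,} bytes, {round_trips} round trips")
print(f"ratio:          roughly {gui_page // vt100_screen}x the bytes")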

It's the same with building highways or subways or ...; traffic overwhelms infrastructure.

With an HTTP-based internet, it's not possible to have a (mostly) passive (memory mapped) screen with all the computation at the server. Fact is, increasing computational power is a couple of orders of magnitude cheaper than I/O. And the web is about the least efficient form of I/O ever invented. It's not being used the way Cerf designed it.

Abuse leads to breakdown, and the web is broke. The iPhone just makes that obvious; it isn't the reason.