Dr. Codd Was Right: August 2010

31 August 2010

What's It All Mean, Mr. Natural?

The issue is: what does Oracle think it can win? The answer appears to be a fat license fee from Google. The fact that some Google folk once worked at Sun is irrelevant. Dalvik was built independently of the jvm, and doesn't resemble it. It does not translate/compile java bytecode/classfiles on the fly. The development is done in java (SE, I believe, via Harmony/Apache; if they use the ME, then there's trouble). Once the .class file exists, it is translated to Dalvik .dex format. This is no different from using C to write a java compiler. Or using java to write any other DSL. Unless there is verbiage somewhere (and I don't know the answer) that .class files "must be" run on a certified jvm, Google is fine.

27 August 2010

Ya Know How Fast You Was Goin', Son?

So yeah, boy, do ya know how fast you was goin'? Turns out, speed isn't everythin'. Just read up on the tests of mid to high-end SSDs. Here's AnandTech's page. And a quote from the Vertex Limited Edition: "Saturating the bandwidth offered by 3Gbps SATA, the Crucial RealSSD C300 and OCZ Vertex LE are the fastest you can get. However, pair the C300 with a 6Gbps controller and you'll get another 70MB/s of sequential read speed." And these are just retail/consumer parts.

I've seen (didn't note the cite, alas) articles stating that enterprise SSDs need to go mano-a-mano with controllers. This is easy to understand.

So, is SPEED the reason to use SSD? Well, of course not. The reason for SSD is BCNF (or higher, dare I submit) datastores. Those ridiculous speed numbers are for "sequential" reads, and sequential only happens (in the physical reality meaning of the word) on bare silicon. And, in any case, SSD won't be price competitive with rotating rust for quite some time. Both rust and silicon have physical limits; it's just that the silicon limit happens at a much lower volumetric density when used for persistent storage. You're not going to be storing all those 3D dagger-in-your-neck B-movies you just have to have forever on silicon.

It is likely a Good Thing that these teenage SSDs are running into Boss Hogg; they need to find a more meaningful purpose in life. I've got just that.

25 August 2010

Oslo

Once again, the folks at simple-talk have loosened my tongue. The topic is Oslo, or more directly, its apparent demise. I was moved to comment, shown below.

I'll add a more positive note; in the sense of what Oslo could/would be.

I've not had much use for diagramming (UML, etc) to drive schema definition/scripting. Such a process seems wholly redundant; the work process/flow determines the objects/entities/tables, and converting these facts to DDL is not onerous.

OTOH, getting from the resulting schema to a working application is a bear. There have been, and still are, frameworks for deriving a codebase from a schema. Most are in the java world (and, not coincidentally I believe, COBOL some decades past and not the first; it's that ole enterprise automation worship), but without much fanfare. I suspect, but can't prove, that Fortune X00 companies have built internally (or, just as likely, extended one of the open source frameworks) such frameworks.

This is what I thought Oslo was to be: a catalog driven code generator. My obsession with SSD these days (and it's looking more and more like the idea is taking hold Out There) still convinces me that BCNF catalogs can now be efficiently implemented. Since such catalogs are based on DRI, and such kinds of constraints are easily found and translatable, generating a front end (VB, C#, java, PHP, whatever) is not a Really Big Deal. Generating, and integrating, code from triggers, stored procs, check constraints, and the like is a bit more work, but with more normalization, constraints become just more foreign keys, which are more easily translated.
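To make the "catalog driven" idea concrete, here is a minimal sketch in Python against SQLite. It is not Oslo and not any real framework; the `describe_form` helper, the widget names, and the two-table schema are all made up for illustration. The only real machinery is SQLite's own catalog pragmas, which hand back columns and foreign keys; everything the "front end" needs is derived from those.

```python
import sqlite3

def describe_form(conn, table):
    """Derive input-field specs for `table` from the catalog alone (hypothetical helper)."""
    # foreign_key_list rows are (id, seq, table, from, to, ...): map FK column -> parent table.
    fks = {row[3]: row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    fields = []
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, ctype, notnull, _default, pk in conn.execute(f"PRAGMA table_info({table})"):
        if name in fks:
            widget = f"picklist({fks[name]})"   # a foreign key becomes a pick list
        else:
            widget = "number" if ctype.upper() == "INTEGER" else "text"
        fields.append({"column": name, "widget": widget, "required": bool(notnull) or bool(pk)})
    return fields

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    dept_id   INTEGER NOT NULL REFERENCES department(dept_id)
);
""")

form = describe_form(conn, "employee")
for field in form:
    print(field)
```

The foreign key on `dept_id` is what turns into a pick list; the more of the rules that live in the schema as keys, the less there is left to hand code.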

That's where I expected Oslo was headed. This is not an ultimate COBOL objective, but a "drudge work" tool for database developers (and redundancy notice for application coders, alas). Such a tool would not reduce the need for database geeks; quite the contrary, for transaction type database projects we finally get to call the tune. Sweet.

And, I'll add here, that the shift that's clearly evident with iStuff and Droids leads to another inevitable conclusion. A truly connected device, and phone devices surely are, means we can re-implement the most efficient transaction systems: VT-100/*nix/database. Such systems had a passive terminal, memory mapped in the server, so that the client/server architecture was wholly local. Each terminal has a patch of memory, and talks to the database server, all on one machine. No more 3270/web disconnected paradigm. With the phone paradigm, the client application can be on the phone, but it has a connection to the server. For those with short memories, or who weren't there in the first place, the client/server architecture was born in engineering. The point was to ship small amounts of data to connected workstations that executed a heavy codebase, not to ship large amounts of data to PCs that execute a trivial codebase. The key is the connection; http is a disconnected protocol, and is the source of all the heartburn. The frontrunners will soon have their fully normalized databases shipping small spoonfuls of data out to iPads/Droids. That's the future.

18 August 2010

The A-Team

A busy day. SanDisk just released news about their latest SSD. How does it fit the point of this endeavor? Read on, MacDuff.

Here's my floating in the clouds (well...) concept of the use of such a device.

Let's say you're running Oracle Financials to a Big Droid, mentioned in that post from earlier today. How does an embedded 64G SSD fit in? How about this: the Big Droid has SQLite installed, talking to that SSD, and OF on the linux machine is fully normalized (I've no direct experience with OF, but I'll guess that it's been de(un)-normalized). The Big Problem(tm) with web based applications is the bloat load of data passed over the wire (increasingly virtual wires) from all those fat flat files coders love (I'm talking to you, xml).

Lots of local storage changes that equation. Rather than synthesizing the joined rows on the server, and sending the result set over the wire, we can install SQLite (or similar, SQLite is currently in the Droid stack) on the Big Droid, and send only the normalized rows, letting SQLite store them to a receiving table. SQLite then synthesizes the bloat rows, which the Big Droid App can see and do what it wants with same. After the User makes any changes, SQLite sends back the delta (normalized) rows. Wire traffic drops by a lot, as much as an order of magnitude.
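A rough sketch of that round trip, in Python with an in-memory SQLite standing in for the database on the device. The tables, the sample rows, and the `payload_size` measure (a crude character count, not a real wire format) are all assumptions for illustration; the point is only that the join happens on the device, so the parent row crosses the wire once rather than once per child row.

```python
import sqlite3

# Hypothetical normalized rows as they might arrive from the server.
customers = [(1, "Acme Warehouse", "12 Dock St, Springfield")]
orders = [(100 + i, 1, f"2010-08-{i + 1:02d}") for i in range(5)]

device = sqlite3.connect(":memory:")  # stands in for SQLite on the Big Droid
device.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    cust_id    INTEGER REFERENCES customer(cust_id),
    order_date TEXT
);
""")
device.executemany("INSERT INTO customer VALUES (?, ?, ?)", customers)
device.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# The device synthesizes the wide ("bloat") rows locally with a join.
wide_rows = device.execute("""
    SELECT o.order_id, o.order_date, c.name, c.address
    FROM orders o JOIN customer c ON c.cust_id = o.cust_id
    ORDER BY o.order_id
""").fetchall()

def payload_size(rows):
    """Crude proxy for wire traffic: total characters across all values."""
    return sum(len(str(value)) for row in rows for value in row)

normalized_size = payload_size(customers) + payload_size(orders)
joined_size = payload_size(wide_rows)
print(normalized_size, joined_size)
```

With one parent and five children the saving is modest; it grows with the number of child rows per parent and with the width of the parent row.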

To get really on the edge, Oracle on the linux server could *federate* those SQLite clients and write to the SQLite tables *directly*. Normalized, skinny tables. Almost no data has to go over the wire. And they once said that Dick Tracy's wrist radio could never happen.

To quote my Hero Hannibal Smith, "I love it when a plan comes together".

[UPDATE]
OK, so perhaps I should have figured that I'm not the first person, although it seemed so since my circle of web sites hasn't talked about it, to see that native apps on iStuff/Droid have a natural client/server architecture which can exploit RDBMS on the server (the SSD sort I'm promoting). Native apps, not web/http stuff. So, here's the first article that came up when I let Google do the searching.

The money quote:
In those cases where they actually need to capture data, they require ultra-simple applications that shape the device into a very specific tool that follows the most optimized data capture process possible. Indeed, this is what iPad is good for - it affords developers the opportunity to move the technology aside, replacing it with a shape-shifting experience. Successful data-centric apps will transform the experience and cause the technology to melt away.

Quite some number of posts ago, I made the point that iStuff changes the input paradigm; to picking, not typing. And that picking lends itself (since picking has to be reduced to some manageable number of choices) to normalized data; huge scrolling screens with dozens (hundreds, I've seen) of input fields just won't work. Again, there is existing prior art; the whole host of tablet based ERP modules.

Of course, I've not delved into the SDK's for these devices (don't have a Smart Phone), so it could be that none of my notions is possible. But SQLite is in the Droid stack, so I'd be willing to bet a body part that it is fully doable. Does this sound like it? And this is the framework, also using SQLite on the device.

So, yes, you can do tcp on Android; ignore the skateboard and scroll down to 13 May. Not quite ready for Prime Time, but really, really close; once you've got tcp, ya gots da database. Yummy.

Hannibal was right.

Black, No Cream, No Sugar

The Oracle/Google fight is too interesting not to write about. I've, until now, only contributed to various posts on various blogs, so here's my latest (from an Artima thread), somewhat expanded.

- java *is* Oracle's core, even before the buyout (more later)

- java ME is a bust, but Dalvik is a winner. If you're Oracle why not try to get some of that pie? They'll waste a lot of time and money if Google doesn't settle "Real Soon Now", which I don't think they will.

- to the extent that cloud, SaaS, PaaS, WhateveraaS gains mindshare, Oracle either needs to quash it or get a wagon for the wagon train. This attack could accomplish either; Dalvik is made to go away, or Oracle gets it through a free cross-license. I mean, why not run Oracle Financials on a big Droid? Why not? It's not much different, semantically, from OF/*nix/VT-220, just with pixels. Folks run around warehouses today with tablets and WiFi, why not go all the way?

- there was a time when COBOL was the language of the corporation (still is in some parts of some corporations), and there was/is an ANSI standard COBOL, but no one bothered much with it (in the corporation). IBM had its own version, and that runs on its mainframes/minis. Oracle has made java the language of its corporate applications. It might be, they think, a Good Thing if there's Oracle java and some ANSI-java that no one cares about. IBM, unlike M$, forked java in a compliant way, too. If one believes, as I do, that part of the game plan in taking Sun was to build a platform to attack the IBM mainframe business (the last existing fruit on the tree), then having a market dividing stack of Oracle database/java/Sun machines makes some sense; a way to lock-in clients top to bottom.

Larry has always had a strategic view of business; he just wants to have the biggest one. Buying Sun has to be seen in that context. The question observers have to answer: how does buying Sun support that strategy? The knee-jerk reaction was java. Then it was MySql (if you review the initial objections, they related to control of java; only later was MySql considered). Again, the largest part of the existing computing pie that Larry has no part of is mainframe (and IBM has the largest part of mainframe computing as it's ever had); I think he wants that, in the worst way. In order to do that, he has to have an alternative. The database is one-third of that. Oracle has been eating DB2's lunch, off mainframe, for years and it keeps getting worse. DB2, thanks to a special codebase just for the mainframe, is the only meaningful database for the mainframe.

To break the cycle, Larry has to have a combine of applications/language/machine which makes a case. Building the Total Oracle Stack(CR) is what he has to do. I've just spent a decade in the financial services industry, and there, the language of choice has become javBOL (or COBava): java syntax used in a COBOL sort of way, largely by COBOL coders re-treaded. DB2 still rules, but with the falling price, and rising power, of multi-core/processor/SSD linux machines (largely on X86 cpu's) Larry has an opening. Those re-treaded COBOL coders are nearing end-of-life, literally. While some number of Indians are conscripted into COBOL to backstop the shortage, none hangs around very long; domestic CS graduates still "won't do the work". But COBOL's days are numbered; there's just too much else to do in CS that's interesting and doesn't require such a stupid language.

Larry can make the case to switch to Oracle applications now that he has a stack, if he can control java.

12 August 2010

The Crucial Difference: Micron Sized

I missed this earlier AnandTech test which is referenced from today's newest. At least, I think so.

This is a long-ish announcement piece, not a test, so we'll have to wait on that. Both the P300 and C300 use Marvell controllers, which are not widely used. The P300 is labeled as Micron, not Crucial. We'll see. The photo isn't even the P300. If it doesn't come with a SuperCap, we can conclude that Micron isn't really serious.

I will say, just having scanned the earlier piece, that anyone who even tests a database with RAID 5 has significant issues with database design and implementation. That koder kiddies will do so is no excuse.

10 August 2010

Take the A Train

Rails and I have a contorted history. I first engaged when I was looking around for schema based code generation tools, and Rails had Scaffolds. Turned out that DHH didn't like Scaffolds, and they kind of disappeared from the Rails landscape, late 1.x time frame.

I've peeked in every now and again since, so today I wandered over here via the Postgres site. I don't yet know whether Mr. Copeland is a database guy at heart, or yet another coder pretending to be one. OTOH, these are his notes from a talk given by two others, who, if the notes are to be believed, haven't drunk the Koder Kool Aid (it really was Flavor Aid, for those who get it). Of particular piquancy:

9:00 One query, 12 joins - complicated, but query time goes from 8 seconds to 60 ms.

20:00 Use constraints, FKs, etc to preserve data integrity - "anything you don't have a constraint on will get corrupted"

42:00 Do analytics in the database. Saw speed improve from 90s to 5s and saved tons of RAM.

1:01:40 Tune PostgreSQL - shared_buffers, work_mem, autovacuum, etc. Rely on community knowledge for initial configuration.

The "Use constraints" one is really, really important. The notion that only the application code should edit input is the wedge issue. Iff the code will only, forever, be the sole user of the data (and you *know* that's baloney) should the application code do it all. And, in that case (presumably because "performance" can only be attained by ignoring RDBMS' services) suck up your cojones and write bespoke file I/O like a real COBOL coder. The RDBMS, modulo vanilla MySql, is going to provide the services anyway. Otherwise, never trust ANY client input. In most cases, never trust ANY client read request (bare sql). The purpose of the client code is to display (perhaps to another program or file, not just a screen) data and pass back data. That's it.
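For the doubters, a small sketch of the engine refusing bad client input all by itself. This uses SQLite through Python only because it runs anywhere; the tables, the age range, and the `try_insert` helper are invented for the example, and the talk being quoted was about PostgreSQL, where the same declarations apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE personnel (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dependants (
    dep_id INTEGER PRIMARY KEY,
    emp_id INTEGER NOT NULL REFERENCES personnel(emp_id),
    age    INTEGER NOT NULL CHECK (age BETWEEN 0 AND 130)
);
INSERT INTO personnel VALUES (1, 'Pat');
""")

def try_insert(emp_id, age):
    """Return True if the engine accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO dependants (emp_id, age) VALUES (?, ?)", (emp_id, age))
        return True
    except sqlite3.IntegrityError:
        return False

# A valid row, an orphan (no such employee), and an impossible age.
results = [try_insert(1, 7), try_insert(99, 7), try_insert(1, -3)]
print(results)
```

No client, however sloppy, gets the orphan or the negative age into the table; that is the whole argument.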

Which brings me to the analytics note. PG is a little short on analytical functions; DB2/Oracle/SqlServer all support SQL-99/03 functions and add more, but use what's there. The same can be said for ETL, too; in most cases sql will get the job done. What the ETL crowd don't get is that the database is closed over its datatypes. There are some syntactic issues going from vendor A to vendor B databases, but the engines are quite capable of transforming from one consistent state to another all on their lonesomes.
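A toy illustration of "do analytics in the database", again with SQLite via Python and a made-up `sales` table. Plain GROUP BY stands in for the fancier analytical functions; the contrast is between shipping two summary rows and shipping every detail row so the client can re-implement the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 5), ("west", 15), ("west", 40)],
)

# In the database: only the summary rows leave the engine.
in_db = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# In the client: every detail row is fetched, then summarized by hand.
in_app = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    count, total = in_app.get(region, (0, 0))
    in_app[region] = (count + 1, total + amount)

print(in_db)
print(in_app)
```

Same answer both ways; the difference is how many rows had to move and who had to hold them in RAM.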

One of the knobs I miss from DB2 is the ability to assign bufferpools (DB2's term) at the tablespace level. PG now has tablespaces, but so far as I can see, buffering is at the engine/instance level. Someday.

08 August 2010

Why Don't You Even Write?

Do you recall, while porting the SSD test from DB2 to PostgreSQL (see, I'm even capping as one is supposed to), that I lamented not being able to write out rows to multiple tables in a Common Table Expression? DB2 can't either, nor have I yet confirmed whether any ANSI level, including 2003, specs it.

But I've just found this presentation for PG. And here's a test of it. Boy-howdy. Finally, a feature that DB2 doesn't have; well soon, maybe. In any case, have a read, it's really cool.

06 August 2010

Mr. Fielding (not of Tom Jones)

Our friends at simple-talk have their weekly (?, thereabouts) interview, with Roy Fielding. He "invented" REST, and has some things to say about it, but this quote made me chuckle, because I've been there, and felt the pain:

You must remember REST is a style and SOAP is a protocol, so they are different. One can compare RESTful use of HTTP to SOAP's use of HTTP. In contrast, I don't know of a single successful large architecture that has been developed and deployed using SOAP. IBM, Microsoft, and a dozen other companies spent billions marketing Web Services (SOAP) as an interoperability platform and yet never managed to obtain any interoperability between major vendors. That is because SOAP is only a protocol - it lacks the design constraints of a coherent architectural style that are needed to create interoperable systems.

SOAP certainly had a lot of developer mindshare, though that was primarily due to the massive marketing effort and money to be found in buzzword-based consulting.

One of the hallmarks, at least to me, about REST is that its verbs (from HTTP) match, to a tee, those of the relational database. Yet, maybe for that reason, the procedural goop of SOAP (lots o' rinsing needed) enveloped the Fortune X00. Oh well, someone will pay the price.
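The verb match, sketched in Python over SQLite. The pairing used here (POST/GET/PUT/DELETE with INSERT/SELECT/UPDATE/DELETE) is the conventional one and my reading of the correspondence, not something stated in the interview; the `note` table and the `handle` function are invented for the example, and there is no actual HTTP involved.

```python
import sqlite3

# Conventional pairing of HTTP methods with SQL statements (an assumption).
VERB_TO_SQL = {
    "POST": "INSERT INTO note (id, body) VALUES (:id, :body)",
    "GET": "SELECT id, body FROM note WHERE id = :id",
    "PUT": "UPDATE note SET body = :body WHERE id = :id",
    "DELETE": "DELETE FROM note WHERE id = :id",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")

def handle(verb, note_id, body=None):
    """Run the statement paired with `verb` and return any rows it yields."""
    with conn:  # commit after each request, roll back on error
        cursor = conn.execute(VERB_TO_SQL[verb], {"id": note_id, "body": body})
        return cursor.fetchall()

handle("POST", 1, "draft")
handle("PUT", 1, "final")
after_put = handle("GET", 1)
handle("DELETE", 1)
after_delete = handle("GET", 1)
print(after_put, after_delete)
```

Four verbs, four statements, one resource per row; no envelope required.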

05 August 2010

The End of the Yellow Brick Road?

There is an analysis of the Intel settlement at AnandTech, which talks about Intel, AMD, NVIDIA and how they may, or may not, be getting along swimmingly henceforth. But buried in the discussion is the fact that support for the PCIe bus by Intel was part of the suit and settlement. In particular, Intel is only committed to support the bus through 2016.

The article goes on at some length about NVIDIA not being on the same page as Intel and AMD; the analysis of PCIe support is only in GPU terms. But not just super-duper gamers' graphics cards use that connector. Fusion-io, among a growing number of others, does too. There have been changes to "standard" disk drive connections over the years, so in one sense this could be viewed as just business as usual in the computer business. On the other hand, will EMC, IBM, and such be as easily convinced that Fusion-io (or whoever) has gone on the right path to SSD implementation? Knowing that all of your storage protocol could disappear at a known point in the future might give one pause.

02 August 2010

A Crossword Puzzle

I thought I would take some time this weekend to stage the ultimate test: the cross-join. Now, the tables I have at my disposal for this test, with enough rows to make it interesting, are Personnel and Dependants. While one would not expect to meaningfully cross-join such tables (excepting data for certain primitive societies/religions), they do serve the direct purpose.

So, for those not familiar, the cross-join (old syntax):

Select count(*) from personnel, dependants

I chose count() simply to remove the screen painting cost from the exercise; I merely want to measure the data cost. There are 1,200,240,012 synthesized rows.
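For anyone who wants to see the arithmetic at a harmless scale: a cross-join yields one row per pairing, so its count is the product of the two table counts. The sketch below uses SQLite from Python with tiny stand-in tables (40 and 120 rows, numbers picked arbitrarily); the real test ran on DB2 with the full tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personnel (emp_id INTEGER PRIMARY KEY);
CREATE TABLE dependants (dep_id INTEGER PRIMARY KEY, emp_id INTEGER);
""")
conn.executemany("INSERT INTO personnel VALUES (?)", [(i,) for i in range(1, 41)])
conn.executemany(
    "INSERT INTO dependants VALUES (?, ?)",
    [(i, (i % 40) + 1) for i in range(1, 121)],
)

def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]

n_personnel = count("SELECT COUNT(*) FROM personnel")
n_dependants = count("SELECT COUNT(*) FROM dependants")
n_cross = count("SELECT COUNT(*) FROM personnel, dependants")  # old-syntax cross-join

print(n_personnel, n_dependants, n_cross)
```

Scale the two inputs up and the product climbs past a billion quickly, which is what makes the query a useful torture test.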

I ran the query against the SSD database, and the HDD database (well, sort of).

The timing for SSD: 452.87 seconds, or about 7.5 minutes.

The timing for HDD: well, it never finished.

I initially ran both with 5 bufferpools, in order to force hard I/O in both cases. The SSD tables ran just fine. When I ran the HDD tables, it eventually errored out with a bufferpool exhaustion error. So, I increased the bufferpools for the HDD database to 100, and let 'er rip. 3 hours (about) later it errored out with a divide error.

A somewhat fairer test might be the cost of a range query between the two structures, that is, a PersonnelFlatCross with the billion plus rows versus the normalized tables. If I can get DB2 to load the table, I'll give it a try.
