29 March 2011

What's Your Preferred Position?

Regular readers know that I've been talking up the synergy between iPad-type tablets and the normalized relational database.  And that such synergy must recognize the nature of input on a tablet.  Little bits of pickable data.  Hors d'oeuvres, so to speak; not a four course meal.

A few minutes ago Anand posted a quiz.  I guess the question has reached the mainstream pundit class, although it's not quite the right question.  The right question is:  what kinds of data can a keyboard-less device support, and therefore what kinds of applications are best suited to such devices? 

28 March 2011

Mr. Natural Answers Your Questions

A posting on the PostgreSQL/Performance group (having to do with the Intel 320 announcement, which I'll save for a different posting after the dust settles a bit) got me to looking again for published tests of SSD vs. HDD and databases.  As you can see, Dennis Forbes is listed in the links block.  I don't recall whether I've mentioned this post of his before; mayhaps I have.

But, in some sense related to the Intel 510/320 situation, he makes the salient point (my point since I discovered SSDs years ago) thusly:

Of course NoSQL yields the same massive seek gain of SSDs, but that's where you encounter the competing optimizations: By massively exploding data to optimize seek patterns, SSD solutions become that much more expensive. Digg mentioned that they turned their friend data, which I would estimate to be about 30GB of data (or a single X25-E 64GB with room to spare per "shard") with the denormalizing they did, into 1.5TB, which in the same case blows up to 24 X25-Es per shard.
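
The drive counts in that quote are easy to check.  A back-of-envelope sketch, assuming 1TB = 1024GB and that raw capacity alone sets the count (no RAID, no over-provisioning, no headroom); the function name is mine, not Forbes's:

```python
import math

X25E_GB = 64  # capacity of one X25-E, per the quote


def drives_needed(data_gb, drive_gb=X25E_GB):
    """Smallest number of drives whose raw capacity holds data_gb."""
    return math.ceil(data_gb / drive_gb)


normalized_gb = 30            # Forbes's estimate of the friend data
denormalized_gb = 1.5 * 1024  # 1.5TB, assuming 1TB = 1024GB

print(drives_needed(normalized_gb))     # drives per shard, normalized
print(drives_needed(denormalized_gb))   # drives per shard, denormalized
print(denormalized_gb / normalized_gb)  # growth factor from denormalizing
```

Count a terabyte as 1000GB instead and the answer is still 1 drive versus 24, on roughly fifty times the bytes.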


Of course, his insight is rare, even among those I've read who've been positive about RDBMS/SSD synergy.  It's always been obvious:  normalized datastores are orders of magnitude smaller than their flat-file antecedents.  This smaller footprint comes with all the integrity benefits that Dr. Codd (and Chris Date, et al. since) defined.  There are a whole lot of ostriches out in the wild, insisting that massive datastores are needed by their code.  What does it all mean, Mr. Natural?  Don't mean shit.
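
For anyone who wants to see the footprint difference rather than take it on faith, here is a toy sketch using Python's built-in sqlite3 module.  Everything in it is an assumption for illustration: the table names, the 50 people, the 200-character bio, and the everyone-befriends-everyone graph.  It counts stored characters, not on-disk bytes, and it has nothing to do with Digg's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    bio  TEXT NOT NULL
);
CREATE TABLE friend (
    person_id INTEGER NOT NULL REFERENCES person(id),
    friend_id INTEGER NOT NULL REFERENCES person(id),
    PRIMARY KEY (person_id, friend_id)
);
""")

n = 50
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(i, f"user{i}", "x" * 200) for i in range(n)],
)
conn.executemany(
    "INSERT INTO friend VALUES (?, ?)",
    [(a, b) for a in range(n) for b in range(n) if a != b],
)


def text_size(rows):
    """Total characters across every value in every row."""
    return sum(len(str(value)) for row in rows for value in row)


# Normalized: each profile stored once, plus narrow key pairs.
normalized_size = (
    text_size(conn.execute("SELECT name, bio FROM person"))
    + text_size(conn.execute("SELECT person_id, friend_id FROM friend"))
)

# Flat: both profiles copied onto every friendship row.
flat_rows = conn.execute("""
    SELECT p.name, p.bio, f.name, f.bio
    FROM friend e
    JOIN person p ON p.id = e.person_id
    JOIN person f ON f.id = e.friend_id
""").fetchall()
flat_size = text_size(flat_rows)

print(normalized_size, flat_size, flat_size / normalized_size)
```

The ratio grows with the number of friends per person and the width of the repeated columns, which is the whole point.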

25 March 2011

Shape Shifting

One thing that I really like about O'Reilly books is the Rep-Kover binding; the original better than the current, however.  I find that most computer texts are near interchangeable with respect to content.  It's nearly always marginal, so what matters is ease of use.  For that, Rep-Kover is better than current "hardcover" bindings.  What I tend to dislike about O'Reilly is their (his?) incessant need to create "new" memes in the computing world.  Web 2.0 is, I think, the first; certainly the most infamous so far.

The last few months have seen the aborning of another: Data Science.  This one is even worse, in that it seeks to dumb down a perfectly legitimate pair of professions: statistician and operations researcher.  Long ago, I got involved in ISO-9000 certification, which was another early attempt to dumb down those professions (these days it's Six Sigma, which I had the pleasure to mentor at CSC).  It irritated me then, too.  It's of a piece with DIY neurosurgery, although not as directly deadly.

Yesterday's Forbes on-line version published a story about this newfangled profession, in the context of EMC.  Regular readers may remember that STEC, gorilla of the Enterprise SSD jungle, first touted, then crashed, on its relationship to EMC.  The article whispers that STEC, or whoever is currently supplying, is doing well and will continue to.

What's most bothersome about this meme is, as many others have remarked, that it ignores what the professions actually do:  both math stats and ORs do inferential stats, and inferential stats is based on the math of sampling and inference.  The fact is, one needn't have much training to calculate the parameters of populations.  Math stats and ORs don't even refer to these numbers as statistics, because they aren't.  It is exactly the same as baseball stats; they aren't stats, just numbers.  But, of course, the meme-sters once again wish to wrap themselves in the blanky of higher math.
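
The distinction fits in a few lines.  A minimal sketch, assuming a made-up population of 10,000 normally distributed values and a sample of 50; the 1.96 multiplier is the usual normal approximation (a t critical value would be a touch wider at this sample size):

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Pretend this is the entire population, e.g. every row in a table.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# With the whole population in hand, the mean is a parameter: plain arithmetic.
population_mean = statistics.fmean(population)

# Inference starts only when a sample has to stand in for the population.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)          # this one is a statistic
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% interval for the population mean, from the sample alone.
interval = (sample_mean - 1.96 * std_error, sample_mean + 1.96 * std_error)

print(population_mean, sample_mean, interval)
```

The first number takes a loop and a division.  The last one takes a theory of sampling to justify.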

On the other hand, stats as a profession and work product is more interesting than computers.  Even databases, by golly.  Maybe I'll try to parlay both; the article says that such folks (humble self qualifies) are in demand.

11 March 2011

You Rook Mahvelous

Just when you've figured it out, the world has a habit of slapping you upside the head.  By now, you've likely heard that Intel has announced its next SSD, the 510.  It's not, explicitly, the X25/G3.  From the various sources I read, a G3 will be coming along in due time, but is intended to be the "consumer" version, while the newly announced 510 is the "pro" part.

Here's what's puzzling:  as you can see from this AnandTech article, the 510 is biased toward *sequential* processing!  Boy howdy, I never saw that coming.  That, and the fact that the controller isn't home grown, but bought in from Marvell.  The G3 is said to be driven by Intel's controller, but not yet confirmed.

The world has been turned upside down.  Either that, or Intel has completely misread both the technical and buyer worlds.  A sequentially biased SSD makes sense for consumers:  gamers, video processing, and such.  I'm truly puzzled.  The parts aren't big enough in capacity to store anything like the massive files that a file-based coder would use.  For the prosumer world that the X25 parts targeted, the 510 just won't be useful; it's barely on par with the X25-G2.

We'll see.  The 510 still has an advantage over the SandForce drives for compressed/encrypted data, but that's usually things like my beloved relational databases and random processing, not the 510's strength.  Weird.

04 March 2011

32 Heads Are Better Than One

Simple-talk is one of my favorite sites, and now they have what will be a series on parallelism in SQL Server.  This first installment is light and airy.

A few months ago, I got into a bit of a tiff on a Postgres email group when I had the temerity to suggest that query-level parallelism is not only a Good Thing, but the only way to maintain performance as we segue from ever faster clocks on single thread/core CPUs to multi-thread/core/processor machines.  That group assembled (I don't recall anyone joining in my defense) asserted that engine-level parallelism (doling out queries to threads) was enough.
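
To make the distinction concrete: engine-level parallelism hands each whole query to one thread, while query-level parallelism carves a single query into pieces, runs the pieces on several workers, and merges the partial results.  Below is a rough sketch of the second shape for one aggregate, using Python's standard thread pool.  The names are hypothetical and this is not how any particular engine does it.  In CPython the global interpreter lock keeps this arithmetic from actually running faster; the structure is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one worker: aggregate its own slice of the rows."""
    return sum(chunk)


def parallel_sum(rows, workers=4):
    """One logical SUM() split across workers, then combined."""
    if not rows:
        return 0
    size = -(-len(rows) // workers)  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is scanned independently; the final sum is the merge step.
        return sum(pool.map(partial_sum, chunks))


rows = list(range(1, 1_000_001))
print(parallel_sum(rows) == sum(rows))
```

With engine-level parallelism only, that million-row scan sits on one core no matter how many are idle.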

I've been arguing for years that RDBMSs (not just SQL Server) will be better applications if they're designed to the multi-core/processor/SSD machine.  After all, at least the multi-core part is now fait accompli, so why not?  The beneficial side-effect is that BCNF schemas, with SSD as *primary* storage, are fully feasible.  They are the minimal data (bytes, that is) needed to fulfill demand, and since they are "fully" normalized, DRI implements most if not all of the constraints on the data.  That's been written about here, a bit.
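
For those who haven't watched the engine do the checking, a small sketch with Python's built-in sqlite3 module.  The two tables are hypothetical, and SQLite stands in for a server engine only because it ships with Python; note that it leaves foreign key enforcement off until the PRAGMA is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE purchase (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0)
);
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")


def try_insert(customer_id, quantity):
    """Return True if the engine accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO purchase (customer_id, quantity) VALUES (?, ?)",
            (customer_id, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 2))   # known customer, sensible quantity
print(try_insert(99, 1))  # no such customer: the foreign key says no
print(try_insert(1, 0))   # the CHECK constraint says no
```

No application code validates anything here.  The foreign key is the DRI proper, and the CHECK is its declarative cousin; both live in the schema, once, for every client.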

For all the heat that Microsoft gets, even from me on occasion, they do get databases.  Good on them.

For completeness, here are the DB2 docs, rather dry, but then...

And here's Oracle.

Neither is as seamless, at first blush, as SQL Server.  There, I said something nice about a Microsoft product.

03 March 2011

Something Blue

There's that old saying, "Like father, like son".  The iPad 2 (or is it iPad2??) is out, and AnandTech has a preliminary review up.  In the Conclusion section, there is this:

However, the new iPad does attempt to further blur the line between full computers and tablets, a line that is only going to get blurrier as more Honeycomb tablets invade the market. The iPad still lacks a dedicated keyboard, which will probably always hamper its utility as a content creation device for me, but iMovie and GarageBand join the already existing iWork apps as decent tablet versions of desktop programs.


What is, I guess, slowly dawning on the Pundit Class is that pickable interfaces are necessary for tablet devices.  Such devices can be used as input devices, IFF the data can be displayed in an easily pickable way.  I've always asserted that this simple fact means that a new (set of) widget(s) is needed, along with the recognition that data (as opposed to content) should be normalized at the datastore, i.e. BCNF.  I know that most of the Pundit Class hasn't figured it out yet, but you can't fool Mother Nature.
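
What "pickable" buys you at the datastore can be sketched in a few lines, again with Python's sqlite3 module.  The table names and the three menu items are made up, and the "tap" is simulated; a real widget would sit in front of `choices()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE appetizer (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE selection (
    id           INTEGER PRIMARY KEY,
    appetizer_id INTEGER NOT NULL REFERENCES appetizer(id)
);
INSERT INTO appetizer (label) VALUES ('olives'), ('bruschetta'), ('dates');
""")


def choices():
    """Rows a pick-list widget would display: (key, label) pairs."""
    return conn.execute(
        "SELECT id, label FROM appetizer ORDER BY label"
    ).fetchall()


def record_pick(appetizer_id):
    """A tap stores only the key; nothing is typed."""
    conn.execute(
        "INSERT INTO selection (appetizer_id) VALUES (?)", (appetizer_id,)
    )


options = choices()
record_pick(options[0][0])  # simulate tapping the first item shown

picked = conn.execute("""
    SELECT a.label
    FROM selection s
    JOIN appetizer a ON a.id = s.appetizer_id
""").fetchall()
print(options, picked)
```

The normalized lookup table feeds the display, and the input is a key that the foreign key vouches for.  No keyboard required.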