24 November 2010

You've Earned a Good Thrashing

One of the aspects of RDBMS on SSD that has been worming around in my lower brain stem for a bit is: what difference does it make whether the engine is locker or MVCC?  Now, for those just joining the partay, my database of preference has been DB2, with SQL Server and PostgreSQL and MySQL as adjuncts, for the last decade or so.  DB2 is the last major database sticking to locker semantics; with good reason, so far as I am concerned.  The engine implements a deep locking scheme, and fail fast by the database is smarter than fail late by the user.

The question which has been nagging me arises because, under MVCC semantics, data is spread out among both the tables and supporting storage.  In Oracle's case, that supporting storage is the rollback segments.  MVCC, to use my term, is Read Last Committed semantics; and to do that, the engine has to keep track of changes on the fly such that any query can get any rows from any table *as of* some time/commit/transaction (take your pick of term).
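For those who prefer to see the idea rather than read about it, here is a minimal sketch of *as of* reads.  It is a toy, not how Oracle or any other engine actually lays out its version storage; the `VersionedStore` class and its method names are my own invention for illustration.  The point it shows is that every committed change adds another stored version, which is exactly the extra writing at issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedStore:
    """Toy multi-version store (hypothetical): each commit appends versions."""
    # key -> list of (commit_id, value), kept in increasing commit order
    versions: Dict[str, List[Tuple[int, Optional[str]]]] = field(default_factory=dict)
    last_commit: int = 0

    def commit(self, changes: Dict[str, Optional[str]]) -> int:
        """Apply a batch of changes under one new commit id and return it."""
        self.last_commit += 1
        for key, value in changes.items():
            self.versions.setdefault(key, []).append((self.last_commit, value))
        return self.last_commit

    def read(self, key: str, as_of: int) -> Optional[str]:
        """Return the newest value committed at or before `as_of`."""
        result = None
        for commit_id, value in self.versions.get(key, []):
            if commit_id > as_of:
                break
            result = value
        return result

    def version_count(self) -> int:
        """Total stored versions: the extra writes the engine has to keep."""
        return sum(len(v) for v in self.versions.values())


store = VersionedStore()
store.commit({"balance": "100"})
snapshot = store.last_commit            # a reader starts here
store.commit({"balance": "250"})        # a writer commits afterwards

old_view = store.read("balance", as_of=snapshot)           # reader's view
new_view = store.read("balance", as_of=store.last_commit)  # latest view
```

The reader that started at `snapshot` keeps seeing the old balance without blocking the writer, and the store is now holding two versions of one fact.  A real engine also has to garbage collect the old versions, which this sketch ignores.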

My worry is that garden variety SSD may not be up to storing rollback segments, due to the heavy writing.  On a HDD, it's no big deal.  But this method of storage necessarily slows down the engine.

In looking for answers, I came across this paper from 2008.  You can skip down to slide 22 for the specific discussion.  The paper doesn't present evidence, one way or another, about this concern, but does show that putting version data on SSD is a huge performance win.  Here's an update, as of December 2009, from the main author.

So, in all, I haven't found any clear answers from the literature.  What seems clear is that garden variety consumer SSD wouldn't survive (not that I'd ever recommend such parts anyway), and I'm not so sure about prosumer parts.  As for the STECs out there, I'm not concerned.

18 November 2010

Food Fight

I do so love it when the Kiddies finally figure out databases; well a couple and a little bit.  This PHP post came across my bow, and it's just too much fun not to pass on.  Thing is, the blogger looks to be a bit of a young-un.  And he takes a good deal of heat from the ridgebrows, but doesn't back down.  Good for him.  There is hope for us all.

16 November 2010

An Open and Shut Case

There was an OpenSQL camp up in Cambridge last month, and I considered going, but the agenda listed far too many NoSQL projects.  I decided that I'd just spend a long weekend being irritated.  Today I read Josh Berkus's write up on lwn.net, and this was the one nugget:

Some of the SQL geeks at the conference discussed how to make developers more comfortable with SQL. Currently many application developers not only don't understand SQL, but actively hate and fear it. The round-table discussed why this is and some ideas for improvement, including: teaching university classes, contributing to object-relational mappers (ORMs), explaining SQL in relation to functional languages, doing fun "SQL tricks" demos, and working on improving DBA attitudes towards developers.

There are times (quite often, truth be told) that I wish the relationalists had the gonads of the Right Wingnut Zealots or Tea Baggers or what-have-you.  The RM isn't just another data store.  By attempting to make nice with folks who refuse to listen, you'll just tick off the folks who do get it, but won't convince those who've no intention of being swayed.  Use relational databases to make better systems than the knuckleheads who belittle them.  Don't get mad (well, a bit sometimes), get even.

If you go and read the piece, you'll find the attendees worrying about problems that the commercial vendors (with lots more folks, of course) dealt with years, if not decades, ago.  Some of the problems are fully discussed in textbooks; Weikum & Vossen in particular.  There Ain't No Such Thing As A Free Lunch.  If you want minimal byte footprint, maximum structural integrity, minimum modification hassle, then the RM as embodied in current industrial strength RDBMS's is the way to go.  Open Source databases can do the same, so long as they concentrate on implementing the fundamentals, and stop worrying about pandering to the FOTM programming language.  Languages will come and go (COBOL and java relegated to Big Business), but the data is forever.  Best put it some place safe.

09 November 2010

Convicts and Cane

If you're of a certain age, or were precocious at a young age, you may be familiar with the following lyric: "In the early part of this century, convict labor worked the cane fields on the bottoms of the Brazos river... Go down old Hannah don't you rise no more, if you rise in the morning, bring the judgment day".

I've no idea whether the Thought Leaders at AMD are familiar with old folk songs, but they've labeled their latest Intel beater Brazos.  The good folks at AnandTech have some details.  While this is a notebook implementation, and not especially pertinent to this endeavor, the graph on page one surely is.  I gather it represents AMD's view of machine development over the next years, and that AMD will have victory on judgment day.

That last curve, for what AMD calls Heterogeneous-Core Era, is Brain Viagra.  "Abundant data parallelism" is what the SSD/BCNF database is all about.  Another step along the Yellow Brick Road.  It will be a fun journey.

Ahhhhhhhhh. The Stay Puft Man

I once knew a man, father of a friend of mine, who remarked that one of his sons "costs me a lot of money"; private school and all that.  Applications designed to be difficult to maintain are a lot like prodigal sons, the money just seems to fly out the window.  Coders remain adamant that their API's are what make software easy to maintain.  Baloney.  The proliferation of code, much as it was in the 1960's (when COBOL was going to make application development so easy, a manager could do it), is justified on the grounds that the latest New Thing in coding will make all the angst go away.  Hasn't happened, now has it?  Perhaps we should stop looking to bloatcode for the answer.

SQL Server Central has an article on maintenance.  I was moved to post a reply, herewith entered for your approval.  Another opportunity to ring the bell for SSD/BCNF systems.

In the world of commercial/business software, aka database systems, the answer to maintenance costs is to embrace SSD/BCNF.  Why, you may ask?  Let me count the ways.

1) by putting the data and its integrity logic in one place, the server, letting the client code be responsible only for screen painting and data input; one small group of smart database geeks keeps the data under control.  compare to the human wave approach of client-side coding.  in fact, embracing SSD/BCNF means the data is utterly agnostic to the client code.  could not possibly care less whether it's a java screen, or VB screen, or csv file.  makes no difference.  I offer xTuple as an example; not yet a SS application, not that this matters much.

2) by embracing SSD/BCNF, maintenance amounts to adding columns/tables/rows (a row is a business rule, remember) to the schema.  the data hangs together on the RM. 

3) by embracing SSD/BCNF, client side code can be generated from the schema.  not saying it has to be, or should be (well, yeah, it should), but it can be.  at the least, clients should interact through SP.

4) with significantly (and soon to be, massively) parallel servers available for small bucks, what's the most adept application for such machines?  well, the relational database engine, of course.  client code, not so much; as client coders are discovering.

5) by embracing SSD/BCNF, the byte footprint is an order of magnitude less than it is with the flat-file storage so beloved by C#/java/VB/COBOL coders.

6) as Celko (at least) has written (in "Thinking in Sets"), using auxiliary tables to implement constraints makes maintenance still simpler:  just add (or delete) rows to update constraints.  for that matter, authorized users can update check constraints and the like from screens; such constraints are just text stored in the catalog.
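To make points 1, 3 and 6 concrete, here is a minimal sketch using Python's built-in `sqlite3` module, chosen only because it runs anywhere; the table and column names (`order_status`, `orders`) are made up for the example, and a real system would sit on an industrial strength engine.  The allowed values live in an auxiliary table, the engine enforces them no matter what client is talking to it, and changing the rule is a one row insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE order_status (      -- auxiliary table: one row per allowed value
        code TEXT PRIMARY KEY
    );
    CREATE TABLE orders (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL REFERENCES order_status(code)
    );
    INSERT INTO order_status (code) VALUES ('open'), ('shipped');
""")


def try_insert(status: str) -> bool:
    """Return True if the engine accepts the row, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders (status) VALUES (?)", (status,))
        return True
    except sqlite3.IntegrityError:
        return False


before = try_insert("backordered")   # rejected: no such row in order_status

# "Maintenance" here is one row in the auxiliary table, not a client release.
with conn:
    conn.execute("INSERT INTO order_status (code) VALUES ('backordered')")

after = try_insert("backordered")    # accepted now

# The catalog also describes the table, which is the raw material for
# generating client screens from the schema (column 1 of table_info is the name).
fields = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
```

The client never learns the rule; it only finds out whether the engine said yes or no.  Whether it is a java screen, a VB screen, or a csv load makes no difference to the outcome.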

That should do it for now.  Remember, the cost of maintenance is *directly* a function of the code/data structure.  The more obscure that structure, the higher the cost.  Historically, for those that have been paying attention, coders view life (largely because those that employ them are dumb enough to measure them so) as a LOC exercise.  Anything which increases the LOC future is good; likewise, anything which decreases LOC future is bad.  Those that employ them often take the same view, though few will admit it.  The reason is that such organizations are inherently bureaucratic, and in that environment the one with the deeper org chart gets more money ("I manage 5 managers and 100 staff, you've only got 3 and 50").  Efficiency and productivity really aren't the goal.  The hardware and software to solve the issue, in the commercial world, have existed for decades, yet the COBOL/VSAM paradigm persists; only the syntax has changed.  That's not an accident.  The RM and RDBMS are actively opposed in many shops just because fewer coders would be needed, to do all that maintenance that CIO's complain about; which fact keeps the CIO's org chart growing.  Hmmm.  Curious.

06 November 2010

Larry, Larry Quite Contrary

Larry, Larry quite contrary, how does your fortune grow?  No silver bells or core contributions, that's how.  People are such knuckleheads; perpetual Charlie Browns, expecting the football to always be there. 

Regular readers may remember this musing where I made the case that Oracle considered MySQL a threat, and would do something about it.  The EU was right.  Here's the latest.  Larry is also reining in java.  My thought here is that he'd just as soon do the same to java as MySQL:  a crippled "Open Source" version, and a pay-through-the-nose not so Open version.  Might go so far as to make it into the Oracle Language.

Who's going to stop him?  Perhaps IBM, also heavily invested in java use, will take over the OS version.  They'd have to either fork or prop up Harmony; they've not shown any inclination for either move, so far.  Are the Armonkers dumb enough not to see that Larry is after their mainframe business?  Time will tell, but it sure looks like it so far.

01 November 2010

Beam Me Up, Scotty

Another tidbit from an Artima thread.

Carlos wrote:
Someone wrote an academic paper a few years ago advocating exactly this. They showed that software designed around the idea it may be arbitrarily killed at any time was more reliable, shut down more quickly and had a host of other benefits.

And I responded:
They're called industrial strength database engines. Not trivial to write.

In general, however, the AJAX-ian migration is the attempt to recreate a connected database application, aka VT-100/RS-232/*nix/Oracle. With a phone architecture, we have that. A connected architecture will always outperform a disconnected one, HTTP for example. Managing state goes away, since the datastore always *is* the state. With said datastores on SSD, data control relegated to the server becomes a Good Thing; while the client (phone, pad, whathaveyou) just does painting, input collection, and transfer.

I remain convinced that we're headed back to bound data grids, what was once considered a Microsoft horror (data must be loosely coupled, and all that).  As well it might be, per se; but the architecture is superior from both a user experience and data integrity point of view.  One fact, one place, one time is fulfilled.  Again, it's only a matter of sufficient bandwidth, and your phone/pad/thingee is just a pixelated VT-100 (and an RS-232 Cat-5 wire) connected to a database.  Once you've reached that point, there's nothing to be gained from retreating.  We only did the web as we did because it started on 56Kb dialup, and that was fast if you could get it (14.4K was not unusual; do any of you actually have experience with BBS's and the nascent web in that circumstance?).  A connected web was not envisioned, thus HTTP and the like.  For better or worse, most folks are always connected, and mostly do trivial stuff with the facility.