24 November 2010

You've Earned a Good Thrashing

One of the aspects of RDBMS on SSD that has been worming around in my lower brain stem for a bit is, what difference does it make whether the engine is locker or MVCC?  Now, for those just joining the partay, my database of preference has been DB2, with SQL Server and PostgreSQL and MySql as adjunctants, for the last decade or so.  DB2 is the last major database sticking to locker semantics; with good reason, so far as I am concerned.  The engine implements a deep locking scheme, and fail fast by the database is smarter than fail late by the user.

The question which as been nagging me is, for MVCC semantics, that data is spread out among both the tables and supporting storage.  In Oracle's case, these are rollback segments.  MVCC, to use my term, is Read Last Committed semantics; and to do that, the engine has to keep track of changes on the fly such that any query can get any rows from any table *as of* some time/commit/transaction (take your pick of term). 

My worry is that garden variety SSD may not be up to storing rollback segments, due to the heavy writing.  On a HDD, it's no big deal.  But this method of storage necessarily slows down the engine.

In looking for answers, I came across this paper from 2008.  You can skip down to slide 22 for the specific discussion.  The paper doesn't present evidence, one way or another, about this concern, but does show that putting version data on SSD is a huge performance win. Here's an update to December, 2009 from the main author. 

So, in all, I haven't found any clear answers from the literature.  What seems clear is that garden variety consumer SSD wouldn't survive (not that I'd ever recommend such parts anyway), and I'm not so sure about prosumer parts.  The STEC's out there, not concerned. 

1 comment:

Anonymous said...

"MVCC semantics" - About 436 results (0.09 seconds)

"locker semantics" - 5 results (0.24 seconds) [two here, three accidental]

"what difference does it make whether the engine is locker or MVCC" - that would be a good name for a blog post, paraphrasing Dave Barry ...

(serious here - you probably can explain something that 99.9% of programmers, myself included, have no idea about, and could learn ...)

(and I'm the guy who knew abut Aiken)