30 March 2020

One Classy Dame - part the fifth

One of the other Science Channel shows I'm addicted to is "Impossible Engineering", which exploits the meme that today's technology is derived from yesterday's. Of course, 'yesterday' is a greatly fungible span of time. In compute, it might be measured in months; for SCM, a few years might be appropriate. A piece at AnandTech sparked some additional walking through the Yellow Googles, which turned up this piece from more than a decade ago (an antique, in compute time) dealing with SCM and I/O. It fits quite nicely.

The main point being: it makes no sense to treat SCM as a filesystem, full stop. I can get on that train.

One of the unique benefits of byte-addressable SCM, so far as I've found, is the impact it can have on transactions. Universally, again so far as I know, RDBMS implementations limit lock granularity to the row (some early ones, still extant, set the limit at the page or equivalent). A row is locked either shared (read) or exclusive (write). But we know that, modulo key columns (and intelligent relational design), non-key columns are independent of each other and therefore independently updatable. Wow!! So that means Jill can update Bill's age and Joe can update Bill's height at the same time without violating any (assumed) constraints!!! That can't be allowed with row- or block-level locking and updating, of course. But with a byte-addressable data store, there's no reason to prohibit such transactions. The overall result is as it is today: last write wins; the only difference is that there's no locking delay. The engine still has to track unique/primary keys and prohibit conflicting updates, of course (Jill knows Bill is, and always will be, row 111, so Joe shouldn't be allowed to change Bill's row id to 333 while a transaction on Bill's row is in flight; but you wouldn't allow updates to a primary key, now would you?). But an intelligent database designer wouldn't have multiple unique keys, right?
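To make the Jill-and-Joe scenario concrete, here's a minimal Python sketch of what column-granular locking might look like. It's an illustration under stated assumptions, not how any shipping engine works: `ColumnLockManager`, `try_lock`, and `release` are hypothetical names, locks are keyed on (row key, column) pairs, and the primary key column (assumed here to be called `id`) is treated specially so that touching it behaves like a whole-row lock.

```python
import threading


class ColumnLockManager:
    """Hypothetical lock manager that locks (row_key, column) pairs instead of whole rows."""

    def __init__(self, key_column="id"):
        self._mutex = threading.Lock()  # protects the lock table itself
        self._held = {}                 # (row_key, column) -> owning transaction
        self._key_column = key_column   # assumed name of the primary key column

    def _others_on_row(self, txn, row_key):
        """Columns of row_key currently locked by transactions other than txn."""
        return {col for (key, col), owner in self._held.items()
                if key == row_key and owner != txn}

    def try_lock(self, txn, row_key, columns):
        """Grant txn exclusive locks on every requested column of row_key, or on none."""
        with self._mutex:
            others = self._others_on_row(txn, row_key)
            # Touching the key column conflicts with any other in-flight work on the row,
            # and vice versa: in that case it effectively degrades to a row lock.
            if others and (self._key_column in columns or self._key_column in others):
                return False
            # Non-key columns conflict only if the same column is already taken.
            if others & set(columns):
                return False
            for col in columns:
                self._held[(row_key, col)] = txn
            return True

    def release(self, txn):
        """Drop every lock held by txn (on commit or abort)."""
        with self._mutex:
            for lock in [k for k, owner in self._held.items() if owner == txn]:
                del self._held[lock]


mgr = ColumnLockManager()
print(mgr.try_lock("jill", 111, {"age"}))     # True
print(mgr.try_lock("joe", 111, {"height"}))   # True: same row, different column
print(mgr.try_lock("joe", 111, {"id"}))       # False: Jill's transaction on row 111 is in flight
print(mgr.try_lock("ann", 111, {"age"}))      # False: Jill already holds age
mgr.release("jill")
```

The design choice worth noting is that only the key column forces row-wide exclusion; every other column is its own lock unit, which is exactly the independence of non-key columns argued above.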

The main effect of this is to shrink logical transaction scope. A transaction becomes relevant only to one key and the columns being updated. That key may well span joins, but still, only the rewritten columns are affected. And logging is done behind, as discussed in previous missives. And again, since all this happens in byte-addressed persistent memory, the time spent holding a latch (a lock, in legacy RDBMS jargon) is vanishingly short. Database coders will need to be retrained, just as an A300 pilot needs a lot of simulator time before being allowed to fly a B777 for real. Batch transactions may well span minutes or longer, but real-time transactions shrink below the row level, and run very fast.
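To make "logging is done behind" a bit more concrete, here's a minimal Python sketch. Since the earlier missives aren't reproduced here, it assumes that the update lands in place first and the log record is appended afterwards by a background writer. `WriteBehindStore`, `update`, and `flush` are hypothetical names, and a dict and a list merely stand in for byte-addressable persistent memory and the durable log.

```python
import queue
import threading


class WriteBehindStore:
    """Hypothetical store: updates land in place immediately; log records are written afterwards."""

    def __init__(self):
        self.rows = {}                  # stands in for byte-addressable persistent memory
        self.log = []                   # stands in for the durable log
        self._pending = queue.Queue()   # log records not yet written
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def update(self, key, column, value):
        # Scope of the change: one key and one column; the rest of the row is untouched.
        self.rows.setdefault(key, {})[column] = value
        # The log record is queued, so the caller doesn't wait on logging.
        self._pending.put((key, column, value))

    def _drain(self):
        # Single background writer: appends log records in the order they were queued.
        while True:
            record = self._pending.get()
            self.log.append(record)
            self._pending.task_done()

    def flush(self):
        """Block until every queued log record has been appended."""
        self._pending.join()


store = WriteBehindStore()
store.update(111, "age", 42)      # Jill's change
store.update(111, "height", 180)  # Joe's change, same row, different column
store.flush()
print(store.rows[111])
print(store.log)
```

A real engine would have to decide what happens if power fails between the in-place write and the log append; this sketch simply ignores that ordering problem and only shows the shape of the idea.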
