03 January 2013

Frenemies

There's that old saw: "the enemy of my enemy is my friend". I figured that this was due to Shakespeare, but the wiki says no, the adage originated either in Arabia or China. Makes sense: both cultures were way ahead of England by the time Shakespeare came around.

Each day I get an update from sqlservercentral. They're one part of the Goliath organization which published the Triage piece, so although I'm not currently doing much with SQL Server, it just seemed right. Today's feed included this link: "All Flash". Yhawza!! My little heart goes pit-a-pat. Then I scan the first paragraph, and go off to the link, and these are NoSql kiddies!! Arrgh!

The piece starts off reasonably, making the case that short-stroked arrays of HDD necessary to get the IOPS of a Samsung 840 is orders of magnitude greater than the cost of the 840. A couple of problems with that though. The 840 (not the 840 Pro) is a TLC read-(almost)only part. AnandTech tore it up and then again. While not an egregious part, it isn't by any stretch a server part. Consumer for sure; prosumer not so much.

The piece does get contradictory, however:
Flash is 10x more expensive than rotational disk. However, you'll make up the few thousand dollars you're spending simply by saving the cost of the meetings to discuss the schema optimizations you'll need to try to keep your database together. Flash goes so fast that you'll spend less time agonizing about optimizations.

This is the classic mistake: assuming that flat-file access is scalable. Of course, it isn't with consumer flash drives, and that's why the NoSql crowd find themselves in niche applications. The advantage of the RM, and the subsequent synergy with SSD, is that the RM defines the minimal data footprint. Since random I/O is the norm with multi-user servers, there's no greater penalty to normalization.

When a flash drive fails, you can still read the data.

I don't know where the author gets this from. Since each SSD controller has its own method of controlling data writing on the NAND, unlike HDD which follow standards and largely use Marvell parts (if CMU doesn't bankrupt them), data recovery is iffy. Most failures of SSD to date have been in the controller's firmware, not NAND giving up the ghost, and frequently lead to bricked parts. So, no, you can't remotely depend on simple recovery of data from SSD. While I've not seen definitive proof, SSD failure should be more predictable, which by itself is an advantage. One simply swaps out a drive at some percentage of the write limit. Modulo those firmware burps, you should be good to go until then.

Importantly, new flash technology is available every year with higher durability, such as this year's Intel S3700 which claims each drive can be rewritten 10 times a day for 5 years before failure.

Well, sort of. The S3700's consistency isn't due to NAND durability, but controller magic. It is well known that as geometry has shrunk, inherent durability of NAND has dropped. And that will continue. As I've mused before, we will reach a point where the gymnastics needed to compensate for falling P/E cycles in NAND by controllers will exceed the cost savings of smaller geometries. This is particularly true of the Samsung 840, which begins the article.

Over time, flash device firmware will improve, and small block writes will become more efficient and correct...

It's going the other way, alas. As geometries shrink, page size and erase block size have increased, not decreased. The use of DRAM caching on the drive is the common way to reduce write amplification, i.e. support smaller than page size writes. Firmware can only work around increasing page/erase block size.

What the author misses, of course, is that organic NF relational databases implement minimum byte footprint storage, get you a TPM, lots of DRI, and client agnosticism in the process. So, on the whole, it's a half right article.

No comments: