08 April 2019

Write Once, Go Home And Sleep [update]

Maybe it's just me, but the news that Intel has released XPoint as a DRAM replacement seems to mean we can go Dr. Codd one better. Maybe it's just me, but Optane DC Persistent Memory offers the choice to write transactions just once, beyond the CPU caches. Can't we dispense with all those extra tiers? What do we gain by syncing all those tiers? What do we gain by eliminating them? Let's go look.

This could also be the nail in the coffin of MVCC, which was devised as a way around locking transaction semantics. The cost, in resources, of MVCC is not trivial. For many years, it's been widely understood that Oracle (and PG, of course) wants the whole machine! And it needs it. With single level storage, the engine has so much less to keep track of. As does the OS.

Yowza! A bit of sending my fingers through the Yellow Googles reveals that others have the same idea.
With Intel Optane DC Persistent Memory, you can put an entire Redis database in-memory and have sub-millisecond latency without the vast expense of doing so with DRAM. This has led Redis CTO and co-founder, Yiftach Shoolman, to say that 'we believe the next-generation server architecture will be all persistent memory. This is going to change the entire database market.'

Sounds like the call for single level storage to me.

Another report confirms that OS twerking will be needed:
Speaking of which, the researchers also benchmarked how the product performed as persistent storage. In this scenario, the Optane DC modules are treated as a storage block device located in memory, while the DRAM fulfills its conventional role as volatile memory. One of the challenges here is that file systems typically do not support memory-based I/O, so options are somewhat limited.
[my emphasis]

Ah, but there is some good news (and expected by Your Humble Servant):
The NOVA-Relaxed variant, (which allows in-place file page writes in order to improve write performance for applications that do not require data consistency for every write) delivered the best performance of the various file system setups.

Ah, but yet again, yet more good news:
However, the best results were obtained by rewriting three of the applications (RocksDB, Redis, and MongoDB) to use memory mapped Optane DC instead of going through the file system loads and stores. This requires a greater level of effort, since the cache consistency normally provided by the file system has to be re-implemented in the customized software. But as the researchers pointed out, the potential for performance gains are larger.

IOW, to some small delta, single level storage blows everything else away. As the report says, both the database engine and the OS have to be twerked to run with single level storage. But getting there is most of the fun.
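
For the curious, here is a rough sketch of what "memory mapped loads and stores" looks like from the application side. It is not how RocksDB, Redis, or MongoDB were actually rewritten, only the general shape of the idea. Assumptions: an ordinary file stands in for an Optane DC region (on real hardware it would be a file on a DAX-mounted file system), the record layout and the names open_region, store, and load are made up for illustration, and flush() plays the part of whatever persistence barrier the real platform requires.

```python
import mmap
import os
import struct

# Hypothetical fixed-size record: (account_id, balance), two unsigned 64-bit ints.
RECORD = struct.Struct("<QQ")


def open_region(path, size):
    """Create (or reopen) a file of `size` bytes and map it into memory.

    The file is only a stand-in for a persistent-memory region.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, size)      # same size on reopen keeps existing data
        return mmap.mmap(fd, size)  # the mapping keeps its own handle
    finally:
        os.close(fd)


def store(region, slot, account_id, balance):
    """Write one record with a plain memory store: no write() call, no buffer copy."""
    RECORD.pack_into(region, slot * RECORD.size, account_id, balance)
    # The application, not the file system, decides when the data is durable.
    region.flush()


def load(region, slot):
    """Read one record straight out of the mapping."""
    return RECORD.unpack_from(region, slot * RECORD.size)
```

Note where the burden lands: store() has to make the durability call itself, which is the "greater level of effort" the report mentions.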

[update]
Well, that didn't take long. Yet more fingers wandering through the Yellow Googles finds this paper. You should read it. Among other things, it makes the case that single level storage means RDBMS logging can get out of the way.
The write-ahead logging (WAL) protocol supports efficient transaction processing when memory is volatile and durable storage cannot support fast random writes [58, 39, 33]. But this assumption causes unnecessary performance degradations in a DBMS with NVM storage [14]. Consider a transaction that inserts a tuple into a table. A DBMS first records the tuple's contents in the log, and it later propagates the change to the database. With NVM, a DBMS can employ a logging protocol that avoids this unnecessary data duplication
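
To put a number on that duplication, here is a toy comparison. It is a sketch under loud assumptions, not the paper's protocol: bytearrays stand in for both the log device and the persistent memory, the class names WalStore and WriteOnceStore are invented, and the "write once" variant simply logs a small marker (key, offset, length) instead of the tuple's contents.

```python
import struct


class WalStore:
    """Classic write-ahead logging: the tuple goes to the log, then to the table."""

    def __init__(self):
        self.log = bytearray()
        self.table = {}
        self.bytes_written = 0

    def insert(self, key, payload):
        record = struct.pack("<QI", key, len(payload)) + payload
        self.log += record                 # first copy: the log
        self.bytes_written += len(record)
        self.table[key] = bytes(payload)   # second copy: the table itself
        self.bytes_written += len(payload)


class WriteOnceStore:
    """Hypothetical NVM-style variant: tuple written once, log holds only a marker."""

    def __init__(self):
        self.heap = bytearray()            # stand-in for persistent memory
        self.log = bytearray()
        self.index = {}
        self.bytes_written = 0

    def insert(self, key, payload):
        offset = len(self.heap)
        self.heap += payload               # the only copy of the tuple
        self.bytes_written += len(payload)
        marker = struct.pack("<QQI", key, offset, len(payload))
        self.log += marker                 # records where the tuple lives
        self.bytes_written += len(marker)
        self.index[key] = (offset, len(payload))

    def read(self, key):
        offset, length = self.index[key]
        return bytes(self.heap[offset:offset + length])


payload = b"x" * 1000
wal, once = WalStore(), WriteOnceStore()
wal.insert(1, payload)
once.insert(1, payload)
print(wal.bytes_written, once.bytes_written)  # 2012 1020
```

For a 1000-byte tuple the toy WAL writes roughly twice the bytes. A real engine has far more going on (checkpoints, recovery, group commit), so take the ratio as illustration only.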
