Dr. Codd Was Right: NoSql? No Mas! No Mas!

Lisa Murkowski, Swamp Critter

The world is not linear.
-- Dr. McElhone/1974

Power tends to corrupt; absolute power corrupts absolutely.
-- Lord Acton/1887

Officials who use their public positions for private gain threaten the integrity of our most important institutions. Greed makes governments — at every level — less responsive, less efficient and less trustworthy from the perspective of the communities they serve.
-- Justice Ketanji Brown Jackson/2024 [the MAGAnauts get even more aggressive when their perfidy is exposed]

I think we are on the verge of losing vaccines for this country, from this country. And the reason is that Robert F. Kennedy Jr. will hold up a paper, in the next four or five months, that says it's aluminum in vaccines that are causing a whole swath of problems, including autism. I think he is about to destroy vaccines in this country. I do.
-- Dr. Paul Offit/2025 [may the MAGA and MAHA be with you]

There's not a single example of things working out for the appeaser.
-- Nicolle Wallace/2024 [like this? the next extortion is on the way]

I have had to explain and re-explain and re-explain and re-explain, you know, how relational databases work, what is an eigenvector, what is dimensionality reduction.

-- Christopher Wylie/2018

... but Flash-based storage has such a different performance profile from rotating media, that I suspect that it will end up having a large impact on filesystem design. Right now, most filesystems tend to be designed with the latencies of rotating media in mind.

-- Linus Torvalds/2007

I believe quite strongly that, if you think about the issue at the appropriate level of abstraction, you're inexorably led to the position that databases must be relational.

-- Chris Date/2009

This Week's thought

D.O.J. has the full power of the federal government behind it. And under the guise of election integrity, they could end up using their unique tools to introduce new vulnerabilities to the system.
-- Dax Goldstein/2025 [the Office of Data Integrity will leverage all of D.O.J. to steal every election; Paramount just caved to extortion]

See you next week in a brand new show^{©Heckle and Jeckle}

Therefore:

In a time of SSD, multi-core/processor, two terabyte memory and Optane App Direct Mode (RIP) machines, there is no reason not to build from BCNF data. Time to do what Dr. Codd demonstrated. Technology has finally caught up with the maths.

05 September 2016

NoSql? No Mas! No Mas!

First they ignore you, then they laugh at you, then they fight you, then you win.
-- Gandhi

It may be a tad early to gloat, but indications are that the NoSql zealots have waved the white flag and admitted that CAP is silly and doing ACID is way more fun. I suppose they deserve hemorrhoids, too. Couldn't happen to a nicer bunch of folks. A bit of interTubes searching confirms the occasional tidbit that drifts by: the thought leaders in the NoSql cabal finally admit that transactions and central control over data consistency ain't such an old-fashioned idea after all. They've discovered it, and will be patenting it soon. Not that NoSql datastores were any kind of innovation, either. Just VSAM files in ASCII with a buzzy name.

Told ya so.

Here's the main wave from the perpetrator of CAP. At least the principal instigator admits the error. The silly part of the whole episode is that partitions really are rare occurrences. When they do occur, they amount to extended latency. Federated RDBMS, which have been around since about 1990 (the general principles since the mid-80s), have handled the situation. Here's a DB2 tutorial from 2003. The semantics are about the same with other such RDBMS.

As the "CAP Confusion" sidebar explains, the "2 of 3" view is misleading on several fronts. First, because partitions are rare, there is little reason to forfeit C or A when the system is not partitioned.

Fact is, distributed RDBMS (both single- and multi-vendor database) have existed since at least the early 1990s. And it wasn't just casual; here's a paper on security from Mitre (just down the road from Progress, which supported federation) from 1994. While it's no secret that I'm not a big fan of The Zuck,

Facebook uses the opposite strategy: the master copy is always in one location, so a remote user typically has a closer but potentially stale copy. However, when users update their pages, the update goes to the master copy directly as do all the user's reads for a short time, despite higher latency. After 20 seconds, the user's traffic reverts to the closer copy, which by that time should reflect the update.
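For the code-minded, the routing rule in that passage can be sketched in a few lines of Python. This is only an illustration of the idea as quoted, not Facebook's implementation: the class name `StickyMasterRouter`, the in-memory dictionaries standing in for the master and the closer copy, and the explicit `replicate()` call are all assumptions made up for the example. The 20-second window is the only number taken from the quote.

```python
import time


class StickyMasterRouter:
    """Hypothetical sketch: after a user writes, send that user's reads
    to the master copy for a short window, then back to the closer copy."""

    def __init__(self, sticky_seconds=20.0, clock=time.monotonic):
        self.sticky_seconds = sticky_seconds
        self.clock = clock       # injectable so the window can be tested
        self.master = {}         # the single authoritative copy
        self.replica = {}        # closer, potentially stale copy
        self.last_write = {}     # user -> time of that user's last write

    def write(self, user, key, value):
        # Updates always go straight to the master copy.
        self.master[key] = value
        self.last_write[user] = self.clock()

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        self.replica = dict(self.master)

    def read(self, user, key):
        wrote_at = self.last_write.get(user)
        if wrote_at is not None and self.clock() - wrote_at < self.sticky_seconds:
            # Recent writer: pay the latency, read from the master.
            return self.master.get(key)
        # Everyone else gets the closer copy, stale or not.
        return self.replica.get(key)
```

The writer sees their own update immediately, other users may see the old value until replication catches up, and the master still holds the single true copy the whole time.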

So, what's the deal?

Another aspect of CAP confusion is the hidden cost of forfeiting consistency, which is the need to know the system's invariants. The subtle beauty of a consistent system is that the invariants tend to hold even when the designer does not know what they are.

Or, as many RM zealots tell us, high NF schemas reveal facts about data relationships we didn't know before. The schema specifies the invariants, but the data reveals the real world correlations.
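A small sketch of what "the schema specifies the invariants" looks like in practice, using Python's built-in `sqlite3` module. The `account` table, the `transfer` helper and the numbers are hypothetical, invented for illustration. The point is that the application code never tests for a negative balance; the CHECK constraint and the transaction do that work, and a failed transfer leaves no half-applied update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The invariant lives in the schema, not in application code.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100), (2, 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back on any exception
            # Credit first, on purpose: if the debit then fails,
            # the rollback must undo this credit as well.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

Calling `transfer(conn, 1, 2, 60)` twice succeeds the first time and is refused the second, with both balances left exactly as they were after the first call.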

Later in the piece, Brewer goes off the deep end:

The essential ATM operations are deposit, withdraw, and check balance. The key invariant is that the balance should be zero or higher. Because only withdraw can violate the invariant, it will need special treatment, but the other two operations can always execute.

This is the Pollyanna view of how banks run ATMs, and transactions generally. His description, and what is assumed by most civilians, is that Your Bank updates Your Account in real time, whether at an ATM or human teller. Not true. Accounts are reconciled (sometimes so as to generate overdrafts!) in batch at some time EOD. Much of the big money made on bank hacking happens because the perps know that they have hours to do the deed before the accounts used are reconciled. Sometimes the intermediate accounts never see the deed. COBOL cowboys much prefer batch. They've been doing things that way for six decades. BASE has been the default paradigm in banking since forever.
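A toy sketch of the batch idea, in Python, for anyone who wants to see why the order of posting matters. Everything here is assumed for illustration: the function name `reconcile`, the two ordering policies, and the made-up day of debits. It makes no claim about how any particular bank actually sequences its postings; it only shows that the same day's transactions, applied in batch, can produce different overdraft counts depending on the order chosen.

```python
def reconcile(opening_balance, postings, order="chronological"):
    """Apply one day's postings in a single batch.

    Returns (closing_balance, overdraft_count), where an overdraft is
    counted for each debit that leaves the balance below zero.
    """
    if order == "largest_debit_first":
        postings = sorted(postings)  # most negative amounts first
    elif order != "chronological":
        raise ValueError(f"unknown order: {order!r}")

    balance = opening_balance
    overdrafts = 0
    for amount in postings:
        balance += amount
        if amount < 0 and balance < 0:
            overdrafts += 1
    return balance, overdrafts


# A hypothetical day: four debits against an opening balance of 100.
day = [-20, -30, -40, -120]
```

With these numbers, `reconcile(100, day)` counts one overdraft and `reconcile(100, day, order="largest_debit_first")` counts four, while the closing balance is the same either way.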

So, with so many cpu cycles, SSD, XPoint, NVRAM, bandwidth, and the like available, why would anyone drop OLTP/ACID on purpose? Back in the thrilling days of yesteryear when the 360 and 2311 DASD ruled the world, maybe there was no other choice. The times, they are a-changin'.

[For those that keep track of such things, this musing and its title were started before I saw the adverts for the new Roberto Duran movie.]

3 comments:

Roboprog said...: Having spent a few years working on xBase stuff (an ISAM system for PCs, basically) back in the late 80s, I'm familiar with the headaches of a non-ACID system. I don't know why the kids want to go back to a distributed analog of Berkeley-DB (name-value) or MUMPS (Codasyl-ish) type system, either :-)

Some days I miss the explicit index chaining, vs delving into "what did the query planner decide to do *today*?". I don't miss the indices that missed an update, no rollback, dangling foreign keys, unset columns, etc etc etc.; September 9, 2016 at 12:32 PM

Robert Young said...: DBaseII and III - I hated it
Berkeley-DB - never used it
MUMPS - been around it, and devised at MGH before I lived in the North End

"those who ignore history are doomed to repeat it." that's not the exact quote, which I didn't look up, just my notion of it.; September 9, 2016 at 12:58 PM

Roboprog said...: Hello again, Robert. FWIW, I just stumbled across a pretty good recap of this (in a discussion about PostgreSQL items needing future attention), in which somebody must explain "why" to one of the newbs :-)

https://news.ycombinator.com/item?id=12468920; September 10, 2016 at 2:12 PM
