07 January 2017

Equal and Opposite Reaction

From Newton, his third law:
When one body exerts a force on a second body, the second body simultaneously exerts a force equal in magnitude and opposite in direction on the first body.

We're knee deep in a reactionary socio-political period and, not to ignore the trend, some of our programming crowd is moving to a reactive paradigm. Similar word roots, but very different meanings. Reactionary behavior in application development has been on the warpath since Dr. Codd first revealed the relational model, best expressed by COBOL coders of my acquaintance: "we prefer to do transactions in the client"; we don't need no central data control. With Java (and other languages, in time), we entered the world of ORMs (I don't think I've run across this piece before, but it sounds like something that'd show up here). Much of the defense of ORM is that there is an impedance mismatch between the RM/SQL and OO. These missives dismissed that bunk years ago, but I felt so alone...

Recently, I came across this rather lengthy discussion. Yes, there is no inherent mismatch at the semantic level. Yes, there is the demand of client coders that all data belongs to them. And, yes, the comments reveal as much about client-oriented coders as the text. The OOers claiming the IM miss, on purpose I expect, two points:
1 - the database's purpose is to provide the constructor data for the object
2 - the vast majority of Java/OO applications don't even have real objects anyway; they disappeared by Y2K (I was there and was sorely disappointed) and were replaced by DataObject/ActionObject (with various names, of course), which are just client-oriented COBOL/VSAM (or C struct) programming in more au courant languages. One need only read Allen Holub's Bank of Allen and his late 90s writings to see what might have been. (Yes, the long-time reader has seen these links before.)

The end of the preamble of this endeavor is echoed in this comment on the post on reactive programming linked:
#BigPipes are stealthily enmeshing the US and most CSuites (and people in general) are utterly clueless about the data volume thruput implications of a gigabit real-time internet of things.
-- Ed Dodds
Which is good. But then there's this:
Because the more I thought about Reactive the clearer it became that businesses, not just infrastructures, need to act in this way. The age of long term planning is long gone. Circumstances change and they [change] fast.

Taking the reactive metaphor and turning around and defending the olde reactionary code/file paradigm. No thanks.

With innterTubes getting universally fast, as I have called it: the world of host-terminal development has come to the innterTubes; your web browser is just a VT-220 (with pixels) connected to some database someplace else, it's just that RS-232 is now TCP. The disconnected client need no longer be the standard for transactional systems.

What happened? Or rather, what hasn't happened? Why hasn't the reactive database taken over the world? Let's have a look.

For a couple of years now, we've been told that reactive programming will be the next big thing. There's also been some push for database driven applications to be based on the reactive paradigm. (Espresso Logic, the source of that link, was bought up by Computer Associates/CA.) After all, reactive programming is "declarative", which is the essence of the RM/SQL database. Reactive programming is, by most definitions, event driven programming with a push (i.e., in the other direction, back to the user machine) requirement: changes in the datastore are sent (immediately?) to all clients holding the data. Think about that for a second. The typical archetype is the computer spreadsheet: change a cell value, and it propagates to any referencing cells (either or both formula/macro), including cascades beyond a simple parent/child structure. Is it really the right basis for a relational database? Not really. Remember ACID?? Yet, maybe.
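
For the reader who hasn't poked at the innards of a spreadsheet, here is a minimal sketch of that propagation in Python. The Cell class and the cell names are my own illustration, not anybody's product, and the propagation is deliberately naive: it assumes no cycles and will recompute a cell more than once if two paths lead to it.

```python
class Cell:
    """A spreadsheet-style cell: either a constant or a formula over other cells."""

    def __init__(self, value=None, formula=None, inputs=()):
        self.formula = formula
        self.inputs = list(inputs)
        self.dependents = []
        # Register with every cell this one references, so changes can be pushed.
        for cell in self.inputs:
            cell.dependents.append(self)
        self.value = value if formula is None else self._compute()

    def _compute(self):
        return self.formula(*(c.value for c in self.inputs))

    def set(self, value):
        """Change a constant cell and push the change downstream."""
        self.value = value
        self._propagate()

    def _propagate(self):
        # Naive depth-first cascade: recompute each dependent, then its dependents.
        for dep in self.dependents:
            dep.value = dep._compute()
            dep._propagate()


a1 = Cell(2)
b1 = Cell(3)
c1 = Cell(formula=lambda a, b: a + b, inputs=(a1, b1))  # =A1+B1
d1 = Cell(formula=lambda c: c * 10, inputs=(c1,))       # =C1*10

a1.set(5)
print(c1.value, d1.value)  # 8 80
```

Note what isn't there: no second user, no durable storage, no transaction boundary. That is the whole difficulty.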

So the spreadsheet is the common example of reactive: a single user, in memory, instantly updating calculator. Does it make any sense to transfer such a paradigm to a multi-user, durable storage, transaction (ACID) engine? Well, no. Nevertheless, a search brings up some lengthy writings on the virtues of reactive database development. Hmmm. How does changing data on the client as buffered data changes on the server keep ACID? Well, such a method destroys ACI. So, while it sounds sexy, not so much.

First, and perhaps most notably, if you read up these links (and many others resulting from a search on 'reactive database programming' and the like) they all set a parent/child frame of reference (hehe) for OO language objects. They're still living in a hierarchical mindset, but I'll credit them with trying to break out of that straitjacket. But the problem of focus remains: the world is truly relations, not hierarchies. Focusing on the notion of reactive forces data relationships out of the hierarchical paradigm, yet the context most often presented remains parent/child. Disconnect.

In database terms, we have the twin twins of force/no-force and steal/no-steal. Which combination you choose determines how reactive your engine is. The most common implementation is steal/no-force, which is reactive: clients would see changes as they happen in buffers if column level locking were supported.

One can note that the MVCC paradigm popularized by Oracle is opposite to reactive: MVCC isolates client data from changes by other clients. Not such an easy platform for reactive. Unlike fail fast, it's fail last.

A question which shows up on various database forums: "Why can't I have column (within a row) level locking? After all, if I obey the Prime Directive, 'data is about the key, the whole key, and nothing but the key so help me Codd', Jack and Jill changing different (independent, not key) columns on a row at the same time doesn't violate ACID." Even using standard locking or MVCC, last update wins, still. The "NO" answers are that the lock table would be too large for practical purposes, and that the minimum I/O on any OS is rather larger than even the row, the most common lock level in use today. Depending on the machine/OS, the smallest I/O is sector/block/page/extent and the like. Commodity 3.5/2.5 inch drives today are hard formatted to 4K sectors. Mainframe machines these days use such drives, and emulate CKD formatting on top of the hard formatting. Committed data doesn't exist until the engine sends I/O buffer(s) to the OS for write.
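
To make "last update wins" concrete, here is a toy sketch in Python. It models no real engine and no locking at all; the table is a dict, and the key, column names and values are made up for illustration. Two clients read the same row image, each changes a different column, and each writes back. With whole-row write-back the second writer clobbers the first; with column-level writes both changes survive.

```python
def write_row(table, key, row_image):
    """Row-level write-back: the client's whole row image replaces the stored row."""
    table[key] = dict(row_image)


def write_column(table, key, column, value):
    """Column-level write: only the named column is touched."""
    table[key][column] = value


# Whole-row write-back: both clients start from the same image of the row.
rows = {"ABC": {"COLOR": "Red", "WEIGHT": 10}}
jill_image = dict(rows["ABC"])
jack_image = dict(rows["ABC"])
jill_image["COLOR"] = "Blue"
jack_image["WEIGHT"] = 12
write_row(rows, "ABC", jill_image)
write_row(rows, "ABC", jack_image)   # Jack's stale COLOR overwrites Jill's change
row_result = rows["ABC"]

# Column-level writes: the two changes are independent and both survive.
cols = {"ABC": {"COLOR": "Red", "WEIGHT": 10}}
write_column(cols, "ABC", "COLOR", "Blue")
write_column(cols, "ABC", "WEIGHT", 12)
col_result = cols["ABC"]

print(row_result)  # {'COLOR': 'Red', 'WEIGHT': 12}
print(col_result)  # {'COLOR': 'Blue', 'WEIGHT': 12}
```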

For the moment, let's have a suspension of disbelief, and consider how the user screen would behave with a reactive database/application. Taking Jack and Jill as the basis, we'll say that Jack needs to update WEIGHT and Jill needs to update COLOR on row ABC of table FOO, and that COLOR and WEIGHT are dependent only on the key of FOO and don't participate in any mutual constraints. In other words, the database developer has done a proper job of it. If Jill changes COLOR from Red to Blue first, then Jack would see his screen image of row ABC flicker with COLOR morphing. If Jack had only used normal applications, he'd likely freak out. "Demons have infected my PC!!!" But, we'll assume that he'd been told that he's using a new real-time updating application. So, after the first few times his screens update automagically, he'll most likely not even notice. Real-time changes to "other" data rapidly become "just the way the application works". Hell, it's just like Excel!!! Way cool beans.
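
The Jack and Jill screens can be sketched the same way. What follows is a plain observer pattern in Python, an assumption of mine about how the push might look, not a description of any shipping engine. ReactiveRow, Screen and the values are hypothetical names for illustration, and there is no network, no locking and no transaction in it.

```python
class ReactiveRow:
    """One row of FOO that pushes every column change to its subscribers."""

    def __init__(self, key, **columns):
        self.key = key
        self.columns = dict(columns)
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, who, column, value):
        old = self.columns[column]
        self.columns[column] = value
        # Push: every client holding the row hears about the change right away.
        for callback in self.subscribers:
            callback(self.key, column, old, value, who)


class Screen:
    """A client's screen image of the row, refreshed by pushed changes."""

    def __init__(self, user, row):
        self.user = user
        self.image = dict(row.columns)
        self.flickers = []          # changes made by somebody else
        row.subscribe(self.on_change)

    def on_change(self, key, column, old, new, who):
        self.image[column] = new
        if who != self.user:
            self.flickers.append(f"{column}: {old} -> {new} (by {who})")


row = ReactiveRow("ABC", COLOR="Red", WEIGHT=10)
jack = Screen("Jack", row)
jill = Screen("Jill", row)

row.update("Jill", "COLOR", "Blue")   # Jack's screen morphs
row.update("Jack", "WEIGHT", 12)      # Jill's screen morphs

print(jack.image)     # {'COLOR': 'Blue', 'WEIGHT': 12}
print(jack.flickers)  # ['COLOR: Red -> Blue (by Jill)']
```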

Can this be done in the real world? Actually, I think so. An in-memory database means that I/O can be at the byte level (finessing file I/O semantics, of course; some/most engines today have the option to use disk in raw mode, skipping file semantics and managing I/O themselves), and with 5NF schemas, that means the row/column level. The lock table issue remains, to some extent, but with 64 bit machines and terabytes of memory and SSD used for virtual memory, maybe not so big a deal.

In the end, yes, a reactive database application is not only possible, but desirable.
