12 November 2016

Who's Driving The Car?

Thanks to simple-talk, I'm informed that Fabian Pascal has posted part two (part one is linked) of his series on "Principle of Orthogonal Database Design". If you read here, you should read this series. I've long mused that the essence of RDBMS is notion of independence in the data. Sometimes this is voiced as "one fact, one place, one time". I like orthogonal; it's a word that makes me giggle and has for years. Along with "batshit", which describes coders who "prefer to do transactions on the client". Totally batshit.

Which raises a question: are triggers as bad or necessary as discussed in Joe Celko's piece, also on simple-talk? One of the commenters describes an, alleged, corner case which demands trigger support. Joe provides an extensive answer, on point.

But I've always thought about such corner cases from the other direction: if the RM is consistent, which it looks so to me, and a SQL database schema is faithful to the RM, which vendors certainly don't enforce, then the answer may be somewhat different. The answer could, likely so, be that the application "logic" which demands the corner case trigger is wrong. Further normalization to reach organic normal form™ is the avenue of escape. After all, the RM is really a logic engine. It's no coincidence that the database language datalog shadows the programming logic language prolog.

Just a thought.


Roboprog said...

Thank God for triggers. Because, you know, applications usually *are* batshit. :-)

Triggers allow fecal-chirops designs such as using a portion of a table as a materialized view by having updates in a second table reflected in the first table. Awesome, right?

Yeah, big time sarcasm.

But seriously, triggers are a necessary evil when some legacy app has painted itself into a corner. Better the DB mopping up for all the client apps than trusting each of them to. But you knew that.

Roboprog said...

OK, this one is more of a "meta" level comment. I'm "only" 51 (which puts me a few years behind you and other voices in the tech community), but I have noticed a definite tendency of older developers to get a bit "cranky".

But not without cause. For me, at least, you get to a point in your mid 40s or so where you see how things could / should be, compared to the "local maxima" that The Management in the IT field wants to stick everybody. It seems that everything is geared around using poor languages, libraries, hardware and staff - thus making everybody work at that level. But look at how fast we run stupid code :-)

I'm thinking of an interview with Alan Kay from about 10 or 15 years ago where he compared the current state of things to the architectural level of the ancient Egyptians - thousands of slaves stacking big stone blocks over a wide base.

In my case, I'm now horrified at how the industry still embraces mutable OOP in the Simula/C++/Java vein as the sin qua non of program architecture, rather than FP now that we have runtimes with good enough garbage collection and faster multicore hardware. It's so much easier to reason about a [directed] *acyclic* graph of data flow than the cyclic mess that current mutable data centric designs make. Once you see it, you can't "un-see" it. (e.g. - modern databases mostly embrace MVCC, as well, to better deal with concurrency, even though applications think they are mutating "the" row in an update)

I guess it didn't help how things were taught, at least in my uni back in the 80s. Many of the key features of something like Lisp were pretty much scoffed at in the classes I took. It took a long time to learn some of this stuff in my spare time decades later. I now get why it was used in the research examples for the AI class I took. Not because it was magic, but simply because it made large tasks manageable.

Anyway, that's my off-topic rant, but I think I get why more experienced devs have to vent about all the insanity happening around them.

Robert Young said...

-- modern databases mostly embrace MVCC ... to better deal with concurrency

I'm on the other side of that fence. while MVCC enables a non-intrusive read concurrency, it violates the long-time mantra of coders (yes, there is at least one aspect of their world I grant): fail fast. all MVCC does, in practice, is to delay collision until it's too late for the client/user to deal with. locker based transaction control does fail fast, such that the user doesn't have to manage undo/redo. and, once again, with a fully normal schema, should we have collisions as a matter of course? if the application doesn't routinely step on the same row from multiple users, the user doesn't see the difference. the system does see a difference; lower resources and quicker transaction closure.

-- FP

the early versions of GW-BASIC had immutable variables so one had
B1 = B + 1
all over the place.

FP ends up with the same issue. rather than trying to keep "state" in the coders head, the coder has to keep the list of "variables" straight in his head. I just don't see the advantage. computing, after all, is about changing values.

the other issue with FP type languages is recursion. Guido rants that Python will never have tail call, but unless a recursive language does, it's useless for industrial uses.

according to the Wiki, anyway, Watson is driven by prolog.

Fabian Pascal said...

A couple of comments:

1. Re independence: The concept of data independence refers to independence of data from applications, in the sense that they should not be biased for some apps and against others -- i.e., they should be in their "natural structure" as Codd expressed it and it usually refers to logical, integrity and physical independence). This is distinct from the Principle of Orthogonal Design (POOD) which refers to the independence of base relations i.e., they are not derivable from one another.

2. While triggered stored procedures are superior to application code, they are inferior to declarative integrity for a variety of reasons that should be obvious.

Fabian Pascal said...

Have you noticed the gap between the rampant ageism in Sillicon Valley vs. the average age of the economic and socio-political elite?

It's important for ouvriers-coders not to be restrained by knowledge, reason and ethics that experience confers, or they may not develop the algorithms intended to exploit and manipulate the plebos.

Robert Young said...

1) point taken.

ageism is the necessary precursor to wheel re-invention and reactionary bomb throwing. as last Tuesday manifested.

Fabian Pascal said...

Well, I am not sure how age affected the election result.

But there is this anecdote about the engineering exam where students were asked to design a pipe transferring 1000 tons of blood from one city to another and none of the students asked why--they all designed it.

There is a point there with respect with ageism. The wheel was useful, what they invent now is the opposite.

Robert Young said...

-- Well, I am not sure how age affected the election result.

Those who do not remember the past are condemned to repeat it.
A child educated only at school is an uneducated child.
Fanaticism consists of redoubling your effort when you have forgotten your aim.

so, ageism, among other things, demeans and ignores the hard won experience of those who've made it through the war thus far. the country folk who voted Trump managed to forget what went on from 2000 to 2008 and were gulled into believing that Democrats were the source of their despair, this from Upton Sinclair:

"It is difficult to get a man to understand something, when his salary depends on his not understanding it."

means that they'll believe clear nonsense. NoSql (another long lost wheel) folk suffer from the same brain cramp.

years ago I worked for a Ph.D. (Iowa State math stat) who spent his youngest years in Nevada and Texas, so had a trove of stories. this is one.

a frog and a scorpion meet at the edge of stream. needless to say, the frog keeps his distance.
scorpion: "I need to get across the stream, and you do too, but I can't swim and you can. can I ride on your back when you go across?"
frog: "no fucking way! you're a scorpion. you sting frogs. I'm not a fool."
they go back and forth for some time. eventually the frog is convinced, so they saddle up and head across the stream. about half way across, the scorpion goes amok and stings the frog repeatedly.
frog: "why?!?!?!? you've killed me, and you can't swim, so you'll drown?!?!?! what the fuck's the matter with you?!?!?!
scorpion: "It's my nature." he drowns.

making stupid personal decisions is one thing. a country committing suicide is rather another.

Roboprog said...

Re: FP & immutability. Yes, I/O (and updates in general) happen, but IMHO such changes should be pushed to the periphery of a system, not wantonly scattered about.

Tail call elimination is a nice to have, but if you have a system that at least discourages mutable data, you can use things like filter, map, fold/reduce that hide "accumulators" when dealing with sets/lists, so you only see "assign-only-once" within each chunk of flow.

Anyway, in my neck of the woods (app land) being able to compose and decompose (immutable) compound values on the fly, and simply look "upstream" to see the nature of the content in most cases would be preferable to having a vast number of effectively global "objects" cast about that might be in any number of states - having received input from any number of sources - when the code accesses them. Or maybe I'm just salty about having to work on too much sludge up to its armpits in "Spring" and ORM. :-)

Granted, some widely used values end up being in an actor / message type component, which has just as much temporal coupling as a global with get/set logic, but at least it relieves Joe Coder from having to remember to put mutexes / locks / syncs around things.

Anyway, that's why I brought up FP in relation to how things shape up in the work I do. Fabian and you did seem to pick up the "experience considered harmful" (now back to stupid on 11) aspect of the discussion, though, which was perhaps the main point.

P.S. - and yes, CODASY^H^H^H^H^H^H NoSQL is not the way forward, kids.

Fabian Pascal said...

You don't have to convince me about ageism.

It is quite telling that there is a huge gap between corporate hiring policies of ouvriers -- those who do the real work --if you reach 45 it's hard to find a job (in SV they hire out of high school) and the age of top management and political elites.

Experience means harder to herd, manipulate and fool.

William Sisson said...

The answer could, likely so, be that the application "logic" which demands the corner case trigger is wrong. Further normalization to reach organic normal form™ is the avenue of escape.

I am not quite sure what you are getting at here. What would the "organic normal form" look like? As I point out in my further post on Simple Talk I am not really satisfied with Celko's solution for a number of reasons that I detail there.

My concern is, that by insisting on solutions that use only use the constraint checking available in SQL you may well end up with some very contrived designs. My argument is that in many cases the trigger solution will be easier to understand and just as declarative.

It is precisely because of "corner" cases - which are many and diverse in nature - that we need to have a way of defining constraints of arbitrary complexity in a declarative manner. Triggers are the nearest we have to this in current SQL systems. I wouldn't argue that they are the correct solution, but on pragmatic grounds I would favour their use.

I think the arguments against triggers seems a bit over to top considering how many other things are wrong with SQL.

Robert Young said...

The Organic Normal Form™ jargon is something I made up. I take it to mean a NF sufficient to support the targeted application(s) solely with DRI. That might be 3NF or 5NF, or whatever. Your objection that NF can't meet all the corner cases (and some code, either on the client or in triggers/procedures/etc. in the database is necessary) might or might not be applicable a priori. But even if true, maximizing logic through DRI is a win/lose proposition. The database and its developer wins. Client coders who get paid by the LoC (even if only implicitly) lose. Since the first days of COBOL, development was driven by the simple premise: define the outputs, then define the inputs needed to support such output. That meant, more or less, a one-to-one congruence between the green bar printout (or, later, a screen) and some file. Data redundancy ruled, but the coder for a report or screen felt better just because s/he needed to deal with only a handful of files (esp. once VSAM was defined); perhaps just that one.

And so it went until Codd's 1969 paper. I'll ignore, for now, IDMS network database, which wasn't much used, and IMS which had just been released when Codd set out to eviscerate it (both were "navigator" driven datastores). Completely. In 1970, hardware wasn't up to the task of implementing NF databases, nor was it when the first Oracle was released in 1979. The COBOL set still thought themselves as the driver of the car, and the data just the passenger (whether file or table or entire schema). The java set, in the Enterprise, are mostly holdover COBOL coders, and continue with that view. Been there, fought that.

I'll not wade into the corner case arguments, except to note that Celko's "Thinking in Sets" chapter 17 deals with them to some degree. YMMV.

As to the provocative bit. The implication is that if the application "logic" can't be supported by a NF database, then may be the application "logic" is illogical! The RM, and SQL databases which don't break the rules, is at its heart a logic engine. If your client coders want the application to do X, but the schema can't be made to do X and remain consistent, may be X is wrong. Not a blanket assertion, but a probability to consider. As I've mentioned a few times, Watson is driven by prolog, and datalog is a prolog geared toward databases; not much supported in the wild, of course. Logic, even now, wins.

And, speaking of probability, now that SS embeds R (and so does Oracle and Postgres and IBM sorta does SPSS), we have two varieties of "logic" living in the same space. The structural logic of the RM, and the dynamic probabilistic logic of correlation. Let the joust begin.

William Sisson said...

As to the provocative bit. The implication is that if the application "logic" can't be supported by a NF database, then may be the application "logic" is illogical!

My example is very specifically data integrity logic. Contracts for the same customer may not have overlapping date ranges and must be contiguous, that is the start date of one contract must be one day after the end date of the previous contract.

This seems like a personally reasonably requirement to me.

A relational DBMS would support declarative constraints of arbitrary complexity, so this constraint could be implemented in a straightforward manner. In SQL it's much more difficult. I'm not saying triggers are right way, just that they are probably the most practical workaround we have at present.