Dr. Codd Was Right: Schemas Don't Lie, But... Well, You Know

Make America White Again - The Gang of Six, 29 April 2026

[T]hey've got to draw in their horns and stop their aggression, or we're going to bomb them back into the Stone Age. And we would shove them back into the Stone Age with Air power or Naval power -- not with ground forces.
-- Curtis LeMay/1965 [didn't work in Nam, won't work in Iran; these boots are made for walkin]

I've been shot at, spent nights in foxholes filling up with water in the desert. I'm not aware that the president of the United States has ever done any of those things for his country.
-- Scott Pelley/2026 [Bone Spur Samurai^©? more folks are watching "60 Minutes"; the anti-MAGA are winning]

There's not a single example of things working out for the appeaser.
-- Nicolle Wallace/2024 [like this? the next extortion is on the way]

Effective 1 April 2026, by this Executive Order, only attorneys approved by U.S. Attorney General ~~Pam Bondi~~ will be allowed to practice law at the Federal, State, and Local levels. Bar credentials are hereby deleted. Bar associations are eliminated^{told you}. [next step from EO of 7 December 2025]
-- Hate filled, Petty, Paranoid, Demented, Dictator Don/20 March 2026

I have had to explain and re-explain and re-explain and re-explain, you know, how relational databases work, what is an eigenvector, what is dimensionality reduction.

-- Christopher Wylie/2018

I believe quite strongly that, if you think about the issue at the appropriate level of abstraction, you're inexorably led to the position that databases must be relational.

-- Chris Date/2009

This Week's thought

[S]ome new people that are going to be joining us.
-- Bari Weiss/2026 [all Trump Goose Steppers]

See you next week in a brand new show^{©Heckle and Jeckle}

Therefore:

In a time of SSD, multi-core/processor, two terabyte memory and Optane App Direct Mode (RIP) machines, there is no reason not to build from BCNF data. Time to do what Dr. Codd demonstrated. Technology has finally caught up with the maths.

06 December 2013

Schemas Don't Lie, But... Well, You Know

Well, the reactionaries are reacting. To be expected. The Big Lie of NoSql/xml is that such datastores are "schemaless", and thus able to handle any sort of "unstructured" data. The RDBMS, on the other hand, according to these snake oil salesmen, is trapped in the table/column straitjacket of the RM. The thing about Big Lies: they have to be big in order to flummox the ignorant. An Organic Normal Form™ relational schema is orthogonal by definition, and therefore nearly immune to side effects from mods (outside the client side code uninterested in the change), while a change to a hierarchy (IMS/xml/HL7/whatever) propagates at least all the way down. And how does one find related data? The xml zealots are still adding knobs and whistles.
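A minimal sketch of that orthogonality claim, using Python's built-in sqlite3. Every table, column, and key name here is made up for illustration, and the hierarchical half assumes the change lands as a new level in the tree rather than as a new attribute. The query written before the schema change runs untouched after it; the path into the nested document does not.

```python
import sqlite3

# Relational side: two hypothetical tables, customer and purchase.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ann');
    INSERT INTO purchase VALUES (10, 1, 25), (11, 1, 17);
""")

TOTAL_SQL = """
    SELECT c.name, SUM(p.amount)
    FROM customer c JOIN purchase p ON p.customer_id = c.id
    GROUP BY c.name
"""
before = conn.execute(TOTAL_SQL).fetchall()

# The mod: a new column that the existing query has no interest in.
conn.execute("ALTER TABLE purchase ADD COLUMN region TEXT")
after = conn.execute(TOTAL_SQL).fetchall()

# Hierarchical side: the same data as a nested document, read by path.
doc_v1 = {"customer": {"name": "Ann",
                       "purchases": [{"amount": 25}, {"amount": 17}]}}

def total_by_path(doc):
    # The reader hard-codes the route down the tree.
    return sum(p["amount"] for p in doc["customer"]["purchases"])

# The same mod, expressed as a new level (assumption: purchases now sit
# under regions). Every path that passes through that level changes.
doc_v2 = {"customer": {"name": "Ann",
                       "regions": [{"name": "EU",
                                    "purchases": [{"amount": 25},
                                                  {"amount": 17}]}]}}
try:
    total_by_path(doc_v2)
    path_survived = True
except KeyError:
    path_survived = False

print(before, after, path_survived)
```

The comparison is deliberately lopsided in one respect: a hierarchy that only gained an attribute would also leave the old path intact. The pain starts when the shape of the tree itself changes.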

Dave Kellogg, cashiered CEO of MarkLogic (among many) is quoted in an article thus:

I met many Oracle-DBA-lifers during my time working with the government. And I'm OK with their personal decision to stop learning, not refresh their skills, not stay current on technology, and to want to ride a deep expertise in the Oracle DBMS into a comfortable retirement. I get it. It's not a choice I'd make, but I can understand.

Of course, that's bullshit. MarkLogic, initially and for some years, explicitly promoted itself as The xml Database. And, as you know gentle reader, xml is just IMS in, mostly, plain text. xml, by definition, is a hierarchical datastore (when used as such). MarkLogic is just IMS done badly, and perhaps a bit cheaper. Each and every xml definition is a rigid hierarchy. That .xsd (and given the label schema) came along later is part and parcel of the kludge. The xml folks have been sneaking in relationality at least since IDREF, with limited success.

He goes on to say:

It had never occurred to me, for example, that in a $630M project -- where MarkLogic might get maybe $5 to $10M -- that someone would try to blame failure of what appears to be one of the worst-managed projects in recent history on a component that's getting say 1% of the fees.

Not to be all too macabre, but the Challenger o-ring was nowhere near that much of the cost of a shuttle. The point, which Kellogg attempts to redefine out of existence, is not how much of the total bill goes to MarkLogic, but how much of a bottleneck it is. And it is the bottleneck. OLTP systems have no business in NoSql and a colossus of code. That's what today's Kiddie Koders' grandpas did in the 1960s with COBOL/VSAM/IMS. The problem with MarkLogic isn't that retiring Oracle developers are unfamiliar with new tech, but that MarkLogic isn't new tech and is crap for OLTP. The "core database". Yeah right.

The current CEO (well, this week anyway) Gary Bloom, in The Wall Street Journal (of course):

Mr. Bloom said CMS needed to process non-standard data types from multiple vendors.
...
If CMS had elected to use a SQL system, its programmers would have had to build a common data model, or schema, to describe the disparate data sources within the application.

Ok, so what are these "non-standard data types"? Plutostrings and minervaintegers?

So, MarkLogic just automagically slurps up any old byte stream and figures it out? Not really:

The trick is that content not in XML must be normalized; that is, converted to XML. MarkLogic has developed some proprietary methods to perform its data management operations.

In other words: MarkLogic does an ETL exercise, just as any sql RDBMS would do in the same design. Not only that, but any benighted xml data source has to be munged to look "just so" for MarkLogic's structure. This schemaless prattle is just stinky poo. (Although I object to calling MarkLogic's exercise "normalized".) Well, unless the sql developers were smart and federated the databases and read (writing takes a bit more work) from the original. While there are provisions for user defined data types in all the industrial strength RDBMSs, I've not seen them used much. The fact is: one can read a string, an integer, or a float from any foreign database and not have a problem. The principal source of angst would be word size of the machines.
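A rough sketch of the federated read described above, with SQLite's ATTACH DATABASE standing in for the federation features of the industrial strength engines (that substitution is mine, as are the vendor names, tables, and columns). Two sources with different layouts stay where they are, and one query reads plain integers, strings, and floats from both.

```python
import sqlite3

# One connection, two attached "foreign" databases (in-memory stand-ins).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS vendor_a")
conn.execute("ATTACH DATABASE ':memory:' AS vendor_b")

# Each hypothetical vendor keeps its own table and column names.
conn.executescript("""
    CREATE TABLE vendor_a.plan  (plan_id INTEGER, label TEXT, premium REAL);
    CREATE TABLE vendor_b.offer (code INTEGER, title TEXT, monthly_cost REAL);
    INSERT INTO vendor_a.plan  VALUES (1, 'Bronze', 210.5);
    INSERT INTO vendor_b.offer VALUES (7, 'Silver', 305.0);
""")

# The "common data model" is just the column list of this query;
# nothing is copied or converted out of the original stores.
rows = conn.execute("""
    SELECT 'vendor_a' AS source, plan_id AS id, label AS name, premium AS price
    FROM vendor_a.plan
    UNION ALL
    SELECT 'vendor_b', code, title, monthly_cost
    FROM vendor_b.offer
    ORDER BY price
""").fetchall()

for row in rows:
    print(row)
```

Writing back through such a query takes more work, as noted; this only covers the read side.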

Back to Kellogg:

Oracle was non-standard in 1983. Thirty years later it's too standard (i.e., part of an oligopoly) and not adapted to the new technical challenges at hand. All because some bright group of people wanted to try something new, to meet a new challenge, that cost probably a fraction of what Oracle would have charged, the naysayers and Oracle lifers will challenge it endlessly saying it's "different."

While I agree that Oracle is rapacious, and a fair number of its clients think so too, it's worth noting (as done in an earlier missive) that ten years on, Oracle was a profitable $584 million company. Ten years on and MarkLogic is still sucking at the VC teat; since it's private, we don't know whether there's been any profit, but needing more VC money suggests not much. While the likes of Kellogg and Bloom and the rest of the xml snake oil peddlers continue to bray about new, disruptive tech, most folks who've been in the real world for a while know the real story. And the real story is that xml, as datastore or data transfer, is a dirty bung hole from the mid-60s.

Oh, and that oligopoly? It's called ANSI: the SQL standard, which each vendor, commercial or otherwise, amends, extends, and bends to stay "different and better" than vanilla SQL. There's a reason that applications get locked into Oracle or SQL Server or (less often, alas) DB2. Although on IBM's z machines, DB2 is just about the only relational database one can sanely run. Calling ANSI an oligopoly is, to be charitable, mistaken. Were I in a ranting mood, I'd likely say, bloody lie. And their user code (stored procedure language) isn't standard. Yet.

If you're interested in a case study, then here is one. Note that XQuery is a bung hole kludge compared to sql. I guess that's one reason coders like it: a moat of obscurity around their jobs. RBAR's revenge. Just what coders who can't grok simple set theory can rally around.
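For anyone who hasn't met RBAR (row by agonizing row), here is a small sketch of the difference, again with Python's sqlite3 and a hypothetical account table. Both versions end in the same state; the first walks the rows in client code, the second states the set and lets the engine do the work.

```python
import sqlite3

def make_db():
    # Fresh in-memory database with a made-up account table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 250), (3, 40)]
    )
    return conn

def snapshot(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

# RBAR: fetch every row, decide in the client, write back one at a time.
rbar = make_db()
for acct_id, balance in rbar.execute("SELECT id, balance FROM account").fetchall():
    if balance >= 100:
        rbar.execute(
            "UPDATE account SET balance = ? WHERE id = ?", (balance + 10, acct_id)
        )

# Set-based: one statement describing which rows change and how.
set_based = make_db()
set_based.execute("UPDATE account SET balance = balance + 10 WHERE balance >= 100")

print(snapshot(rbar))
print(snapshot(set_based))
```

With three rows nobody will notice the difference; the loop's cost grows with every row and every round trip, while the single statement stays a single statement.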

2 comments:

Anonymous said...: Excellent !
Thank you very much for this post and your blog

Felipe from France.; December 10, 2013 at 8:11 AM
Anonymous said...: Yes, beautifully written piece. I love the Challenger O-Ring analogy. PERFECT.; July 14, 2016 at 12:26 PM
