06 December 2013

Schemas Don't Lie, But... Well, You Know

Well, the reactionaries are reacting. To be expected. The Big Lie of NoSql/xml is that such datastores are "schemaless", and thus able to handle any sort of "unstructured" data. RDBMS, on the other hand, according to these snake oil salesmen, is trapped in the table/column straitjacket of the RM. The thing about Big Lies: they have to be big in order to flummox the ignorant. An Organic Normal Form™ relational schema is orthogonal by definition, and so nearly immune to side effects from mods (outside the client side code, which is uninterested in the change), while a change to a hierarchy (IMS/xml/HL7/whatever) propagates at least all the way down. And how does one find related data? The xml zealots are still adding knobs and whistles.
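
To make the side-effect claim concrete, here is a minimal sketch in Python (standard library only). The patient/visit/clinic names are hypothetical, invented for illustration, and have nothing to do with any real system. It adds a column and a table to a small relational schema and re-runs an untouched query, then wedges a new level into an equivalent xml hierarchy and re-runs an untouched path.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational side: two hypothetical tables related by key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE visit (
        visit_id   INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
        visit_date TEXT NOT NULL
    );
    INSERT INTO patient VALUES (1, 'Ada');
    INSERT INTO visit VALUES (10, 1, '2013-12-06');
""")

QUERY = """
    SELECT p.name, v.visit_date
    FROM patient p JOIN visit v ON v.patient_id = p.patient_id
"""
before = conn.execute(QUERY).fetchall()

# The "mod": a new column and a new table. The existing query is not edited.
conn.executescript("""
    ALTER TABLE patient ADD COLUMN birth_year INTEGER;
    CREATE TABLE clinic (clinic_id INTEGER PRIMARY KEY, city TEXT);
""")
after = conn.execute(QUERY).fetchall()
relational_unchanged = (before == after)

# Hierarchical side: the same facts, then a new level inserted above visit.
old_doc = ET.fromstring(
    "<patient><name>Ada</name><visit date='2013-12-06'/></patient>")
new_doc = ET.fromstring(
    "<patient><name>Ada</name>"
    "<clinic city='Boston'><visit date='2013-12-06'/></clinic></patient>")

PATH = "./visit"  # a path written against the old shape (direct child only)
old_hits = old_doc.findall(PATH)
new_hits = new_doc.findall(PATH)

print(relational_unchanged, len(old_hits), len(new_hits))
```

In this sketch the join returns the same rows before and after the change, while the fixed path finds the visit in the old document and nothing in the new one. A descendant search (`.//visit`) would survive this particular change, so take it as an illustration of path-dependent code, not a proof about every hierarchy.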

Dave Kellogg, cashiered CEO of MarkLogic (among many), is quoted in an article thus:
I met many Oracle-DBA-lifers during my time working with the government. And I'm OK with their personal decision to stop learning, not refresh their skills, not stay current on technology, and to want to ride a deep expertise in the Oracle DBMS into a comfortable retirement. I get it. It's not a choice I'd make, but I can understand.

Of course, that's bullshit. MarkLogic, initially and for some years, explicitly promoted itself as The xml Database. And, as you know, gentle reader, xml is just IMS in, mostly, plain text. xml, by definition, is a hierarchical datastore (when used as such). MarkLogic is just IMS done badly, and perhaps a bit cheaper. Each and every xml definition is a rigid hierarchy. That .xsd (given the label "schema") came along later is part and parcel of the kludge. The xml folks have been sneaking in relationality at least since IDREF, with limited success.

He goes on to say:
It had never occurred to me, for example, that in a $630M project -- where MarkLogic might get maybe $5 to $10M -- that someone would try to blame failure of what appears to be one of the worst-managed projects in recent history on a component that's getting say 1% of the fees.

Not to be all too macabre, but the Challenger o-ring was nowhere near that much of the cost of a shuttle. The point, which Kellogg attempts to redefine out of existence, is not how much of the total bill goes to MarkLogic, but how much of a bottleneck it is. And it is the bottleneck. OLTP systems have no business in NoSql and a colossus of code. That's what today's Kiddie Koders' grandpas did in the 1960s with COBOL/VSAM/IMS. The problem with MarkLogic isn't that retiring Oracle developers are unfamiliar with new tech, but that MarkLogic isn't new tech and is crap for OLTP. The "core database". Yeah right.

The current CEO (well, this week anyway) Gary Bloom, in The Wall Street Journal (of course):
Mr. Bloom said CMS needed to process non-standard data types from multiple vendors.
If CMS had elected to use a SQL system, its programmers would have had to build a common data model, or schema, to describe the disparate data sources within the application.

Ok, so what are these "non-standard data types"? Plutostrings and minervaintegers?

So, MarkLogic just automagically slurps up any old byte stream and figures it out? Not really:
The trick is that content not in XML must be normalized; that is, converted to XML. MarkLogic has developed some proprietary methods to perform its data management operations.

In other words: MarkLogic does an ETL exercise, just as any sql RDBMS would do in the same design. Well, unless the sql developers were smart, federated the databases, and read (writing takes a bit more work) from the original. Not only that, but any benighted xml data source has to be munged to look "just so" for MarkLogic's structure. This schemaless prattle is just stinky poo. (Although I object to calling MarkLogic's exercise "normalized".) While there are provisions for user defined data types in all the industrial strength RDBMSs, I've not seen them used much. The fact is: one can read a string, an integer, or a float from any foreign database and not have a problem. The principal source of angst would be the word size of the machines.
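
For the curious, the ETL exercise in question looks about the same whatever the target. A minimal sketch in Python, assuming two made-up vendor feeds (one xml, one JSON) and a made-up `plan` table; none of these names or layouts come from CMS or MarkLogic. Each feed gets a small converter, and everything lands in one common model made of nothing fancier than a string and an integer.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Two hypothetical vendor payloads describing the same kind of thing.
vendor_a_xml = "<plan><id>A-1</id><premium>312.50</premium></plan>"
vendor_b_json = '{"plan_code": "B-7", "monthly_cost_cents": 28900}'

def from_vendor_a(text):
    """Vendor A sends xml with the premium in dollars; convert to cents."""
    node = ET.fromstring(text)
    return (node.findtext("id"), round(float(node.findtext("premium")) * 100))

def from_vendor_b(text):
    """Vendor B sends JSON with the premium already in cents."""
    rec = json.loads(text)
    return (rec["plan_code"], int(rec["monthly_cost_cents"]))

# The common data model: one table, plain old types.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plan (plan_code TEXT PRIMARY KEY, "
    "premium_cents INTEGER NOT NULL)")

rows = [from_vendor_a(vendor_a_xml), from_vendor_b(vendor_b_json)]
conn.executemany("INSERT INTO plan VALUES (?, ?)", rows)

loaded = conn.execute(
    "SELECT plan_code, premium_cents FROM plan ORDER BY plan_code").fetchall()
print(loaded)
```

Swap the `INSERT` for "emit the target's preferred xml shape" and you have the same per-source conversion work; the converters are where the effort goes in either design.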

Back to Kellogg:
Oracle was non-standard in 1983. Thirty years later it's too standard (i.e., part of an oligopoly) and not adapted to the new technical challenges at hand. All because some bright group of people wanted to try something new, to meet a new challenge, that cost probably a fraction of what Oracle would have charged, the naysayers and Oracle lifers will challenge it endlessly saying it's "different."

While I agree that Oracle is rapacious, and a fair number of its clients think so too, it's worth noting (as done in an earlier missive) that ten years on, Oracle was a profitable $584 million company. Ten years on and MarkLogic is still sucking at the VC teat; since it's private, we don't know whether there's been any profit, but needing more VC money indicates not much. While the likes of Kellogg and Bloom and the rest of the xml snake oil peddlers continue to bray about new, disruptive tech, most folks who've been in the real world for a while know the real story. And the real story is that xml, as datastore or data transfer, is a dirty bung hole from the mid-60s.

Oh, and that oligopoly? It's called ANSI. The SQL standard, which each vendor, commercial or otherwise, amends, extends, and bends to stay "different and better" than vanilla SQL. There's a reason that applications get locked into Oracle or SQL Server or (less often, alas) DB2. Although on IBM's z machines, DB2 is just about the only relational database one can sanely run. Calling ANSI an oligopoly is, to be charitable, mistaken. Were I in a ranting mood, I'd likely say, bloody lie. And the vendors' user code (stored procedure language) isn't standard. Yet.

If you're interested in a case study, then here is one. Note that XQuery is a bung hole kludge compared to sql. I guess that's one reason coders like it: a moat of obscurity around their jobs. RBAR's revenge. Just what coders who can't grok simple set theory can rally around.
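
For anyone who hasn't met the acronym, RBAR is usually expanded as "row by agonizing row". A minimal sketch of the difference, in Python with sqlite3, using a made-up `claim` table (hypothetical, for illustration only): the first version drags every row into client code and totals it in a loop, the second states the set operation and lets the engine do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claim (claim_id INTEGER PRIMARY KEY, state TEXT, amount INTEGER);
    INSERT INTO claim VALUES (1, 'MA', 100), (2, 'MA', 250), (3, 'CT', 75);
""")

# RBAR: fetch each row, accumulate by hand in application code.
totals_rbar = {}
for state, amount in conn.execute("SELECT state, amount FROM claim"):
    totals_rbar[state] = totals_rbar.get(state, 0) + amount

# Set-based: one declarative statement, grouped inside the database.
totals_set = dict(
    conn.execute("SELECT state, SUM(amount) FROM claim GROUP BY state"))

print(totals_rbar == totals_set)
```

Both produce the same totals here; the point is only which side of the wire does the work, and how much code one has to own to get it.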


Anonymous said...

Excellent !
Thank you very much for this post and your blog

Felipe from France.

Anonymous said...

Yes, beautifully written piece. I love the Challenger O-Ring analogy. PERFECT.