03 July 2009

Cloud: Lucy in the Sky with Razorblades ... Updated, twice

"'The time has come,' the walrus said, 'to talk of many things'"

Today I endeavor to check off one of the musings in process, dealing with silos in the sky. I am motivated by a bit of bloviating I ran across in my surfing, this nonsense. I will leave it to readers to endure it on their own. I will only pull out the occasional quote to gloat.

A bit of background is in order. With the arrival of Web 2.0, and to a lesser extent Web .00001, coders began to change the rules back to what existed in the 1960's: all code is smart and all data is dumb. Java became COBOL, and data became copybooks. The main problems are that the coders are dishonest about their desires and intentions, and that what they are attempting will eventually recreate the problems which led Dr. Codd to devise the relational model and database. He is not responsible for SQL; Chamberlin is the guilty party there, and he continues to sin with XQuery and the like.

In the 1960's there developed the industry segment known as Service Bureaus. IBM was a major player. A service bureau was a connected computer service, generally over leased lines, to which a company could off-load its applications. Often, applications were provided by the service bureau. The service bureau agreed to provide resources as needed.

SOA and cloud are just HTTP versions of service bureaus. Service bureaus fell out of favor for the obvious reasons: they didn't actually manage to provide resources on demand, they weren't any more reliable than in-house, they weren't any more secure (often quite a bit less) than in-house, and they didn't save their customers any money. After all, they had to make a profit doing what had been a simple cost for their clients. There didn't turn out (surprise, surprise) to be any economies of scale. There won't be for SOA or cloud, either. That is no surprise.

The notion of provisioning in the cloud being cheaper and more scalable is founded on a single, false, assumption. That is: that demand spikes for resources among the clients are uncorrelated, or that gross demand for all clients is either constant or monotonically increasing. That the assumption is false is easy to fathom; there are easily identifiable real world generators of demand spikes, and few are unique to either specific companies or industries. Daily closing hour, Friday peaks, weekend peaks, seasonal peaks, month end peaks, and so on. The argument is made that the resource demands made by the release of the latest iPhone (as an example) are transitory to Apple, the retail sites, and AT&T. What is ignored is that all such organizations, unless they are failing, will absorb these resources in their continuing (growing) business in short order.
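
To put rough numbers on that assumption, here is a minimal sketch in Python. The hourly load profiles are invented for illustration (four hypothetical clients, a flat base load of 10 units and a one-hour spike of 100), not measurements from any provider. It compares the capacity the clients would need separately (the sum of their individual peaks) with what a shared provider would need (the peak of the combined load).

```python
def capacity_needed(profiles):
    """Return (sum of individual peaks, peak of the combined load)."""
    separate = sum(max(p) for p in profiles)
    pooled = max(sum(hour) for hour in zip(*profiles))
    return separate, pooled


def profile(peak_hour, base=10, spike=100, hours=24):
    """Hypothetical 24-hour load: flat base with a single one-hour spike."""
    return [spike if h == peak_hour else base for h in range(hours)]


# Four clients that all peak at the daily closing hour (correlated spikes).
correlated = [profile(17) for _ in range(4)]

# Four clients whose spikes never coincide (the provider's hoped-for case).
staggered = [profile(h) for h in (3, 9, 15, 21)]

for label, profiles in (("correlated", correlated), ("staggered", staggered)):
    separate, pooled = capacity_needed(profiles)
    print(f"{label}: separate={separate} pooled={pooled}")
```

With staggered spikes the pooled provider needs far less than the clients would separately; with coincident spikes it needs exactly as much, and the pooling saving is zero. Real demand sits somewhere in between, which is the whole argument.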

The real failing of SOA/cloud will be imposition of lowest-common-denominator data structures. Which brings us back to that execrable post. Since these NoSQL folk have it in their heads that "all data be mine", just as COBOL programmers did in the 1960's, they will build silos in the sky, just as their grandpappies built silos in glass rooms (they most likely don't even know what a glass room is, alas). As the old saying goes, those who ignore history are doomed to repeat it.

So, some quotes.

"'Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system],' said Jon Travis, principal engineer at Java toolmaker SpringSource..."

Mr. Travis fails to understand that object data are separate and apart from object methods. Always were, and always will be. That is precisely why OODBMS failed. There is nothing to be gained, and a lot of pain to be endured, from storing all that method text more than once. The instance data, stored relationally (which is the minimum cover, so to speak, of the data requirement), identifies each instance of the class. There is no "twisting" to be done. The relational model keeps all the data nice and neat and constrained to correctness; no code needed. The method text is needed only for transitory changes from one stable, correct state to the next, with each state being written back to the relational database. Simple as that. But that means far less code, which coders view as a job threat. Yes, yes it is.
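
A minimal sketch of "constrained to correctness; no code needed", assuming SQLite through Python's standard sqlite3 module. The customer and orders tables, and the try_insert helper, are hypothetical and exist only for illustration. The rules live in the table declarations; the application only attempts the write and the engine refuses anything that would leave the data in an incorrect state.

```python
import sqlite3


def build_db():
    """Create an in-memory database whose rules are declared, not coded."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " quantity INTEGER NOT NULL CHECK (quantity > 0))"
    )
    return conn


def try_insert(conn, sql, params):
    """Attempt one write; return False if a declared constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_db()
new_order = "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)"

print(try_insert(conn, "INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Acme")))
print(try_insert(conn, new_order, (1, 5)))   # valid order
print(try_insert(conn, new_order, (99, 5)))  # no such customer
print(try_insert(conn, new_order, (1, 0)))   # quantity must be positive
```

Every client of the database, in whatever language, gets the same rules for free; none of them has to re-implement the checks in method text.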

"'SQL is an awkward fit for procedural code, and almost all code is procedural,' said Curt Monash, an independent database analyst and blogger."

Mr. Monash is infamous in the database world, and proves it again. The OO world, it claims anyway, is explicitly non-procedural. It is event driven, through message passing. So it says. In any case, how the object's data is manipulated after construction is of no concern to the data store, be it an RDBMS or a shoe box full of 3x5 cards. All such data manipulation is transitory and irrelevant to state. State is what the data store cares about. Mr. Monash doesn't get that.

Siloing, or rather the riddance of same, was Dr. Codd's concern. These KiddieKoders are such ignorant reactionaries, and don't even know it. The title of his first published paper, "A Relational Model of Data for Large Shared Data Banks", puts it all together. No siloing. The cloud, with these primitive one-off data structures, creates the massive data problem rather than solving it.

The serious problem with the cloud, however, is that, if widely adopted, it may kill off the advantage of the point of this endeavor, the SSD multi-machine database. SSD storage has the advantage of supporting high normal form, and thus parsimonious, data structures. If cloud becomes ascendant, it will adopt lowest-common-denominator storage, i.e. cheap. And SSD multi-machines are not that. I believe it can be shown that SSD storage will be faster (with a smaller footprint) for high normal form databases than for the de-normalized spreadsheets so beloved of KiddieKoders. But cloud providers aren't going to be that smart. Back to the 60's.

I'll close with a quote from those I keep around as sigs for e-mails:

This on-demand, SaaS phenomenon is something I've lived through three times in my career now. The first time, it was called service bureaux. The second time, it was application service providers, and now it's called SaaS. People will realise the hype about SaaS companies has been overblown within the next two years. ... People are stupid. History has shown it repeats itself, and people make the same mistakes.
-- Harry Debes/2008

I check up with Gartner PRs every now and again; I don't spend the coin to subscribe. Well, they still haven't released the database survey (always looking to see how much DB2 has fallen behind), but there is a survey concerning SaaS that, surprise surprise, reinforces my thesis. Gartner is no better at prognosticating than Madam Zonga, but they do surveys as well as anybody. Have a look.

Update 2:
Another Gartner-ism from a newer report on cloud specifically:
"As cloud computing evolves, combinations of cloud services will be too complex and untrustworthy for end consumers to handle their integration, according to Gartner, Inc. Gartner predicts that as cloud services are adopted, the ability to govern their use, performance and delivery will be provided by cloud service brokerages."

In other words, cloud will need yet a new infrastructure to make this really, really cheap and easy off-loading of responsibility possible. Have these folks already forgotten the clustered duck that was CORBA? God almighty.


Anonymous said...

I agree with everything in your post ... everything.

Curt Monash said...

In a skim of your post, I find it full of errors, from the trivial (their/there or crediting only one of Codd and me with a PhD) to the more substantive. The latter include but are not limited to:

1. There are plenty of reasons to expect economies of scale in the cloud other than load balancing, notably amortization of various kinds of costs (people, security, administrative consoles, "learning curve", etc.) over larger numbers of processors.

2. Not everybody who disagrees with you is dishonest.

3. Not everybody who disagrees with you forgets the past. When I first was an analyst, there was no such thing as DB2.

4. Words have more than one meaning. I can think of at least three significant uses of "nonprocedural". Most well-meaning, suitably knowledgeable people would surely recognize which one was meant in my quote.

And by the way, I favor SQL for most database programming.


Robert Young said...

As to title: the article didn't cite you as Dr. Monash, and I didn't recall. My apologies. I will endeavor to remember should I need to in future.

As to 1): perhaps, but not a priori. The complexity of the provisioner's site could just as easily mean that overhead will be a wash.

As to 2): the point was not on disagreement, but to their argument, which is that smart code/dumb data (with, implicitly, a purpose built TPM, or worse, none at all) is a better way of building systems. It isn't a better way, as history has shown, just different; their point is purely about hegemony, wrapped in an argument of efficiency.

As to 3): those who argue against the relational model/database without citing how its services will be provided in their alternative are forgetting the past. That's just the way it is.

As to 4): If anyone can cite a study (or implementation) which provides the services of an industrial strength RDBMS, without SQL (or other internal data sublanguage), I would be happy to discuss it. All of my readings have revealed that the NoSQL folk take the tack of eliminating not only the syntax but also the services. Sort of has to happen that way. If you drop some or all of ACID, then the "database" will run faster, since it is providing fewer services. That is not surprising. And you end up with not a "large shared data bank", but a siloed data dump. QED.