25 April 2009

What I told Bob

(Cringely had another post about Sun/Oracle, which needed a reply. Since I don't expect that his readers are readers here, I'm reproducing mine.)

If you read Gartner, and I only see the PR condensed version, DB2 off-mainframe has been falling behind every year since 2000. I, and others, speculated that the reason IBM wanted MySql in the first place was to spruce it up and call it DB2. What those who've never been in a DB2 shop don't understand is that most DB2 installs are on z/OS, and most of those are running 1970s-era COBOL code. In such cases, very little of the Relational model is ever utilized. MySql as a simple SQL parser in front of the file system is all that a COBOL (or java) coder needs. MySql == DB2. That was the plan. Now IBM needs Plan B.

I suspect a bit of whistling past the graveyard in that leaked (horrors, how did that get out!!!!!!!!!!!) email. 750 Power customers? This is a big deal? Those are mainframe numbers. Oracle needed, and now has, a weapon to finally kill DB2 off-mainframe. Like it or not, the Intel multi-core/processor machine is ascendant. The off-mainframe database is where the future lies. IBM cannot possibly want a future of being just another Intel OEM, with customers running Open Source software on same. There is no future there. Oracle has built up a portfolio of software for which there is no easy Open Source alternative.

Remember: Oracle is an MVCC architecture database. DB2 is a locker. SQLServer was a pure locker, and added MVCC (sorta, kinda) in 2008. Postgres is MVCC and Open Source. This architecture difference is not trivial. Why IBM chose to port IMS to DB2, calling it pureXML, is impossible to fathom. It was not a value add for customers in a web environment; the MVCC architecture is widely agreed to be superior there. IBM must believe, though again I cannot fathom why, that its mainframe machine will dominate the future. Its database off-mainframe is not going to.
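The reader/writer behavior behind that claim can be seen even in a toy setting. Below is a minimal sketch using Python's stdlib sqlite3 in WAL mode; SQLite is not an MVCC engine in the Oracle/Postgres sense, but WAL gives each reader a consistent snapshot, which illustrates the readers-don't-block-writers property that lockers lack. The table and values are mine, purely for illustration.

```python
import os
import sqlite3
import tempfile

# WAL mode needs a real file, not :memory:.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)  # autocommit
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")  # open a read transaction
before = reader.execute("SELECT balance FROM accounts WHERE id=1").fetchone()[0]
# The first read pins the reader's snapshot of the database.

# The writer commits a change while the reader's transaction is still open.
# It is not blocked by the reader.
writer.execute("UPDATE accounts SET balance = 50 WHERE id=1")

# The reader still sees its snapshot: no lock wait, no dirty read.
during = reader.execute("SELECT balance FROM accounts WHERE id=1").fetchone()[0]
reader.execute("COMMIT")

# A fresh read, outside the old transaction, sees the new value.
after = reader.execute("SELECT balance FROM accounts WHERE id=1").fetchone()[0]
print(before, during, after)  # 100 100 50
```

A pure locker, by contrast, would either block the writer until the reader finished, or hand the reader the changed value mid-transaction, depending on isolation level.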

Finally, IBM had a compliant bitch in Sun vis-a-vis java. That won’t be the case with Larry.

20 April 2009

Sunrise, Sunset. Game, set, match

Well, the other shoe dropped. Oracle has bid for Sun. In my tracking of the speculation, Oracle had more weight than IBM. And so it has turned out. This might end up being a problem for IBM.

IBM has made MySql one of the databases on its iSeries (nee: AS/400). It will be interesting to see how that goes. The not often mentioned fly in the MySql ointment had been that the transactional engine, InnoDB, had been owned by Oracle for some time, and that the putative replacement, Falcon (which wasn't very transactional by design), died aborning.

Assuming this gets past regulatory complaints, that is; and there could be some, since MySql represents a measurable fraction of installed databases. Because it is "free" software, sort of, common measures such as license fees will make the calculation fuzzy, but a case could be made (and I expect that IBM will make it) that Oracle would control too much of the relational database market. Time will tell.

This is not good for IBM. Following the Gartner reports for the last decade, carefully read, might lead one to conclude that DB2 depends on mainframe installs for its continued existence. With the iSeries moving to MySql (and we'll see how that goes), the Linux/Unix/Windows version may become the red-headed stepchild. IBM may conclude that it has no reason to exist. It hasn't made significant inroads against Oracle and SQLServer in a decade, and IBM has never been shy about cutting its losses. We may have to wave goodbye to LUW. Sniff.

How would this affect the point of this endeavor? We would be left with just two industrial strength databases on *nix: Oracle and Postgres. Both are MVCC engines, not lockers; does this distinction matter with regard to SSD hosted databases? I think not. While the MVCC approach eats more memory, I don't see that base table storage should be affected. Oracle has TimesTen and IBM has solidDB as in-memory databases, so both are working that angle; adapting to SSD should be only a baby step away.

08 April 2009

Encapsulation

I have been looking into Python based web frameworks recently; TurboGears, Pylons, and Django. Each uses an ORM along the way, and the ORM of preference is SQLAlchemy. Of the three frameworks, only TurboGears is being developed with a "reverse" ORM in tow. That being Sprox (nee: DBSprockets).

Sprox doesn't call itself that; I made it up. But it does the reverse to what is found in the framework texts, tutorials, etc. The frameworks use SQLAlchemy to permit the coder to create Python files, which are then run to force DDL into the database. Not what I consider a Good Thing. But, then I've been a database geek for decades; I get database design and specification. While SQLAlchemy is, in my opinion, a better ORM than any of the others I've seen, it doesn't support (nor is it ever likely to) real database design. It's made for Pythonistas who know enough to make trouble.

Sprox sets out to generate a UI from the schema, constraints (some anyway) and all. Data and its constraints of record in one place. Ah, bliss.
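The schema-first direction is simple enough to sketch with nothing but the standard library. The snippet below is not Sprox's API; the names are mine. It reads the catalog of a SQLite database and derives a crude field list for a form, working from schema to UI rather than from Python classes to DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        email   TEXT,
        credit  REAL NOT NULL DEFAULT 0
    )""")

def form_fields(conn, table):
    """Derive crude form-field specs from the catalog: name, type, required."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # table_info columns: cid, name, type, notnull, dflt_value, pk
    return [
        {"name": name, "type": coltype, "required": bool(notnull) and pk == 0}
        for _cid, name, coltype, notnull, _dflt, pk in rows
    ]

fields = form_fields(conn, "customer")
for f in fields:
    print(f)
```

Change the table's DDL and the generated form follows; the constraints of record stay in the one place they belong, the schema.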

But this excursion into the dark side led to a minor epiphany. The OO folk love to talk up encapsulation, separation of concerns, and other such notions. They also love to complain that changing the database schema messes up their code, so let's not ever do that once a database gets defined. Of course, coders would never suggest that they, themselves, should never amend their code as needs change. Of course not.

The fact is, industrial strength RDBMS (DB2, Oracle, Postgres, SQLServer) all implement encapsulation and other OO niceties already. The mechanisms are views and stored procedures. The problem is that coders start from the view of odbc/jdbc and, likely, some ancient flat-file derived "relational database". So, they build or acquire some code which, more or less, simplifies the creation of DML in the coding language du jour. DML is just SQL, and the odbc/jdbc/cli interfaces are client-centric: here's a query, give me back a result set (or several), which I'll then iterate through. More often than not, the coders will (especially if they've ever been exposed to COBOL) read the Master File, then read the Detail File. You get the gist.

With such a simplistic interface, schema/catalog changes cause all sorts of heartburn. But none of that need happen.

Views should be defined for the Object data; that is, the instance data needed to differentiate instances of each Class. This data can be of arbitrary complexity. The base tables in the database are irrelevant to application code; unless there is a change to the definition of the Class instance data, the client code(r) never knows (or cares) how said data is stored. So, an Order would consist of data from Order, Customer, Address, Order_Line, Inventory, etc. The view would still be called Order, but would be the appropriate join, or not. Or not? If the current schema is just some flat-file dump into the RDBMS, then not. But, should there be a refactoring (and there would be if smarter heads prevailed), the view name is still Order and is still the reference so far as the application code(r) knows, but is now a more or less normalized retrieval. Which can be refactored incrementally, all the while leaving the interface name and data unchanged.
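A toy version of that refactoring, sketched with Python's stdlib sqlite3 (table and view names are mine): the client function touches only the view, so splitting the flat table into normalized tables leaves the client call, and its result, untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1: a flat-file dump into the RDBMS, fronted by a view.
conn.executescript("""
    CREATE TABLE order_flat (order_id INTEGER, customer_name TEXT, item TEXT);
    INSERT INTO order_flat VALUES (1, 'Acme', 'widget');
    CREATE VIEW order_v AS
        SELECT order_id, customer_name, item FROM order_flat;
""")

def get_order(conn, order_id):
    # Client code: knows only the view, never the base tables.
    return conn.execute(
        "SELECT order_id, customer_name, item FROM order_v WHERE order_id=?",
        (order_id,)).fetchone()

v1 = get_order(conn, 1)

# Refactor: normalize the base tables; redefine the view, same name and shape.
conn.executescript("""
    DROP VIEW order_v;
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (10, 'Acme');
    INSERT INTO orders VALUES (1, 10, 'widget');
    DROP TABLE order_flat;
    CREATE VIEW order_v AS
        SELECT o.id AS order_id, c.name AS customer_name, o.item
        FROM orders o JOIN customer c ON c.id = o.customer_id;
""")

v2 = get_order(conn, 1)  # identical call, identical result
print(v1, v2)
```

The base tables were dropped and replaced wholesale, and the client code(r) never noticed.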

For those RDBMS which support stored procedures that return result sets, SP can be used to fully encapsulate data logic. The call into the database is an SP called, say, GetOrder. The return is the order data. How that data is accumulated is of no concern to the client code(r).

Stored procedures likewise encapsulate the write process. The client code(r) calls WriteOrder with the necessary parameters. The stored procedure then figures out where to put the data; this may be some flat-file image in the database or a 5NF decomposition or something in between. The client code(r) neither knows nor cares.

The solid state disc multi-core/processor machine, which is the main subject of this endeavor, is ideally suited to support this approach. Conventional machines, duly buffered, can do much the same.

05 April 2009

Thank You Ted and Steve

There were a couple of interesting developments this past week which bear on the subject of this endeavor.

First, Tony Davis posted an editorial at SimpleTalk about the Big Deal that had been made about multi-core cpu's, which has subsequently faded; the Big Deal, that is. He went on to observe that for the majority of application developers, and I infer that he means database connected developers since SimpleTalk is a SQLServer based site, parallel programming hasn't been and won't be an issue. Well. A few days into the Editorial's posting (comments are encouraged by the award of a prize for the one deemed Best), Ted Neward, one-time servlet maven and now in the M$ camp from what I can see, took it upon himself to post a screed on his web/blog site saying, in general, that what Mr. Davis had written was crap. He asserted, still, that multi-core coding was in the future of coders generally, and that things database were not relevant. The usual client coder bilge.

There ensued a minor sortie on his site from posters of SimpleTalk. It seems to have ended in a draw, with no (as of today) further rebuttal from Mr. Neward. Why this scuffle is relevant here is that one of the postings referenced a New York Times article, which went to some lengths in discussing the nature of cpu's and their future. The driving force is the rise of non-PC devices which connect to some manner of centralized datastore, likely over the Web but not of necessity. These devices use much simpler cpu's, notably of the ARM architecture, and increasingly run linux.

The conclusion of Mr. Davis, and most of the comments both at SimpleTalk and Mr. Neward's site, is that we are returning to a world more like the early 1970's, with a centralized computer brain talking to relatively dumb terminal-like devices, rather than the actively intelligent network envisioned by Mr. McNealy, which was just an extension of the client/server architecture that the Stanford University Network was. While the network might be the computer, it's looking more like Multics every day. Look it up.

The other event of note is also a New York Times article, this time telling the tales of those who struck it rich with iPhone apps. We learn about a handful of winners. We also learn that there are already 25,000 such apps, more every day, and still few winners. On the other hand, it seems to be the venue of choice for those who like to engage in lipsticking. A place for them to go and leave the rest of us alone to do real work. Hopefully, the siren song of (low chance) riches will siphon off many thousands of knuckleheads so that the rest of us can get real work done. Ah bliss.