08 April 2009

Encapsulation

I have been looking into Python-based web frameworks recently: TurboGears, Pylons, and Django. Each uses an ORM along the way; for TurboGears and Pylons the ORM of preference is SQLAlchemy, while Django ships its own. Of the three frameworks, only TurboGears is being developed with a "reverse" ORM in tow: Sprox (née DBSprockets).

Sprox doesn't call itself that; I made the term up. But it does the reverse of what is found in the framework texts, tutorials, etc. The frameworks use SQLAlchemy to let the coder write Python files, which are then run to force DDL into the database. Not what I consider a Good Thing. But then, I've been a database geek for decades; I get database design and specification. While SQLAlchemy is, in my opinion, a better ORM than any of the others I've seen, it doesn't support (nor is it ever likely to) real database design. It's made for Pythonistas who know enough to make trouble.

Sprox sets out to generate a UI from the schema, constraints (some, anyway) and all. Data and its constraints of record in one place. Ah, bliss.

But this excursion into the dark side led to a minor epiphany. The OO folk love to talk up encapsulation, isolation of concerns, and other such. They also love to complain that changing the database schema messes up their code, so let's not ever do that once a database gets defined. Of course, coders would never suggest that they, themselves, should never amend their code as needs change. Of course not.

The fact is, the industrial-strength RDBMSs (DB2, Oracle, Postgres, SQL Server) all implement encapsulation and other OO niceties already. The mechanisms are views and stored procedures. The problem is that coders start from the view of ODBC/JDBC and, likely, some ancient flat-file derived "relational database". So they build or acquire some code which, more or less, simplifies the creation of DML in the coding language du jour. DML is just SQL, and the ODBC/JDBC/CLI interfaces are client-centric: here's a query, give me back a result set (or sets), which I'll then iterate through. More often than not, the coders will (especially if they've ever been exposed to COBOL) read the Master File, then read the Detail File. You get the gist.

With such a simplistic interface, schema/catalog changes cause all sorts of heartburn. But none of that need happen.

Views should be defined for the Object data; that is, the instance data needed to differentiate instances of each Class. This data can be of arbitrary complexity. The base tables in the database are irrelevant to application code; unless there is a change to the definition of the Class instance data, the client code(r) never knows (or cares) how said data is stored. So, an Order would consist of data from Order, Customer, Address, Order_Line, Inventory, etc. The view would still be called Order, but would be the appropriate join, or not. Or not? If the current schema is just some flat-file dump into the RDBMS, then not. But should there be a refactoring (and there would be, if smarter heads prevail), the view name is still Order and is still the reference so far as the application code(r) knows, but it is now a more or less normalized retrieval. The schema can then be refactored incrementally, all the while leaving the interface name and data unchanged.
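A minimal sketch of the idea, using Python's built-in sqlite3 module rather than any of the engines named above. Every table, column, and function name here (order_dump, order_view, fetch_order, and so on) is made up for illustration; the view is called order_view only because ORDER is a reserved word in SQL. The same client function runs unchanged against a flat-file style schema and against a normalized one, because both expose the same view.

```python
import sqlite3

# Schema 1: a flat-file dump. The view is a straight pass-through.
FLAT_SCHEMA = """
CREATE TABLE order_dump (
    order_id INTEGER, customer_name TEXT, item TEXT, quantity INTEGER
);
INSERT INTO order_dump VALUES (1, 'Acme', 'bolt', 10), (1, 'Acme', 'nut', 20);
CREATE VIEW order_view AS
    SELECT order_id, customer_name, item, quantity FROM order_dump;
"""

# Schema 2: the same data after refactoring. The view is now a join,
# but its name and columns are unchanged.
NORMALIZED_SCHEMA = """
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer
);
CREATE TABLE order_line (
    order_id INTEGER NOT NULL REFERENCES orders,
    item TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, item)
);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO orders VALUES (1, 1);
INSERT INTO order_line VALUES (1, 'bolt', 10), (1, 'nut', 20);
CREATE VIEW order_view AS
    SELECT o.order_id, c.name AS customer_name, l.item, l.quantity
    FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN order_line l ON l.order_id = o.order_id;
"""


def build(schema):
    """Create a throwaway in-memory database from the given DDL script."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    return conn


def fetch_order(conn, order_id):
    """Client code: knows only the view name and its columns."""
    cur = conn.execute(
        "SELECT order_id, customer_name, item, quantity "
        "FROM order_view WHERE order_id = ? ORDER BY item",
        (order_id,),
    )
    return cur.fetchall()


flat_rows = fetch_order(build(FLAT_SCHEMA), 1)
normalized_rows = fetch_order(build(NORMALIZED_SCHEMA), 1)
assert flat_rows == normalized_rows
```

The refactoring happens entirely behind the view; fetch_order never sees the base tables.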

For those RDBMSs which support stored procedures that return result sets, SPs can be used to fully encapsulate data logic. The call into the database is an SP called, say, GetOrder. The return is the order data. How that data is accumulated is of no concern to the client code(r).

Stored procedures also provide encapsulation of the write process. The client code(r) calls WriteOrder with the necessary parameters. The stored procedure then figures out where to put the data; this may be some flat-file image in the database or a 5NF decomposition or something in between. The client code(r) neither knows nor cares.
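A rough sketch of the write side, again in Python with sqlite3. SQLite has no stored procedures, so this swaps in a different mechanism, an INSTEAD OF trigger on the view, to play the part of WriteOrder; on DB2, Oracle, Postgres, or SQL Server one would presumably write a real stored procedure instead. All of the names (order_view, write_order_trg, write_order) are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer
);
CREATE TABLE order_line (
    order_id INTEGER NOT NULL REFERENCES orders,
    item TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, item)
);
CREATE VIEW order_view AS
    SELECT o.order_id, c.name AS customer_name, l.item, l.quantity
    FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN order_line l ON l.order_id = o.order_id;

-- Stand-in for a WriteOrder stored procedure: the database, not the
-- client, decides which base tables receive the data.
CREATE TRIGGER write_order_trg INSTEAD OF INSERT ON order_view
BEGIN
    INSERT OR IGNORE INTO customer (name) VALUES (NEW.customer_name);
    INSERT OR IGNORE INTO orders (order_id, customer_id)
        SELECT NEW.order_id, customer_id
        FROM customer WHERE name = NEW.customer_name;
    INSERT INTO order_line (order_id, item, quantity)
        VALUES (NEW.order_id, NEW.item, NEW.quantity);
END;
"""


def write_order(conn, order_id, customer_name, item, quantity):
    """Client code: hands over the order data and nothing else."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO order_view (order_id, customer_name, item, quantity) "
            "VALUES (?, ?, ?, ?)",
            (order_id, customer_name, item, quantity),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
write_order(conn, 1, "Acme", "bolt", 10)
write_order(conn, 1, "Acme", "nut", 20)

rows = conn.execute(
    "SELECT order_id, customer_name, item, quantity "
    "FROM order_view ORDER BY item"
).fetchall()
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

The client wrote two lines for one customer; the decomposition into customer, orders, and order_line happened inside the database, and could be changed there without touching write_order.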

The solid state disc multi-core/processor machine, which is the main subject of this endeavor, is ideally suited to support this approach. Conventional machines, duly buffered, can do much the same.

3 comments:

mike bayer said...

so any database design that does not include a full persistence layer implemented within stored procedures is not a "real database design"? that's quite a statement.

Robert Young said...

It would be, if that were the statement. The piece describes how, if one wishes to enforce encapsulation, and insulate the client code from changes to the schema, views and stored procs will do that. Whether schema changes are really such a big deal that one needs to do so, that is another, crunchier question. One which tends to irritate client coders, since it exposes their ignorance of what Dr. Codd built.

Mature database shops, when confronted with client code (web or otherwise), do tend to enforce access through stored procs, as a matter of security (both as attack vector, and from less than competent application coders; no offence intended, that's just the way it is). The lack of understanding of views, on the part of client coders, is one of the largest gulfs in database systems.

The point of the piece was to describe a solution to a problem. The problem is created by the mindset of OO-style web coders: "never change the schema, so let's make it a flat-file dump and do all the work in the application code". This is the way systems were built into the 1970s; the jargon then was of COBOL and copybooks, today it is (mostly) Java and data objects. We can do better than that.

Anonymous said...

Ergo, impedance mismatch exists only in coders' heads -- thanks for a great reference article that I can point to.