13 June 2009

Let's Go Dutch

In keeping with the theme of this endeavor, every now and then I return to the basics; which is to say the relational model. Since I'm neither Date nor CELKO, I seek out published authors who have figured it out (agree with me, he he). Today, a Dutch Treat, Applied Mathematics for Database Professionals. The book discusses a formalized model of data, which was first developed by the authors' mentor, Bert de Brock in 1995. As with much that isn't grounded in Flavor of the Month, the ideas took some time to develop, and still remain relevant. Date and Darwen provide a, rather tepid for some reason, forward. I don't quite understand why they aren't explicitly supportive, but there you are.

What de Haan and Koopelaars talk about won't (unfortunately) likely get you that next job doing SQL Server for Wendy's. On the other hand, it will clarify why and how high normal form data structures do what this endeavor seeks: define the most parsimonious data structure which is self-regulating and self-defending. As I have been saying, the SSD/multi machine makes this fully doable in the face of the "joins are too slows" rabble. Following the encapsulation discussion earlier, an existing bloated flat file database can be refactored to proper normal form, and existing code (COBOL, C++, java, etc.) reads from views which replicate the old table structures and writes through stored procedures. No sane code writes directly anyway (modulo 1970 era COBOL, alas).

I won't attempt to rehash the text, but I will give an overview (the link will take you to Amazon, as the eager amongst you have already found). The first part, four chapters, deals with math; logic, set theory, and functions. This is a rather complete treatment, which is welcome. The second part, five chapters, is titled Application, and is the meat of the effort. Here the authors build both the vocabulary of their modeling approach, and the model. It is expressed in the language of the math they set out in part 1, not SQL or the DDL of some particular engine. (That is dealt with in the third part, one chapter, at the end and is Oracle syntax.) I found it quite alien on first reading. The book does demand re-reading, but rewards one with a very clear understanding of what it is possible to do in a data modeling vocabulary.

The model is based on the idea of a database state which is initially correct, and that this state will only be modified into another state which is also correct. The definition of correct is closed world, and the transition process is database centric, not table centric. Years and years ago, I worked with the Progress database/4GL, which was not a SQL implementation (its 4GL was the basis of programming 99.44% of the time) by intent, although it did support SQL. I talked with one of its principle developers, The Wizard, at a conference who observed that a database, if it really is such, is whole. It is not a collection of files/tables; if you can replace any one table at will, what you have is not a database. It just isn't. That was one of the many epiphanies in life. de Haan and Koopelaars take this approach with no remorse. The object of interest is a database state. Refreshing.

The discussion of constraints is more explicit than usual. They describe constraints as tuple, table, and database level. With a high normal form view of data, there will be more tuple constraints than is common is legacy databases.

The last chapter presents the method using Oracle syntax. The most interesting aspect of the chapter is the evolution of what the authors refer to as the Execution Model. There are six of increasing correctness, and are implemented both declaratively and with triggers. The trigger code is Oracle specific, in that other databases, SQL Server for example, define change of state coverage differently; some databases require separate triggers for each of insert, update, and delete, others support multiple actions for a trigger. And Oracle users have to deal with the dreaded mutating table error; DB2 users will not since DB2 provides support for before and after images of rows. But Oracle remains the mindshare leader.

So, 300 pages which will, unless you have a degree (or nearly) in undergraduate math, stretch your understanding of what math there really is in the Relational Model; and how that can be leveraged into far stronger database specifications. Well worth the effort.

No comments: