10 August 2010

Take the A Train

Rails and I have a contorted history. I first engaged when I was looking around for schema based code generation tools, and Rails had Scaffolds. Turned out that DHH didn't like Scaffolds, and they kind of disappeared from the Rails landscape, late 1.x time frame.

I've peeked in every now and again since, so today I wandered over here via the Postgres site. I don't yet know whether Mr. Copeland is a database guy at heart, or yet another coder pretending to be one. OTOH, these are his notes from a talk given by two others, who, if the notes are to be believed, haven't drunk the Koder Kool Aid (it really was Flavor Aid, for those who get it). Of particular piquancy:

9:00 One query, 12 joins - complicated, but query time goes from 8 seconds to 60 ms.

20:00 Use constraints, FKs, etc to preserve data integrity - "anything you don't have a constraint on will get corrupted"

42:00 Do analytics in the database. Saw speed improve from 90s to 5s and saved tons of RAM.

1:01:40 Tune PostgreSQL - shared_buffers, work_mem, autovacuum, etc. Rely on community knowledge for initial configuration.


The "Use constraints" one is really, really important. The notion that only the application code should edit input is the wedge issue. Iff the code will only, forever, be the sole user of the data (and you *know* that's baloney) should the application code do it all. And, in that case (presumably because "performance" can only be attained by ignoring RDBMS' services) suck up your cujones and write bespoke file I/O like a real COBOL coder. The RDBMS, modulo vanilla MySql, is going to provide the services anyway. Otherwise, never trust ANY client input. In most cases, never trust ANY client read request (bare sql). The purpose of the client code is to display (perhaps to another program or file, not just a screen) data and pass back data. That's it.

Which brings me to the analytics note. PG is a little short on analytical functions; DB2/Oracle/SqlServer all support SQL-99/03 functions and add more, but use what's there. The same can be said for ETL, too; in most cases sql will get the job done. What the ETL crowd don't get is that the database is closed over its datatypes. There are some syntactic issues going from vendor A to vendor B databases, but the engines are quite capable of transforming from one consistent state to another all on their lonesomes.

One of the knobs I miss from DB2 is the ability to assign bufferpools (DB2's term) at the tablespace level. PG now has tablespaces, but so far as I can see, buffering is at the engine/instance level. Someday.

No comments: