02 August 2010

A Crossword Puzzle

I thought I would take some time this weekend to stage the ultimate test: the cross-join. Now, the tables I have at my disposal for this test, with enough rows to make it interesting, are Personnel and Dependants. While one would not expect to meaningfully cross-join such tables (excepting data for certain primitive societies/religions), they do serve the direct purpose.

So, for those not familiar, the cross-join (old syntax):

Select count(*) from personnel, dependants

I chose count() simply to remove the screen painting cost from the exercise; I merely want to measure the data cost. There are 1,200,240,012 synthesized rows.

I ran the query against the SSD database, and the HDD database (well, sort of).

The timing for SSD: 452.87 seconds, or about 8 minutes.

The timing for HDD: well, it never finished.

I initially ran both with 5 bufferpools, in order to force hard I/O in both cases. The SSD tables ran just fine. When I ran the HDD tables, it eventually errored out with a bufferpool exhaustion error. So, I increased the bufferpools for the HDD database to 100, and let 'er rip. 3 hours (about) later it errored out with a divide error.

A, somewhat, more fair test might be the cost of a range query between the two structures, that is a PersonnelFlatCross with the billion plus rows versus the normalized tables. If I can get DB2 to load the table, I'll give it a try.

No comments: