18 August 2011

Old Frankenstein

In "Young Frankenstein", The Doctor asks Eye-gor (Igor) whose brain he *really* retrieved. Igor replies, "Abby Normal?" I've spent the last hour or so wandering amongst some web sites, blogs, and whitepapers which seek to explain Normal Forms to normal folks; no math, just words.

One of them says: "'Normalization' just means making something more normal, which usually means bringing it closer to conformity with a given standard." Alas, not even close.

Since I've been re-reading my probability, stats, and stat pack books and docs, this flipped a switch, and that switch leads to a clearer, albeit slightly mathematical, definition.

I've done a quick search, and can't confirm that he explicitly said so, but given that Dr. Codd was trained as a mathematician, I'll surmise that he used the word in the following sense. In math, two terms are used as synonyms: orthogonal and normal. Remember from geometry class that a line at 90 degrees to another is the normal line? It's also orthogonal. Orthogonal as a concept means independence of influence (just as the X axis is independent of the Y axis; there's some math behind that), and Codd uses that term liberally in his paper.
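For anyone who wants the "some math" spelled out, here is a minimal sketch in Python. It assumes the usual definition that two vectors are orthogonal when their dot product is zero; the helper names (`dot`, `is_orthogonal`, `move_along`) are mine, made up for illustration, and not anything from Codd.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal(u, v, tol=1e-12):
    """Treat two vectors as orthogonal (normal) when their dot product is ~0."""
    return abs(dot(u, v)) <= tol


def move_along(point, direction, step):
    """Return the point shifted by `step` units in `direction`."""
    return tuple(p + step * d for p, d in zip(point, direction))


x_axis = (1.0, 0.0)
y_axis = (0.0, 1.0)
diagonal = (1.0, 1.0)

start = (2.0, 3.0)
moved_y = move_along(start, y_axis, 5.0)       # slide along the Y axis
moved_diag = move_along(start, diagonal, 5.0)  # slide along a non-orthogonal direction

print(is_orthogonal(x_axis, y_axis))    # the axes are normal to each other
print(is_orthogonal(x_axis, diagonal))  # the diagonal is not normal to X
print(dot(start, x_axis), dot(moved_y, x_axis), dot(moved_diag, x_axis))
```

Sliding along the Y axis leaves the X coordinate alone, while sliding along the diagonal drags it along. That is the "independence of influence" in miniature.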

So, the normal forms have nothing to do with being "not insane" or with seeking standards, but with data independence. Which is normal.


Gary Myers said...

The paper is on the internet in several places under the title "A Relational Model of Data for Large Shared Data Banks".


Robert Young said...

Yeah, that was the one I re-read, but not his later work. He uses "normal" without offering an explicit definition, leading me to conclude, given that he describes the goal as data "independence", that he was using it as a math term. Which means orthogonal, which ends up being independence.

And, of course, that's precisely what one gets proceeding to the higher normal forms: greater explicit and structural data independence.

Anonymous said...

I'll go with that. Plus, for the non-insane / standard form, there's the word "canonical".

For example, in complex logical expressions (which can be written in different ways, depending on which variables or sub-expressions are factored out), one can always get down to a sum-of-products canonical form.
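A rough sketch of that idea in Python, assuming the canonical form meant here is the sum of minterms read off the truth table. The function name `canonical_sop` and the `&`, `|`, `~` notation are my own choices for illustration.

```python
from itertools import product


def canonical_sop(func, names):
    """Build the sum-of-minterms form of a boolean function.

    Walks every row of the truth table and emits one product term
    (minterm) for each row where the function is true.
    """
    terms = []
    for values in product([False, True], repeat=len(names)):
        if func(*values):
            literals = [n if v else f"~{n}" for n, v in zip(names, values)]
            terms.append(" & ".join(literals))
    if not terms:
        return "False"
    return " | ".join(f"({t})" for t in terms)


names = ["a", "b", "c"]

# Two ways of writing the same logic, factored differently.
factored = lambda a, b, c: a and (b or c)
expanded = lambda a, b, c: (a and b) or (a and c)

print(canonical_sop(factored, names))
print(canonical_sop(expanded, names))
```

Both spellings of the expression come out as the same string, which is the point of a canonical form: however the expression was factored going in, there is one standard way to write it coming out.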