09 March 2016

P-ing Into The Wind

Thanks, once again, to r-bloggers, an interesting piece floats past my eyeballs. Norm Matloff has his take on a new ASA paper (the link is in his piece), which purports to be a position paper on the p-value. I caught it hot off the press, and was moved to open up the comment window and type away. Furiously. When I got to the point of posting, I reneged, mostly because my cavil didn't really amount to much and I wasn't all that interested in taking a pot shot at one who doesn't appear infected by lethal Bayesianitis. Stay away from the zombies.

I returned, since the piece was still on the front page of r-bloggers, and the number of comments was now well beyond 0. I suppose I should look at what others had to say. One comment, much shorter than what I had composed, made the point:
You seem to have misread the ASA statement. They did not reject p-values, merely the misuse and general overemphasis of p-values. p-values are fine, but can't be interpreted in isolation.
-- Clark

Looking, just now, there are further comments, some quite pointed. So far, oddly, none reflecting Bayesianitis.

So, what's so controversial about the p-value? Here is my take.

1:
p-value is "unintuitive". This derives from the Null Hypothesis Significance Test method. All of stat, whether The Frequentist or The Bayesian, came to be in the context of sampling from a population. The count of the sample may be minuscule relative to the population, and the exact (analytic) distribution function of the population may be unknown. Thankfully, the Normal distribution and central limit theorem were devised to allow tractable arithmetic for "large" sample sizes, where "large" really isn't; samples of minuscule size relative to populations were still OK. Count a big enough sample, and all distributions are normal "enough".

The purpose of "statistics", as I was taught in school, was to support inference of population parameters from samples (estimates). A "statistic" was shorthand for "test statistic", and not batting average or QBR or any of the other "advanced analytics" or Big Data manipulations.

NHST was devised as a kind of proof by contradiction exercise: assume the null hypothesis, i.e. that the sample really was drawn from a population with the stated parameter value. Then ask how likely it is, under that assumption, to see a test statistic at least as extreme as the one the sample actually produced. That probability is the p-value. If the p-value falls below some specified threshold, we conclude that the data are too improbable under the assumption, and reject it: our sample, we decide, couldn't reasonably have been drawn from that population. This works either for a parameter from a "known" population, or when comparing estimates of the same parameter from two samples.
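A minimal sketch of that recipe in Python, using a one-sample t-test; the hypothesized population mean of 100 and the sample values are invented for illustration.

import numpy as np
from scipy import stats

sample = np.array([104.2, 98.7, 110.5, 102.3, 107.9, 99.8, 105.6, 103.1])

# Null hypothesis: the sample was drawn from a population with mean 100.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100.0)

alpha = 0.05  # the conventional threshold; more on that in point 2
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null: the data are inconsistent with a population mean of 100.")
else:
    print("Fail to reject the null at this significance level.")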

2:
The p-value threshold, e.g. .05, is arbitrary, and thus (the argument goes) makes p-value analysis bad. Well, no. History and inertia have installed .05 as the standard threshold, but nothing in the math or arithmetic requires it. In the drug field, one occasionally sees both .01 and .1 used.
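To make the point concrete, a trivial sketch (the p-value here is just an invented number): the arithmetic produces the p-value; the threshold only decides what to do with it, and different fields make different choices.

p_value = 0.03  # hypothetical result of some test

for alpha in (0.01, 0.05, 0.10):
    decision = "reject the null" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha:.2f}: {decision}")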

3:
The p-value is commonly misrepresented, e.g. a p-value of .04 is said to mean "there is a 4% chance that the estimate is wrong". It means no such thing. The p-value is the probability of seeing data at least as extreme as the sample actually produced, assuming the null hypothesis is true; if it falls below the chosen level of significance, we reject the null hypothesis. Nothing more.
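One way to see why the "4% chance of being wrong" reading fails: when the null hypothesis is actually true, p-values are (approximately) uniform on [0, 1], so small p-values turn up at exactly the rate the threshold dictates, wrongness having nothing to do with it. A minimal simulation sketch, with parameters of my own choosing:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p_values = []
for _ in range(10_000):
    # both samples come from the same population, so the null is true
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
# roughly 5% of p-values land below .05, even though nothing is "wrong"
print(f"fraction of p-values below .05: {(p_values < 0.05).mean():.3f}")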

4:
The Bayesian remains unrequited by all this, and demands that direct probability statements about the parameters be made from the data. Since the only objective reality is the sample data, and The Frequentist has demonstrated the analytic and computational limits of samples, The Bayesian resorts to fiddling the data with prior knowledge to devise alternative arithmetic. And that way be dragons. The Bayesian is thus requited, having devised a more satisfactory arithmetic for his purpose, but the fact remains that the sole objective reality is the sample data. The Bayesian hasn't, really, increased the information content of the sample, just imposed external bias on the data. I'm not a fan.
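To illustrate the complaint, a minimal sketch of the simplest case, a conjugate Normal prior on a mean with known variance (all numbers invented): the posterior mean is a precision-weighted average of the prior mean and the sample mean, so the prior visibly pulls the estimate away from what the data alone say.

import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=10)  # the sole objective reality
sigma2 = 4.0                                    # data variance, assumed known

prior_mean, prior_var = 0.0, 1.0                # the "prior knowledge"

n = len(data)
sample_mean = data.mean()

# standard Normal-Normal conjugate update
post_var = 1.0 / (1.0 / prior_var + n / sigma2)
post_mean = post_var * (prior_mean / prior_var + n * sample_mean / sigma2)

print(f"sample mean    = {sample_mean:.3f}")
print(f"posterior mean = {post_mean:.3f}  (pulled toward the prior mean of {prior_mean})")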

In the end: the sample data is the sole objective reality, and the only real analysis of that data is simple arithmetic. If you can't answer your question with that data, get better data; imputing external data into the analysis merely adds bias.
