09 March 2016

P-ing Into The Wind

Thanks, once again, to r-bloggers, an interesting piece floats past my eyeballs. Norm Matloff has his take on a new ASA paper (the link is in his piece), which purports to be a position paper on the p-value. I caught it hot off the press, and was moved to open up the comment window and type away. Furiously. When I got to the point of posting, I reneged, mostly because my cavil didn't really amount to much and I wasn't all that interested in taking a pot shot at one who doesn't appear infected by lethal Bayesianitis. Stay away from the zombies.

I returned, since the piece was still on the front page of r-bloggers, and the number of comments was now well beyond 0. I suppose I should look at what others had to say. One comment, much shorter than what I had composed, made the point:
You seem to have misread the ASA statement. They did not reject p-values, merely the misuse and general overemphasis of p-values. p-values are fine, but can't be interpreted in isolation.
-- Clark

Looking, just now, there are further comments, some quite pointed. So far, oddly, none reflecting Bayesianitis.

So, what's so controversial about the p-value? Here is my take.

1:
p-value is "unintuitive". This derives from the Null Hypothesis Significance Test method. All of stat, whether The Frequentist or The Bayesian, came to be in the context of sampling from a population. The count of the sample may be minuscule relative to the population, and the exact (analytic) distribution function of the population may be unknown. Thankfully, the Normal distribution and central limit theorem were devised to allow tractable arithmetic for "large" sample sizes, where "large" really isn't; samples of minuscule size relative to populations were still OK. Count a big enough sample, and all distributions are normal "enough".

The purpose of "statistics", as I was taught in school, was to support inference of population parameters from samples (estimates). A "statistic" was shorthand for "test statistic", and not batting average or QBR or any of the other "advanced analytics" or Big Data manipulations.

NHST was devised as a kind of proof by contradiction exercise: assume the null hypothesis, i.e. that the sample really was drawn from a population with the stated parameter value. Then ask how likely it is, under that assumption, to see a test statistic at least as extreme as the one the sample actually produced. That probability is the p-value. If the p-value falls below some specified threshold, we conclude that the data are too improbable under the assumption, and reject it: our sample, we decide, couldn't reasonably have been drawn from that population. This works either for a parameter from a "known" population, or when comparing estimates of the same parameter from two samples.
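A minimal sketch of that recipe in Python, using a one-sample t-test; the hypothesized population mean of 100 and the sample values are invented for illustration.

import numpy as np
from scipy import stats

sample = np.array([104.2, 98.7, 110.5, 102.3, 107.9, 99.8, 105.6, 103.1])

# Null hypothesis: the sample was drawn from a population with mean 100.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100.0)

alpha = 0.05  # the conventional threshold; more on that in point 2
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null: the data are inconsistent with a population mean of 100.")
else:
    print("Fail to reject the null at this significance level.")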

2:
The p-value threshold, e.g. .05, is arbitrary, and thus (the argument goes) makes p-value analysis bad. Well, no. History and inertia have installed .05 as the standard threshold, but nothing in the math or arithmetic requires it. In the drug field, one occasionally sees both .01 and .1 used.
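To make the point concrete, a trivial sketch (the p-value here is just an invented number): the arithmetic produces the p-value; the threshold only decides what to do with it, and different fields make different choices.

p_value = 0.03  # hypothetical result of some test

for alpha in (0.01, 0.05, 0.10):
    decision = "reject the null" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha:.2f}: {decision}")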

3:
The p-value is commonly misrepresented, e.g. a p-value of .04 is said to mean "there is a 4% chance that the estimate is wrong". It means no such thing. The p-value is the probability of seeing data at least as extreme as the sample actually produced, assuming the null hypothesis is true; if it falls below the chosen level of significance, we reject the null hypothesis. Nothing more.
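One way to see why the "4% chance of being wrong" reading fails: when the null hypothesis is actually true, p-values are (approximately) uniform on [0, 1], so small p-values turn up at exactly the rate the threshold dictates, wrongness having nothing to do with it. A minimal simulation sketch, with parameters of my own choosing:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p_values = []
for _ in range(10_000):
    # both samples come from the same population, so the null is true
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
# roughly 5% of p-values land below .05, even though nothing is "wrong"
print(f"fraction of p-values below .05: {(p_values < 0.05).mean():.3f}")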

4:
The Bayesian remains unrequited by all this, and demands that direct probability statements about the parameters be made from the data. Since the only objective reality is the sample data, and The Frequentist has demonstrated the analytic and computational limits of samples, The Bayesian resorts to fiddling the data with prior knowledge to devise alternative arithmetic. And that way be dragons. The Bayesian is thus requited, having devised a more satisfactory arithmetic for his purpose, but the fact remains that the sole objective reality is the sample data. The Bayesian hasn't, really, increased the information content of the sample, just imposed external bias on the data. I'm not a fan.
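To illustrate the complaint, a minimal sketch of the simplest case, a conjugate Normal prior on a mean with known variance (all numbers invented): the posterior mean is a precision-weighted average of the prior mean and the sample mean, so the prior visibly pulls the estimate away from what the data alone say.

import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=10)  # the sole objective reality
sigma2 = 4.0                                    # data variance, assumed known

prior_mean, prior_var = 0.0, 1.0                # the "prior knowledge"

n = len(data)
sample_mean = data.mean()

# standard Normal-Normal conjugate update
post_var = 1.0 / (1.0 / prior_var + n / sigma2)
post_mean = post_var * (prior_mean / prior_var + n * sample_mean / sigma2)

print(f"sample mean    = {sample_mean:.3f}")
print(f"posterior mean = {post_mean:.3f}  (pulled toward the prior mean of {prior_mean})")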

In the end: the sample data is the sole objective reality, and the only real analysis of that data is simple arithmetic. If you can't answer your question with that data, get better data; imputing external data into the analysis merely adds bias.
