Tuesday, November 2, 2010

Statistics: veneer of mathematical legitimacy


Machiavelli's Laboratory is a free ebook that I published on April 13, 2010. It is a satiric discourse on scientific ethics, from the perspective of an unethical scientist. Please don't take any of the advice and opinions in the book (or the excerpts featured in this blog) seriously.

Statistics should never be mistaken for a branch of mathematics. Statistics is basically just another field that uses mathematics to lend some credibility to its practitioners' dubious beliefs. Statisticians are not the only professionals to use this ploy. Those who focus their lives on large quantities of money (economists, financiers, international bankers, investors) have found that mathematics confers a [false] sense of intelligence and trustworthiness.

Methods that use mathematical predictors to analyze aggregate numbers relieve professionals of their responsibility to catalog and understand individual events, while, at the same time, distancing the professional from the consequences of complex events. It's beautiful, really.

Nobody really understands how any statistical test works. None of them are mathematically "provable." If you don't believe me, let's take a close look at null hypothesis testing, the fundamental concept upon which all statistical reasoning is built.

Null hypothesis testing is a probabilistic analog of simple logical inference. For example, here is a non-statistical situation where a null hypothesis is tested (adapted from Jacob Cohen's brilliant monograph, "The Earth Is Round (p < .05)" [1]):

  • null hypothesis (the hypothesis you want to prove or disprove): If a person is an American, then he is not a member of Congress.

You start searching through each American, checking to see if he or she is a member of Congress. Eventually, you will find an American who is a member of Congress. At that point, you will know that your null hypothesis is not true, and you will reject it.

When we project null hypothesis testing into the realm of statistics, we reach the opposite conclusion.

  • probabilistic null hypothesis (the hypothesis you want to prove or disprove): If a person is an American, then he is probably not a member of Congress.

Here you can choose a random sampling of Americans (a few hundred or so), and you will almost always find that none of them are members of Congress. You will then accept the probabilistic null hypothesis ("If a person is an American, then he is probably not a member of Congress"). Furthermore, you are now ready to believe the general rule that Americans are not members of Congress.

Of course, the probabilistic null hypothesis is nonsense, as all members of Congress are Americans. The probabilistic null hypothesis is accepted because there are very few members of Congress and there are many Americans. The non-probabilistic version of the same hypothesis was rejected by examining the population to which the hypothesis applied. If we began with an understanding of the rules of fitness for Congress (i.e., a Congressman must be an American), and good data on the relative numbers of Americans and Congressmen, we wouldn't have been suckered into examining a probabilistic absurdity.
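The arithmetic behind this can be sketched in a few lines of Python. The figures are assumptions supplied for illustration, not numbers from the text: roughly 309 million Americans (about the 2010 census count) and 535 voting members of Congress. The function name is hypothetical, and the sketch treats each draw as independent, which is a reasonable approximation when the sample is tiny relative to the population.

```python
# Assumed round figures, not taken from the original text.
US_POPULATION = 309_000_000   # approximate 2010 census count
CONGRESS_MEMBERS = 535        # 100 senators + 435 voting representatives


def prob_no_member_in_sample(sample_size,
                             population=US_POPULATION,
                             members=CONGRESS_MEMBERS):
    """Chance that a random sample of Americans contains no member of Congress.

    Treats draws as independent (sampling with replacement), which is a close
    approximation when the sample is far smaller than the population.
    """
    p_member = members / population
    return (1 - p_member) ** sample_size


# The two conditional probabilities that the argument keeps apart.
p_congress_given_american = CONGRESS_MEMBERS / US_POPULATION
p_american_given_congress = 1.0  # every member of Congress is an American

p_empty_sample = prob_no_member_in_sample(300)

print(f"P(no member of Congress in a sample of 300) = {p_empty_sample:.5f}")
print(f"P(Congress | American) = {p_congress_given_american:.2e}")
print(f"P(American | Congress) = {p_american_given_congress:.1f}")
```

Under these assumptions, a sample of a few hundred almost never contains a member of Congress, which is why the probabilistic hypothesis gets accepted. The sample says nothing about the reverse conditional, which is exactly 1.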

Statistical testing is the closest thing the evil scientist has to sleight-of-hand deception. How does an evil scientist know when to believe a statistical conclusion? If the conclusion is consistent with your own biases, and if it furthers your own selfish agenda, you can swear on the results. Otherwise, reject the conclusion by invoking one of five popular classes of statistical error.

Type 1 error. Rejecting the null hypothesis when the null hypothesis is correct (i.e., seeing an effect when there was none).

Type 2 error. Accepting the null hypothesis when the null hypothesis is false (i.e., seeing no effect when there was one).

Type 3 error. Rejecting the null hypothesis correctly, but for the wrong reason, leading to an erroneous interpretation of the data in favor of an incorrect affirmative statement.

Type 4 error. Erroneous conclusion based on performing the wrong statistical test.

Type 5 error. Erroneous conclusion based on bad data.
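For readers who want to watch the first two error types occur, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: a two-sided z-test with known standard deviation 1, a significance level of 0.05, samples of 10, and a true effect of 0.5 for the Type 2 case. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05                                   # assumed significance level
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)   # about 1.96 for a two-sided test


def rejects_null(true_mean, n, rng):
    """Two-sided z-test of H0: mean == 0, with known standard deviation 1."""
    sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
    z = mean(sample) * n ** 0.5   # standard error of the mean is 1 / sqrt(n)
    return abs(z) > Z_CRIT


def error_rates(trials=2000, n=10, effect=0.5, seed=0):
    """Estimate Type 1 and Type 2 error rates by repeated sampling."""
    rng = random.Random(seed)
    # Type 1: the null is true (mean 0) but the test rejects it.
    type1 = sum(rejects_null(0.0, n, rng) for _ in range(trials)) / trials
    # Type 2: the null is false (mean = effect) but the test fails to reject it.
    type2 = sum(not rejects_null(effect, n, rng) for _ in range(trials)) / trials
    return type1, type2


type1_rate, type2_rate = error_rates()
print(f"Estimated Type 1 error rate: {type1_rate:.3f}")
print(f"Estimated Type 2 error rate: {type2_rate:.3f}")
```

The Type 1 rate should land near the chosen significance level. The Type 2 rate depends entirely on the assumed effect size and sample size, so it can be made as large or as small as the occasion requires.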

REFERENCE

1. Cohen J. The Earth Is Round (p < .05). American Psychologist 49:997-1003, 1994.



- © 2010 Jules Berman