Sadly, the Earth Is Still Round “p < .05”

Thursday, March 18, 2010: 8:45 AM
110 (Convention Center)
Weimo Zhu, University of Illinois - Urbana-Champaign, Urbana, IL
It has been almost a century (more, in fact, if we count Karl Pearson's work in 1901) since Ronald Fisher advocated the concept and procedure of hypothesis testing in 1925. Known today as “significance” testing, the hypothesis test is the most widely used decision-making procedure in scientific research. Yet hypothesis testing has been criticized from the very beginning, mainly on three grounds (Berkson, 1938; Cohen, 1990, 1994; Kirk, 1996): (a) hypothesis testing (deductive) and scientific inference (inductive) address different questions; (b) hypothesis testing is a trivial exercise, a point Tukey (1991) drove home when he commented that “the effects of A and B are always different—in some decimal place—for any A and B. Thus asking ‘Are the effects different?’ is foolish”; and (c) hypothesis testing adopts a fixed level of significance (i.e., p<.05 or .01), which forces researchers to turn a continuum of uncertainty into a dichotomous “reject or do-not-reject” decision. Furthermore, because a sufficiently large sample can make almost any comparison “significant,” the word “significant” itself becomes meaningless. Using real-life examples (e.g., misinterpretation of validity coefficients in validating physical activity measures), the serious consequences of p-value abuse will be described in detail. Fifteen years ago, Cohen (1994) criticized p-value abuse under the title “The earth is round (p<.05).” Yet the words “significant” and “significance” are so attractive that researchers often jump to a “significant” conclusion even when the observed “p<.05” is merely an artifact of a large sample size or of meaningless sampling variability. Sadly, while the misuse and abuse of “p<.05” have been well documented in the literature and in many journals' publication guidelines, these inappropriate practices seem to be even more widespread now. Thus, the earth is still round “p<.05”!
The possible reasons for this continued inappropriate practice (e.g., inadequate training in statistics and research methods) will be examined. Finally, alternatives to hypothesis testing and ways to stop the “p<.05” abuse will be outlined.
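
The sample-size point can be illustrated with a minimal sketch. It assumes a hypothetical validity coefficient of r = 0.05 (which explains only 0.25% of the variance) and approximates the two-sided p-value with the Fisher z-transformation; the function name and the values of r and n are illustrative assumptions, not figures from the presentation.

```python
import math


def two_sided_p_for_correlation(r, n):
    """Approximate two-sided p-value for H0: rho = 0.

    Uses the Fisher z-transformation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal statistic under the null hypothesis.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical, practically negligible correlation (assumption for illustration).
r = 0.05

for n in (100, 1_000, 10_000, 100_000):
    p = two_sided_p_for_correlation(r, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>7}  r = {r}  r^2 = {r * r:.4f}  p = {p:.3g}  ({verdict})")
```

Under these assumptions, the same correlation is not significant at n = 100 or n = 1,000 but crosses “p<.05” at n = 10,000 and beyond, while r and r² never change. Only the sample size moved the verdict, which is why an effect size should be reported alongside any p-value.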