AERA Editorial Policies regarding Statistical Significance Testing: Three Suggested Reforms
Abstract
In response to comments on Thompson (1996), it is argued that describing results as "significant" rather than "statistically significant" is confusing to those persons most susceptible to misinterpreting this telegraphic wording. Contrary to Robinson and Levin's view, it is noted that the utility of characterizing results as being due to "nonchance" is limited by the nature of the null hypothesis assumed to be true. It is suggested that effect sizes are important to interpret, even though they too can be misinterpreted; recent empirical studies of publications indicate that effect sizes are still too rarely reported. Finally, the value of "external" replicability analyses is acknowledged, but it is argued that "internal" replicability analyses can also be useful, and are certainly superior to statistical significance tests for evaluating result replicability, because statistical significance tests do not evaluate replicability.
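The abstract does not name a specific effect size or internal replicability method. As one hypothetical illustration only, the sketch below assumes Cohen's d (a standardized mean difference using a pooled standard deviation) as the effect size and the bootstrap as the internal replicability analysis: each group is resampled with replacement many times and d is recomputed, so the spread of the resampled values gives a rough sense of how stable the effect is within the data at hand. The function names, the resampling count, and the example data are all made up for illustration and are not taken from the paper.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference of a minus b, using the pooled SD (assumed effect size)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def bootstrap_d(a, b, reps=2000, seed=0):
    """Hypothetical internal replicability check: bootstrap the effect size.

    Each replicate resamples both groups with replacement and recomputes d.
    Returns the mean of the replicates and an approximate 95% percentile interval.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    ds = []
    for _ in range(reps):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        try:
            ds.append(cohens_d(ra, rb))
        except ZeroDivisionError:
            # Degenerate resample: both groups have zero variance; skip it.
            continue
    ds.sort()
    lo = ds[int(0.025 * len(ds))]
    hi = ds[int(0.975 * len(ds)) - 1]
    return statistics.mean(ds), (lo, hi)


if __name__ == "__main__":
    # Illustrative, invented scores for two groups.
    group_a = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5, 6.8, 6.0]
    group_b = [4.9, 5.2, 5.0, 5.9, 4.7, 5.4, 5.6, 5.1]
    print("observed d:", round(cohens_d(group_a, group_b), 3))
    mean_d, (lo, hi) = bootstrap_d(group_a, group_b)
    print("bootstrap mean d:", round(mean_d, 3), "interval:", (round(lo, 3), round(hi, 3)))
```

A wide bootstrap interval would suggest the observed effect may not hold up across sample perturbations, which is the kind of information a p value alone does not convey. Other internal methods, such as cross-validation or the jackknife, could be substituted; this sketch is one option under the stated assumptions, not the paper's procedure.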