Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction (paper)

2006 · The British Journal for the Philosophy of Science · Cited by 271
Topics: Philosophy and History of Science; Bayesian Modeling and Causal Inference; Statistical Mechanics and Entropy

Abstract

Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical paradigm (type I and II errors, significance levels, power, confidence levels), they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of N–P tests stem from unclarity and confusion, even among N–P adherents, as to how a test's (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We argue that the relevance of error probabilities is to ensure that only statistical hypotheses that have passed severe or probative tests are inferred from the data. The severity criterion supplies a meta-statistical principle for evaluating proposed statistical inferences, avoiding classic fallacies from tests that are overly sensitive, as well as those not sensitive enough to particular errors and discrepancies.

1. Introduction and overview
   1.1 Behavioristic and inferential rationales for Neyman–Pearson (N–P) tests
   1.2 Severity rationale: induction as severe testing
   1.3 Severity as a meta-statistical concept: three required restrictions on the N–P paradigm
2. Error statistical tests from the severity perspective
   2.1 N–P test T(α): type I, II error probabilities and power
   2.2 Specifying test T(α) using p-values
3. Neyman's post-data use of power
   3.1 Neyman: does failure to reject H warrant confirming H?
4. Severe testing as a basic concept for an adequate post-data inference
   4.1 The severity interpretation of acceptance (SIA) for test T(α)
   4.2 The fallacy of acceptance (i.e., an insignificant difference): Ms Rosy
   4.3 Severity and power
5. Fallacy of rejection: statistical vs. substantive significance
   5.1 Taking a rejection of H0 as evidence for a substantive claim or theory
   5.2 A statistically significant difference from H0 may fail to indicate a substantively important magnitude
   5.3 Principle for the severity interpretation of a rejection (SIR)
   5.4 Comparing significant results with different sample sizes in T(α): large n problem
   5.5 General testing rules for T(α), using the severe testing concept
6. The severe testing concept and confidence intervals
   6.1 Dualities between one and two-sided intervals and tests
   6.2 Avoiding shortcomings of confidence intervals
7. Beyond the N–P paradigm: pure significance, and misspecification tests
8. Concluding comments: have we shown severity to be a basic concept in a N–P philosophy of induction?
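The abstract describes severity only in words. As a rough illustration, the sketch below computes a post-data severity assessment for one simple setting that is assumed here rather than taken from the abstract: a one-sided test of H0: μ ≤ μ0 against H1: μ > μ0 on a Normal mean with known σ. The function names, parameter names, and numbers are hypothetical, and the formulas are one common way of writing the severity of the claims "μ ≤ μ1" (after a non-rejection) and "μ > μ1" (after a rejection).

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def severity_of_upper_bound(xbar: float, mu1: float, sigma: float, n: int) -> float:
    """Severity of inferring mu <= mu1 after a non-rejection.

    Assumed formalization: P(sample mean > observed xbar; mu = mu1),
    i.e. the probability of a larger observed difference if mu were mu1.
    """
    z = (xbar - mu1) / (sigma / sqrt(n))
    return 1.0 - normal_cdf(z)


def severity_of_lower_bound(xbar: float, mu1: float, sigma: float, n: int) -> float:
    """Severity of inferring mu > mu1 after a rejection.

    Assumed formalization: P(sample mean <= observed xbar; mu = mu1).
    """
    z = (xbar - mu1) / (sigma / sqrt(n))
    return normal_cdf(z)


if __name__ == "__main__":
    # Hypothetical data: observed mean 0.4, known sigma 2.0, sample size 100.
    xbar, sigma, n = 0.4, 2.0, 100
    for mu1 in (0.2, 0.4, 0.6, 0.8):
        sev = severity_of_upper_bound(xbar, mu1, sigma, n)
        print(f"SEV(mu <= {mu1:.1f}) = {sev:.3f}")
```

Under these assumptions, the same observed mean warrants different claims to different degrees: an upper bound far above the observed mean passes with high severity, while a bound close to it does not. This is the kind of graded, post-data assessment the abstract contrasts with a bare accept/reject verdict.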
