It is common for conclusions of empirical studies to depend on multiple significant outcomes. This practice may seem reasonable, but it has some unintended effects. In particular, the compound Type I error rate for multiple studies (the likelihood of concluding that an effect exists when it does not) can be much lower than that of the individual studies. This in itself is not a problem, since a low Type I error rate is desirable. However, there is also an accompanying drop in power, meaning that the probability of finding support for a true effect is low. Currently, there is no standard statistical method for dealing with the hyper-conservative error rate and accompanying low power that result from investigations requiring multiple significant outcomes. Here, we propose a novel solution to this problem: we show that it is sometimes appropriate to reverse the logic of the classic Bonferroni correction and increase the significance criterion in order to maintain an intended compound Type I error rate across multiple tests. This reverse Bonferroni approach dramatically improves statistical power and encourages careful planning of statistical analyses prior to data collection. To avoid adding to the list of questionable research practices that seem to contaminate some psychological research, we suggest that reverse Bonferroni be restricted to situations where authors pre-register their analysis plans.

The standard approach to null hypothesis testing defines a decision-making procedure that produces a Type I error (concluding that an effect exists when it does not) at a set rate, typically 0.05. Due to concerns about misuse of hypothesis testing and questionable research practices (John, Loewenstein, & Prelec, 2012; Simmons, Nelson, & Simonsohn, 2011), many scientists will not conclude support for an effect without first running independent replications to verify the key findings (Cohen, 1994; Nosek, Spies, & Motyl, 2012; Roediger, 2012). While this cautious approach might seem reasonable, it is important to realize that requiring multiple significant test outcomes entails an unintended decrease in the Type I error rate, which in turn increases the Type II error rate (not concluding an effect exists when it actually does). Throughout this manuscript, we refer to the Type I error rate for a decision that is based on multiple test outcomes as the compound Type I error rate. When conducting multiple tests, scientists often use a Bonferroni correction (dividing the significance criterion by the number of tests; see Section 1 of the supplementary material for a derivation) in order to control the probability of finding at least one significant outcome when all the null hypotheses are true. However, as Westermann and Hager (1986) noted, the Bonferroni correction is not needed (and is actually conservative) if scientists conclude support for an effect only when all tests produce significant outcomes.

Here, we further argue that in some situations, scientists should apply a reverse Bonferroni correction in order to maintain an intended Type I error rate. We first explain how reverse Bonferroni applies to direct replications. This situation is mathematically straightforward, but it is also where meta-analytic alternatives exist. We then turn to more complicated situations involving conceptual replications and multiple tests within an experiment. To help identify the requirements and limitations of the reverse Bonferroni method, we then describe some situations where the method should not be used. We finish with a brief comparison of reverse Bonferroni to some meta-analytic methods.

Consider a scenario where scientists insist that a finding be successfully replicated before they accept the conclusions of an investigation. To convince peers about the validity of a conclusion, a scientist might plan to run three independent experiments and only conclude that an effect exists if all three experiments produce significant results. If the null hypothesis happens to be true, the probability of all three experiments producing significant outcomes when using the traditional significance criterion of α = 0.05 is α^3 = 0.000125. That is, the compound Type I error rate for the set is much smaller than that for the individual experiments. It might seem like there is no harm in being extra conservative, but a very low compound Type I error rate dramatically reduces compound power, that is, the probability that each result is statistically significant given a true non-zero effect. For example, a two-sample, two-tailed t-test with a standardized effect size of d = 0.5, α = 0.05, and n1 = n2 = 64 has a power of 0.8, while the compound probability that three such independent experiments testing the same effect produce significant effects is only 0.8^3 = 0.512.
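The three-experiment arithmetic described above can be sketched numerically. This is an illustrative sketch, not the paper's own code: the helper `two_sample_power` is a standard noncentral-t power computation, and the adjusted per-test criterion α^(1/k) is one natural way to realize the "reversed" Bonferroni logic for k independent tests that must all be significant (the paper's exact adjustment rule may differ).

```python
# Illustrative sketch (assumptions noted above): three independent
# two-sample, two-tailed t-tests with d = 0.5 and n1 = n2 = 64.
from scipy.stats import t, nct


def two_sample_power(d, n1, n2, alpha):
    """Power of a two-sample, two-tailed t-test via the noncentral t."""
    df = n1 + n2 - 2
    nc = d * (n1 * n2 / (n1 + n2)) ** 0.5   # noncentrality parameter
    t_crit = t.ppf(1 - alpha / 2, df)       # two-tailed critical value
    return 1 - nct.cdf(t_crit, df, nc) + nct.cdf(-t_crit, df, nc)


k, alpha = 3, 0.05
power = two_sample_power(0.5, 64, 64, alpha)   # ~0.80 per experiment

# Requiring all k outcomes to be significant shrinks both rates:
compound_type1 = alpha ** k                    # 0.05^3 = 0.000125
compound_power = power ** k                    # ~0.51

# Reversing Bonferroni (assumed rule alpha**(1/k)): raise each test's
# criterion so the compound Type I error rate stays at the intended 0.05.
alpha_rev = alpha ** (1 / k)                   # ~0.368 per test
compound_power_rev = two_sample_power(0.5, 64, 64, alpha_rev) ** k
```

Under these assumptions, the adjusted criterion keeps the compound Type I error rate at 0.05 while raising compound power well above the 0.512 obtained with the unadjusted criterion.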