A blog on statistics, methods, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Saturday, March 19, 2016

Who do you love most? Your left tail or your right tail?


TL;DR: Don’t like one-sided tests? Distribute your alpha level unequally across the two tails (e.g., 0.04 vs 0.01) to still benefit from an increase in power.


My two unequal tails in a 0.04/0.01 ratio (picture by my wife).



This is a follow-up to my previous post, where I explained how you can easily become 20% more efficient when you aim for 80% power by using a one-sided test. The only requirements for this 20% efficiency benefit are that 1) you have a one-sided prediction, and 2) you want to calculate a p-value. It is advisable to pre-register your analysis plan for many reasons, one being to convince reviewers that you planned to do a one-sided test all along. This post is an update for people who responded that they often don't have a one-sided prediction.
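As a rough check of the 20% figure, here is a minimal sketch assuming a z-test (normal approximation). The required sample size is proportional to (z_crit + z_power)², so the effect size cancels when comparing a one-sided and a two-sided test. The function name is my own; for t-tests with small samples the exact numbers differ slightly.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function


def relative_sample_size(alpha: float, power: float, sides: int) -> float:
    """Sample size up to a constant factor, under a normal (z-test) approximation.

    For a z-test, the required n is proportional to (z_crit + z_power)^2 / d^2;
    the effect size d cancels when we compare two designs, so it is dropped here.
    """
    z_crit = z(1 - alpha / sides)  # about 1.645 one-sided, 1.960 two-sided at alpha = .05
    z_power = z(power)             # about 0.842 for 80% power
    return (z_crit + z_power) ** 2


two_sided = relative_sample_size(0.05, 0.80, sides=2)
one_sided = relative_sample_size(0.05, 0.80, sides=1)
saving = 1 - one_sided / two_sided
print(f"one-sided test needs {saving:.1%} fewer observations")
```

Under this approximation the saving comes out at a little over 20%.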

First, who would have a negative attitude towards becoming 20% more efficient by using one-sided tests, when appropriate? Neo-Fisherians (e.g., Hurlbert & Lombardi, 2012). These people think error control is bogus, data is data, and p-values are to be interpreted as likelihoods: a p-value of 0.00001 is strong evidence, a p-value of 0.03 is some evidence. If you looked at your data standing on one leg, and then hanging upside down, and because of this you now use a Bonferroni-corrected alpha of 0.025 and treat a p-value of 0.03 differently, well, that’s just silly.

I almost fully sympathize with this ‘just let the data speak’ perspective. Obviously, a p-value of 0.03 will sometimes be evidence for the null hypothesis, but I realize the correlation between p-values and evidence is strong enough that this works in practice, even though it is a formally invalid approach to statistical inference.
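To see how a p-value of 0.03 can favour the null, here is a small sketch assuming a one-sided z-test. Under the null hypothesis p-values are uniformly distributed (density 1), so the density of a p-value under an alternative is also the likelihood ratio H1:H0 at that p-value. The helper names are hypothetical. When power is very high, p = 0.03 becomes less likely under the alternative than under the null.

```python
from statistics import NormalDist

std_normal = NormalDist()


def p_density_h1(p: float, delta: float) -> float:
    """Density of a one-sided z-test p-value when the true noncentrality is delta.

    Under H0 the p-value is uniform, so its density is 1 everywhere; dividing
    by 1 means this value is also the likelihood ratio H1:H0 at that p-value.
    """
    z_p = std_normal.inv_cdf(1 - p)
    return std_normal.pdf(z_p - delta) / std_normal.pdf(z_p)


def delta_for_power(power: float, alpha: float = 0.05) -> float:
    """Noncentrality that gives the requested power for a one-sided z-test."""
    return std_normal.inv_cdf(1 - alpha) + std_normal.inv_cdf(power)


for power in (0.50, 0.80, 0.99):
    lr = p_density_h1(0.03, delta_for_power(power))
    print(f"power {power:.0%}: likelihood ratio H1:H0 at p = 0.03 is {lr:.2f}")
```

With moderate power the ratio favours the alternative; at 99% power it drops below 1, so the same p = 0.03 is now (weak) evidence for the null.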

However, I don’t think you should just let the data speak to you. You need error control as a first line of defense against making a fool of yourself. If you don’t use it, you will look at random noise and conclude that a high success rate on erotic pictures, but not on romantic, neutral, negative, or positive pictures, is evidence of precognition (p = 0.031, see Bem, 2011).
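The multiple-comparisons problem behind this example is easy to quantify. The sketch below assumes, for illustration only, five independent tests (one per picture type named above), each at alpha = 0.05, and computes the chance of at least one false positive plus the Bonferroni-corrected threshold. Independence is a simplifying assumption, so treat the numbers as a rough guide.

```python
# Assumption: five independent tests at alpha = 0.05, one per picture type
# mentioned above (erotic, romantic, neutral, negative, positive).
alpha = 0.05
n_tests = 5

# Chance that at least one test is "significant" when every null is true.
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni keeps the familywise error rate at or below alpha.
bonferroni_alpha = alpha / n_tests

p_erotic = 0.031  # the p-value quoted above for erotic pictures
print(f"familywise error rate: {familywise_error:.3f}")
print(f"Bonferroni-corrected alpha: {bonferroni_alpha:.3f}")
print(f"p = {p_erotic} significant after correction? {p_erotic < bonferroni_alpha}")
```

Under these assumptions, roughly one in four such studies would turn up a "hit" on some picture type by chance alone, and p = 0.031 does not survive the correction.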

Now you are free to make an informed choice here. If you think p = 0.031 is evidence for precognition, multiple comparisons be damned, I’ll happily send you a free neo-Fisherian sticker for your laptop. But I think you care about error control. And because it is not an either-or choice, you can control error rates and, once you have distinguished the signal from the noise, let the strength of the evidence speak through the likelihood function.

Remember: Type 2 error control, achieved by having high power, means you will not say there is nothing, when there is something, more than β of the time, where 1 - β is your power (so no more than 20% of the time when you aim for 80% power).

Now for the update to my previous post. Even when you want to allow for effects in both directions, you typically care more about missing an effect in one direction than about missing an effect in the opposite direction. That is, you care more about saying there is nothing, when there is something, in one direction than in the other. So if you care about power, you will typically want to distribute your alpha unequally across both tails.

Rice and Gaines (1994) believe that many researchers would rather deal with an unexpected result in the opposite direction from their original hypothesis by creating a new hypothesis than by ignoring the result as not supporting the original hypothesis. I find this a troublesome approach to theory testing. But their recommendation to distribute alpha levels unevenly across the two tails is valid for anyone who has a two-sided prediction where effects in the two directions are not equally important.

I think that in most studies, people care more about effects in one direction than about effects in the other direction, even when they don't have a directional prediction. Rice and Gaines propose using an alpha of 0.04 for the tail in the direction you care most about, and an alpha of 0.01 for the other tail, so the total alpha is still 0.05.
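A minimal sketch of the trade-off, assuming a z-test with known variance and a hypothetical standardized effect (noncentrality) chosen so that the conventional 0.025/0.025 split has 80% power. The function power_two_tails is my own illustration, not code from Rice and Gaines. It computes power in the favoured direction and in the other direction for three ways of spending a total alpha of 0.05.

```python
from statistics import NormalDist

std_normal = NormalDist()


def power_two_tails(delta: float, alpha_pos: float, alpha_neg: float) -> float:
    """Power of a z-test that spends alpha_pos in the upper tail and alpha_neg in the lower tail.

    delta is the true noncentrality (positive = favoured direction).
    We reject if Z > z(1 - alpha_pos) or Z < -z(1 - alpha_neg), with Z ~ N(delta, 1).
    """
    # A tail with alpha 0 never rejects, so its critical value is infinite.
    upper = std_normal.inv_cdf(1 - alpha_pos) if alpha_pos > 0 else float("inf")
    lower = -std_normal.inv_cdf(1 - alpha_neg) if alpha_neg > 0 else float("-inf")
    return (1 - std_normal.cdf(upper - delta)) + std_normal.cdf(lower - delta)


# Hypothetical effect size: gives the usual 0.025/0.025 split 80% power.
delta = std_normal.inv_cdf(1 - 0.025) + std_normal.inv_cdf(0.80)

schemes = {
    "two-sided 0.025/0.025": (0.025, 0.025),
    "Rice & Gaines 0.04/0.01": (0.04, 0.01),
    "one-sided 0.05/0": (0.05, 0.0),
}
for name, (a_pos, a_neg) in schemes.items():
    favoured = power_two_tails(delta, a_pos, a_neg)
    opposite = power_two_tails(-delta, a_pos, a_neg)
    print(f"{name:24s} favoured direction {favoured:.3f}  other direction {opposite:.3f}")
```

Under these assumptions the 0.04/0.01 split recovers part of the one-sided test's power gain in the direction you care about, while keeping a reasonable (though lower) chance of detecting an effect in the other direction, which the one-sided test gives up entirely.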

I believe that is an excellent recommendation for people who do not have a directional hypothesis, but would like to benefit from an increase in power for the result in the direction they care most about.


References

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425.
Hurlbert, S. H., & Lombardi, C. M. (2012). Lopsided reasoning: On lopsided tests and multiple comparisons. Australian & New Zealand Journal of Statistics, 54(1), 23–42. http://doi.org/10.1111/j.1467-842X.2012.00652.x
Rice, W. R., & Gaines, S. D. (1994). “Heads I win, tails you lose”: testing directional alternative hypotheses in ecological and evolutionary research. Trends in Ecology & Evolution, 9(6), 235–237. http://doi.org/10.1016/0169-5347(94)90258-5
