Our increased sensitivity to these p-values might make us forget what a small part of the p-value spectrum we are talking about here – just 2.5%. At the same time, we are slowly realizing that too many high p-values are rather unlikely. This led Michael Inzlicht to wonder:
Editor question: After how many p-values <.05 but >.025 should one start getting concerned about robustness? What's fair?
— Michael Inzlicht (@minzlicht) May 26, 2015
Now if someone who is so serious about improving the way he works as Michael Inzlicht wants to know something, I’m more than happy to help.
I wanted to give the best possible probability. Using this handy interactive visualization it is easy to move some sliders around and see which power has the highest percentage of p-values between 0.025 and 0.05 (give it a try: it’s around 56% power, when approximately 11% of p-values will fall within this small section). If we increase or decrease the power, p-values are either spread out more uniformly, or most of them will be very small. Assuming we are examining a true effect and the two studies are independent, the probability of finding two p-values in a row between 0.025 and 0.05 is simply 11% times 11%, or 0.11*0.11=0.012. At the very best, published papers that simply report what they find will contain two p-values between 0.025 and 0.05 1.2% of the time.
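The 56% and 11% figures can also be checked without the sliders. The sketch below is my own illustration, not the visualization's code: it assumes a two-sided z-test (a normal approximation to the t-test), and the function names `p_in_band` and `power_z` are made up for this example.

```r
# Probability that a two-sided p-value falls between lo and hi when the
# test statistic is distributed N(delta, 1) (normal approximation, an assumption).
p_in_band <- function(delta, lo = 0.025, hi = 0.05) {
  z_lo <- qnorm(1 - lo / 2)  # |z| that gives p = lo
  z_hi <- qnorm(1 - hi / 2)  # |z| that gives p = hi
  upper <- pnorm(z_lo - delta) - pnorm(z_hi - delta)    # positive tail
  lower <- pnorm(-z_hi - delta) - pnorm(-z_lo - delta)  # negative tail
  upper + lower
}

# Power of the same two-sided test at the given alpha.
power_z <- function(delta, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  pnorm(delta - z) + pnorm(-delta - z)
}

# Search for the noncentrality that puts the most p-values in the band.
best <- optimize(p_in_band, interval = c(0, 5), maximum = TRUE)

power_z(best$maximum)  # power at the peak
best$objective         # share of p-values between 0.025 and 0.05
best$objective^2       # two independent studies both landing in the band
```

Under this approximation the peak sits at roughly 56% power with about 11% of p-values in the band, in line with the numbers read off the visualization.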
NOTE 1: Richard Morey noted on Twitter that this calculation ignores how researchers will typically not run two studies in a row, regardless of the outcome of the first study. They will typically run Study 2 only if Study 1 was statistically significant. If so, we need to calculate the conditional probability that Study 2 found a significant effect between 0.025-0.05, conditional on the probability that Study 1 found a significant effect between 0.025-0.05 (with 56% power). Thus: p(0.025<p<0.05|p<0.05, assuming 56% power). This probability is 21%, which makes the probability across two studies 0.21*0.11=0.023, or 2.3%.
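The conditional calculation can be sketched in a few lines. This is an illustration under my own assumptions: it uses a two-sided z-test as a normal approximation, and it reads the 21% as p(0.025<p<0.05|p<0.05) for Study 1. The variable names are hypothetical.

```r
# Normal (z-test) approximation, an assumption of this sketch.
z05  <- qnorm(1 - 0.05 / 2)   # critical |z| for p = 0.05
z025 <- qnorm(1 - 0.025 / 2)  # critical |z| for p = 0.025

# Power of a two-sided z-test with noncentrality delta.
power_z <- function(delta) pnorm(delta - z05) + pnorm(-delta - z05)

# Noncentrality that corresponds to 56% power.
delta <- uniroot(function(d) power_z(d) - 0.56, interval = c(0, 5))$root

# P(0.025 < p < 0.05) and P(p < 0.05) at that noncentrality.
p_band <- (pnorm(z025 - delta) - pnorm(z05 - delta)) +
  (pnorm(-z05 - delta) - pnorm(-z025 - delta))
p_sig <- power_z(delta)

p_cond <- p_band / p_sig   # in the band, given that the study was significant
p_two  <- p_cond * p_band  # Study 1 in the band (given significant), then Study 2 in the band

c(p_band = p_band, p_cond = p_cond, p_two = p_two)
```

With this approximation the conditional probability comes out at roughly 20% and the two-study probability a little above 2%, close to the figures above; the small gap may reflect rounding or the normal approximation.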
We can also simulate independent t-tests with 56% power, and count how often we find two p-values between 0.025 and 0.05 in a row. The R script below gives us the same answer to our question.