Comments on The 20% Statistician: After how many p-values between 0.025-0.05 should one start getting concerned about robustness?

Hi Daniel, Mickey's question and your answer...

2015-05-27T00:27:50.533+02:00

Hi Daniel,

Mickey's question and your answer suggest another way to examine bias.

It is similar to TIVA.

https://replicationindex.wordpress.com/2014/12/30/the-test-of-insufficient-variance-tiva-a-new-tool-for-the-detection-of-questionable-research-practices/

Both tests are based on the insight that test statistics (whether they are presented as z-scores, p-values, post-hoc power, or other transformations) should vary considerably across studies. Obtaining p-values that are too close to each other suggests that bias is present.
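As a rough illustration of that insight, the sketch below converts two-sided p-values to absolute z-scores and computes their sample variance. If each z-score carries sampling error with a standard deviation of 1, the variance across studies should be around 1, so a value far below 1 is suspicious. The function name and the example p-values are made up for illustration, and this is only the variance step, not the full TIVA procedure described in the linked post.

```python
from statistics import NormalDist, variance

def z_variance(p_values):
    """Sample variance of the absolute z-scores implied by two-sided p-values."""
    std_normal = NormalDist()
    # A two-sided p-value p corresponds to |z| = Phi^{-1}(1 - p/2).
    z_scores = [std_normal.inv_cdf(1 - p / 2) for p in p_values]
    return variance(z_scores)

# Hypothetical p-values, all just below .05
example_p = [0.04, 0.03, 0.045]
print(round(z_variance(example_p), 3))
```

For p-values bunched this tightly, the variance comes out far below 1.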

The difference between TIVA and the critical region approach (Neyman-Pearson) is that TIVA does not require an a priori specification of the critical region. If Mickey always used the range from .05 to .025, the approach would be fine. However, if the critical region is not fixed in advance, the bias test itself is biased.

The problem with the range from .05 to .025 is that it is very narrow. This keeps the type-I error rate very low (p = .01 even with k = 2), but the type-II error rate is high because p-hacking doesn't always produce p-values just below .05, as you posted on another blog.

Thus, the trick is to find a good balance between type-I and type-II error. For two studies, I suggest a range from 50% to 80% power, which corresponds to z-scores of 1.96 to 2.8, and p-values from .05 to .005.
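The correspondence between power, z-scores, and p-values can be checked with a minimal sketch. It assumes a two-sided test at alpha = .05 and treats the observed z-score as the true effect (post-hoc power), ignoring the negligible opposite tail; the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided alpha = .05, about 1.96

def z_for_power(power):
    """Observed z-score at which post-hoc power equals `power`."""
    # Power ~ P(Z > Z_CRIT) with Z ~ N(z, 1), so z = Z_CRIT + Phi^{-1}(power).
    return Z_CRIT + STD_NORMAL.inv_cdf(power)

def two_sided_p(z):
    """Two-sided p-value for a z-score."""
    return 2 * (1 - STD_NORMAL.cdf(z))

for power in (0.50, 0.80):
    z = z_for_power(power)
    print(f"power={power:.2f}  z={z:.2f}  p={two_sided_p(z):.4f}")
```

Under these assumptions, 50% power lands at z = 1.96 (p = .05) and 80% power at z of about 2.80 (p of about .005).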

The type-I error rate for this test with k = 2 is about 10%, which is acceptable if the goal is just to raise awareness of bias. This test has more power for k = 2 than TIVA, which makes it appealing for pairs of studies.
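One way to check error rates of this kind is to ask how often k unbiased studies would all land in the critical range in the worst case. The sketch below is a reconstruction under stated assumptions, not necessarily how the figures above were obtained: each z-statistic is normal with standard deviation 1, all k studies share the same true effect `delta`, and the worst case is found on a simple grid of `delta` values. All names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def prob_p_in_range(delta, p_low, p_high):
    """P(p_low < two-sided p < p_high) when the z-statistic is N(delta, 1)."""
    z_near = STD_NORMAL.inv_cdf(1 - p_high / 2)  # smaller |z| bound
    z_far = STD_NORMAL.inv_cdf(1 - p_low / 2)    # larger |z| bound
    upper_tail = STD_NORMAL.cdf(z_far - delta) - STD_NORMAL.cdf(z_near - delta)
    lower_tail = STD_NORMAL.cdf(-z_near - delta) - STD_NORMAL.cdf(-z_far - delta)
    return upper_tail + lower_tail

def worst_case_all_in_range(p_low, p_high, k=2):
    """Largest chance that k independent unbiased studies all land in the range."""
    grid = [i / 100 for i in range(0, 501)]  # true effects delta from 0 to 5
    return max(prob_p_in_range(d, p_low, p_high) for d in grid) ** k

print(round(worst_case_all_in_range(0.025, 0.05), 3))  # narrow range
print(round(worst_case_all_in_range(0.005, 0.05), 3))  # 50% to 80% power range
```

Under these assumptions the two worst-case values come out close to the .01 and the roughly 10% quoted above.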

Sanjay. To get the probability that the data are b...

2015-05-27T00:08:19.838+02:00

Sanjay. To get the probability that the data are biased given the observation of a pair of p-values between .05 and .025, we have to make some assumptions about the probability of this event occurring when bias is present. Does 50% seem reasonable to you? In this case, with equal numbers of biased and unbiased pairs (see the table below), the probability that bias is present when the red flag is raised would be 50 out of 51, or 98%, a little bit less than 99 out of 100 (99%).

Maybe you want to be more conservative. With a 25% probability of bias producing the event, there are still 25 out of 26 events where bias produced the critical event (96% correct positive rate).

Bayesians often trick us by using a medical analogy where the base rate of the event we are looking for is very low (brain cancer).

          Both p in .05-.025    At least one p not in .05-.025
Bias              50                         50
NoBias             1                         99
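The arithmetic behind the table can be written out with Bayes' rule. This is a minimal sketch: the function and parameter names are illustrative, the default prior of 0.5 matches the equal row totals in the table, and the 1% prior in the last line is a hypothetical value added only to show how a low base rate changes the answer.

```python
def prob_bias_given_flag(p_flag_given_bias, p_flag_given_no_bias, prior_bias=0.5):
    """Bayes' rule: P(bias | both p-values fall in the critical range)."""
    flagged_and_biased = prior_bias * p_flag_given_bias
    flagged_and_unbiased = (1 - prior_bias) * p_flag_given_no_bias
    return flagged_and_biased / (flagged_and_biased + flagged_and_unbiased)

# Scenarios from the comment: equal numbers of biased and unbiased pairs.
print(round(prob_bias_given_flag(0.50, 0.01), 3))  # 50 out of 51
print(round(prob_bias_given_flag(0.25, 0.01), 3))  # 25 out of 26
# Hypothetical low base rate of bias (1%), to show how the prior matters.
print(round(prob_bias_given_flag(0.50, 0.01, prior_bias=0.01), 3))
```

With equal priors the posterior stays high in both scenarios; with the hypothetical 1% prior it falls well below one half, which is the situation the medical analogy relies on.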

My interpretation of Mickey's question, slight...

2015-05-26T23:42:32.262+02:00

My interpretation of Mickey's question, slightly paraphrased, is: "Given that I have observed two p-values between .025 and .05, what is the probability of them coming from an unbiased report?"

On the other hand, the calculations in this blog post (such as the 6%) are asking, "Given an unbiased report, what is the probability of observing two p-values between .025 and .05?"

I'd just like to point out that these are not the same thing. It's a reversal of the conditional probabilities.