A *p*-value is the probability of the observed, or more extreme, data, under the assumption that the null-hypothesis is true. The goal of this blog post is to understand what this means, and perhaps more importantly, what this doesn't mean. People often misunderstand *p*-values, but with a little help and some dedicated effort, we should be able to explain these misconceptions. Below is my attempt, but if you prefer a more verbal explanation, I can recommend Greenland et al. (2016).

Textbook explanations of *p*-values often show a plot with *t*-values on the horizontal axis, and a critical *t*-value somewhere around 1.96. For a mean difference, the *p*-value is calculated based on the *t*-distribution (which is like a normal distribution, and the larger the sample size, the more similar the two become). In this post I will distinguish the null hypothesis (the mean difference in the population is exactly 0) from the null-model (a model of the data we should expect when we draw a sample when the null-hypothesis is true).
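To make this concrete, here is a small sketch in Python (the post's own figures come from R) that computes a two-sided *p*-value for a mean difference from the *t*-distribution, and shows the *t*-distribution converging to the normal distribution as the sample size grows. The specific numbers (mean difference 0.5, SD 1, n = 20) are illustrative, not taken from any figure.

```python
import numpy as np
from scipy import stats

# Two-sided p-value for an observed mean difference, based on the t-distribution.
def p_value(mean_diff, sd, n):
    se = sd / np.sqrt(n)                      # standard error of the mean
    t = mean_diff / se                        # t-statistic
    return 2 * stats.t.sf(abs(t), df=n - 1)   # P(data this extreme, or more | H0)

print(p_value(0.5, 1.0, 20))

# As n grows, the critical t-value shrinks toward the normal value of 1.96.
for n in (10, 30, 100, 1000):
    print(n, stats.t.ppf(0.975, df=n - 1))
```
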

I prefer to plot the null-model with the mean difference on the horizontal axis, rather than *t*-values. So below, you can see a null-model, assuming a standard deviation of 1, for a *t*-test comparing mean differences (because the SD = 1, you can also interpret the mean differences as a Cohen's d effect size).

An advantage of plotting mean differences, rather than *t*-values, is that you can see how the null-model changes when the sample size increases. Suppose we observe a mean difference of 0.5. When we calculate the *p*-value for this observation, we get the probability of observing a value more extreme (in either tail, when we do a two-tailed test) than 0.5.
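We can sketch this calculation by simulation (Python here; the sample size of n = 20 is my assumption, since it is not stated for the figure): draw many samples from the null-model and count how often the mean difference is at least as extreme as the one observed.

```python
import numpy as np

rng = np.random.default_rng(42)
n, observed = 20, 0.5

# 100,000 simulated studies under the null-model: true mean difference 0, SD 1.
null_means = rng.normal(0, 1, (100_000, n)).mean(axis=1)

# Two-tailed: how often is a simulated mean difference as extreme as 0.5?
p = (np.abs(null_means) >= observed).mean()
print(f"simulated two-tailed p = {p:.3f}")
```

This empirical tail area is exactly what the shaded regions in the null-model figures represent.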

We are almost ready to look at misconceptions about *p*-values, but before we can do this, we need to introduce a model of the data when the null is *not* true. When the mean difference is not exactly 0, the alternative hypothesis is true – but what does an alternative model look like?

With a null-model and an alternative model in hand, we are ready to look at common misconceptions about *p*-values. Before we look at misconceptions in some detail, I want to remind you of one fact that is easy to remember, and will enable you to recognize many misconceptions about *p*-values: *p*-values are a statement about the probability of **data**, not a statement about the probability of a **theory**. Whenever you see *p*-values interpreted as a probability of a theory or a hypothesis, you know something is not right. Now let's take a look at why this is not right.

**1) Why a non-significant *p*-value does not mean that the null-hypothesis is true.**

Suppose we observe a mean difference of 0.35, which is not statistically significant (the *p*-value is not smaller than our alpha level, or *p* > .05). Nevertheless, we see that observing a mean difference of 0.35 is much more likely under the alternative model than under the null-model. All a non-significant *p*-value tells us is that this value is not extremely surprising, if we assume the null-hypothesis is true. A non-significant *p*-value does not mean the null-hypothesis is true. It might be, but it is also possible that the data we have observed is more likely when the alternative hypothesis is true than when the null-hypothesis is true (as in the figure above).
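We can put numbers on this. Both the sample size (n = 30) and the alternative model (a true difference of 0.5) below are illustrative assumptions, not values from the figure: the point is only that a non-significant observation can still be more probable under the alternative than under the null.

```python
import numpy as np
from scipy import stats

n, sd, observed = 30, 1.0, 0.35
se = sd / np.sqrt(n)

# The observed mean difference is not statistically significant...
t = observed / se
p = 2 * stats.t.sf(abs(t), df=n - 1)

# ...but its density is higher under an assumed alternative model (true
# difference 0.5) than under the null-model (true difference 0).
like_null = stats.norm.pdf(observed, loc=0.0, scale=se)
like_alt = stats.norm.pdf(observed, loc=0.5, scale=se)

print(f"p = {p:.3f} (not significant)")
print(f"density under null = {like_null:.2f}, under alternative = {like_alt:.2f}")
```
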

**2) Why a significant *p*-value does not mean that the null-hypothesis is false.**

Imagine we use R to draw a sample of random numbers from a normal distribution with a true mean of 0. We perform a *t*-test against 0, and this test tells us, with a *p* < .05, that the data we have observed is surprisingly extreme, assuming the random number generator in R functions as it should.

A common misinterpretation is that the *p*-value is the probability that the data were generated by chance. Note that this is just a sneaky way to say: The *p*-value is the probability that the null hypothesis is true, and we observed an extreme *p*-value just due to random variation. As we explained above, we can observe extreme data when we are basically 100% certain that the null-hypothesis is true (the random number generator in R works as it should). Seeing extreme data once should not make you think the probability that the random number generator in R is working is less than 5% – or in other words, that the probability that the random number generator in R is broken is now more than 95%.
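A quick way to see this for yourself (Python standing in for R here): simulate many studies in which the null-hypothesis is true by construction, and observe that significant results still occur – at the alpha rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# 10,000 studies where the null-hypothesis is TRUE by construction:
# every sample comes from a normal distribution with mean exactly 0.
samples = rng.normal(0, 1, (10_000, 50))
p_values = stats.ttest_1samp(samples, 0, axis=1).pvalue

# About 5% of these studies are 'significant', even though the null is true
# in every single one of them.
print(f"fraction with p < .05: {(p_values < 0.05).mean():.3f}")
```

Any one of those significant results, viewed in isolation, looks exactly like a "real" finding.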

*P*-values are a statement about the probability of **data**, not a statement about the probability of a **theory** or a **hypothesis**.

**3) Why a significant *p*-value does not mean that a practically important effect has been discovered.**

With a very large sample size, even a tiny mean difference can be statistically significant. Nothing about the meaning of the *p*-value changes: it still correctly indicates that, if the null-hypothesis is true, we have observed data that should be considered surprising. However, just because data is surprising does not mean we need to care about it. It is mainly the verbal label 'significant' that causes confusion here – it is perhaps less confusing to think of a 'significant' effect as a 'surprising' effect (as long as the null-model is realistic, which is not automatically true).
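A simulation makes this vivid. The effect size (0.01 SD) and sample size (one million) below are deliberately extreme illustrative choices: a difference nobody would care about becomes highly 'significant' simply because the null-model is so narrow.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# A trivially small true effect: mean difference of 0.01 standard deviations.
sample = rng.normal(0.01, 1, 1_000_000)

# With a million observations this tiny effect is 'significant':
# surprising under the null-model, but not practically important.
t, p = stats.ttest_1samp(sample, 0)
print(f"mean = {sample.mean():.4f}, p = {p:.2e}")
```
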

**4) If you have observed a significant finding, the probability that you have made a Type 1 error (a false positive) is not 5%.**

The 5% alpha level is a property of the procedure in the long run, not of a single result: *if* the null-hypothesis is true, at most 5% of studies will yield a significant result. The probability that one specific significant finding is a false positive also depends on how plausible it was that the null-hypothesis was true to begin with – and that is not something the *p*-value can tell us.
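Why isn't it 5%? A sketch with illustrative numbers (a field where only 10% of tested hypotheses are true, studies with 80% power, and alpha = .05 – all of these rates are assumptions, not facts about any real field):

```python
# Of all significant findings, what fraction are false positives?
# These rates are illustrative assumptions.
prior_true = 0.10   # 10% of tested hypotheses are actually true
power = 0.80        # probability of p < .05 when the effect is real
alpha = 0.05        # probability of p < .05 when the null is true

true_positives = prior_true * power            # 0.08 of all studies
false_positives = (1 - prior_true) * alpha     # 0.045 of all studies

fdr = false_positives / (true_positives + false_positives)
print(f"P(false positive | significant) = {fdr:.2f}")  # → 0.36, far from 0.05
```

Under these assumptions more than a third of significant findings are false positives – the 5% alpha level says nothing about this number.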

**5) One minus the *p*-value is not the probability of observing another significant result when the experiment is replicated.**

Whether a replication will yield a significant result depends on the statistical power of the replication study, which depends on the true effect size – not on the original *p*-value. As a consequence, the *p*-value cannot inform us about the *p*-value we will observe in future studies. When we have observed a *p*-value of 0.05, it is not 95% certain the finding will replicate. Only when we make additional assumptions (e.g., the assumption that the alternative hypothesis is true, and the effect size that was observed in the original study is exactly correct) can we model the *p*-value distribution for future studies.

There is one very specific situation in which one minus the *p*-value **does** provide the probability that future studies will provide a significant *p*-value (even though in practice, we will never know if we are in this very specific situation). In the figure below we have a null-model and alternative model for 150 observations. The observed mean difference falls exactly on the threshold for the significance level. This means the *p*-value is 0.05. In this specific situation, it is also 95% probable that we will observe a significant result in a replication study, *assuming there is a true effect as specified by the alternative model*. If this alternative model is true, 95% (1 - *p*) of the observed means will fall on the right side of the observed mean in the original study (we have a statistical power of 95%), and only 5% of the observed means will fall in the blue area (which contains the Type 2 errors).

So, except for one very specific situation – when the alternative hypothesis is true, and of a very specific size you will never know you are in – one minus the *p*-value basically never gives the probability that a future study will once again yield a significant result.
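A simulation shows how disconnected the original *p*-value and the replication probability usually are. The true effect (0.3 SD) and sample size (n = 50) are illustrative assumptions; the replication probability comes out near the study's actual power, whatever the original *p*-value happened to be.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n = 0.3, 50  # assumed true effect size and sample size

# One 'original' study: a one-sample t-test on n draws from N(true_d, 1).
original = rng.normal(true_d, 1, n)
t, p = stats.ttest_1samp(original, 0)

# Probability that a replication (same n, same true effect) is significant:
reps = rng.normal(true_d, 1, (10_000, n))
rep_p = stats.ttest_1samp(reps, 0, axis=1).pvalue
replication_rate = (rep_p < 0.05).mean()

print(f"original p = {p:.3f}, 1 - p = {1 - p:.3f}")
print(f"actual replication rate = {replication_rate:.2f}")
```

The replication rate is simply the power of the replication study under the assumed true effect; 1 minus the original *p*-value is a different number entirely.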

**Conclusion**

The meaning of a *p*-value is not intuitive. Grammar is also confusing, and not intuitive. But where we practice grammar in our education again and again and again until we get it, we don't practice the interpretation of *p*-values again and again and again until we get it. Some repetition is probably needed. Explanations of what *p*-values mean are often verbal, and if there are figures, they use *t*-value distributions we are unfamiliar with. Instead of complaining that researchers don't understand what *p*-values mean, I think we should try to explain common misconceptions multiple times, in multiple ways.