*If you have educational material that you think will do a better job at preventing p-value misconceptions than the material in my MOOC, join the p-value misconception eradication challenge by proposing an improvement to my current material in a new A/B test in my MOOC.*

I launched a massive open online course, “Improving your statistical inferences”, in October 2016. So far around 47k students have enrolled, and the evaluations suggest it has been a useful resource for many researchers. The first week focuses on p-values: what they are, what they aren’t, and how to interpret them.

Arianne Herrera-Bennet was interested in whether an understanding of *p*-values was indeed “impervious to correction”, as some statisticians believe (Haller & Krauss, 2002, p. 1), and collected data on accuracy rates on ‘pop quizzes’ between August 2017 and 2018 to examine whether there was any improvement in the *p*-value misconceptions that are commonly examined in the literature. The questions were asked at the beginning of the course, after the relevant content was taught, and at the end of the course. As the figure below from the preprint shows, there was clear improvement: accuracy rates were quite high for 5 items, and reasonable for 3 items.

We decided to perform a follow-up from September 2018, in which we added an assignment to week one for half the students in an ongoing A/B test in the MOOC. In this new assignment, we didn’t just explain what p-values are (as in the first assignment in the module that all students do), but we also tried to specifically explain common misconceptions: to explain what *p*-values are *not*. The manuscript is still in preparation, but there was additional improvement for at least some misconceptions. It seems we can develop educational material that prevents *p*-value misconceptions – but I am sure more can be done.

In my paper “The practical alternative to the *p*-value is the correctly used *p*-value” I write:

“Looking at the deluge of papers published in the last half century that point out how researchers have consistently misunderstood *p-*values, I am left to wonder: Where is the innovative coordinated effort to create world class educational materials that can freely be used in statistical training to prevent such misunderstandings? It is nowadays relatively straightforward to create online apps where people can simulate studies and see the behavior of p values across studies, which can easily be combined with exercises that fit the knowledge level of bachelor and master students. The second point I want to make in this article is that a dedicated attempt to develop evidence based educational material in a cross-disciplinary team of statisticians, educational scientists, cognitive psychologists, and designers seems worth the effort if we really believe young scholars should understand p values. I do not think that the effort statisticians have made to complain about *p-*values is matched with a similar effort to improve the way researchers use p values and hypothesis tests. We really have not tried hard enough.”
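The kind of simulation such apps are built on is easy to sketch. As an illustration (a minimal sketch in Python with NumPy and SciPy; the function and parameter names are my own, not from the course), here is how one can draw p-values from many simulated two-sample t-tests, both when the null hypothesis is true and when there is a true effect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_p_values(effect_size, n_per_group=50, n_studies=10_000):
    """Simulate many two-sample t-tests and return their p-values."""
    # Each column is one simulated study.
    control = rng.normal(0.0, 1.0, (n_per_group, n_studies))
    treatment = rng.normal(effect_size, 1.0, (n_per_group, n_studies))
    return stats.ttest_ind(control, treatment, axis=0).pvalue

# When H0 is true, p-values are uniformly distributed:
# about 5% fall below alpha = .05, whatever the sample size.
p_null = simulate_p_values(effect_size=0.0)
print(f"H0 true: {np.mean(p_null < 0.05):.3f} of p-values < .05")

# When there is a true effect (here d = 0.5), small p-values pile up;
# the proportion below alpha is the statistical power of the test.
p_alt = simulate_p_values(effect_size=0.5)
print(f"H1 true: {np.mean(p_alt < 0.05):.3f} of p-values < .05")
```

Plotting histograms of `p_null` and `p_alt` shows exactly the behavior the quote refers to: a flat distribution under the null, and a right-skewed one under a true effect.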

If we honestly feel that misconceptions of *p*-values are a problem, and there are early indications that good educational material can help, let’s try to do all we can to eradicate *p*-value misconceptions from this world.

We have collected enough data in the current A/B test. I am convinced the experimental condition adds some value to people’s understanding of *p*-values, so I think it would be best educational practice to stop presenting students with the control condition.

However, there might be educational material out there that does a much better job at training away misconceptions than the material I made. So instead of giving all students my own new assignment, I want to give anyone who thinks they can do an even better job the opportunity to demonstrate this. If you have educational material that you think will work even better than my current material, I will create a new experimental condition that contains your teaching material. Over time, we can see which material performs better, and work towards creating the best educational material we can to prevent misunderstandings of p-values.

If you are interested in working on improving *p*-value education material, take a look at the first assignment in the module that all students do, and look at the new second assignment I have created to train away misconceptions (and the answers). Then, create (or adapt) educational material such that the assignment is similar in length and content. The learning goal should be to train away common *p*-value misconceptions – you can focus on any and all you want. If multiple people are interested, we will collectively vote on which material we should test first (but people are free to combine their efforts and work together on one assignment). What I can offer is getting your material in front of the 300 to 900 students who enroll each week. Not all of them will start, and not all of them will do the assignments, but your material should reach at least several hundred learners a year, of whom around 40% have a master’s degree and 20% have a PhD – so you will be teaching fellow scientists (and beyond) to improve how they work.

I will incorporate this new assignment, and make it publicly available on my blog, as soon as it is done and decided on by all the people who expressed interest in creating high quality teaching material. We can evaluate the performance by looking at the accuracy rates on test items. I look forward to seeing your material, and hope this can be a small step towards an increased effort in improving statistics education. We might have a long way to go to completely eradicate *p*-value misconceptions, but we can start.

Daniel-

Thanks as always for your work. I don’t have a lesson of my own to offer, but I did have a comment on a small part of your first assignment that I think could be problematic.

On page 2 of the posted version of lesson 1.1, you write about the first figure “There is a horizontal red dotted line that indicates an alpha of 5% (located at a frequency of 100.000*0.05 = 5000)”. But that seems like a confusing or misleading statement. First, since the line indicates a Y value, it must be a frequency of observed outcomes for p; a line showing alpha would have to indicate an X value. And even given that the line indicates the expected frequency of outcomes, it's the expectation *under the null hypothesis*, which is not explained here. More importantly, though, even if you do mean that the line will show the expected height of the bars under the null hypothesis, the only reason that you can use N*0.05 to predict that height is that you’ve divided the distribution into 20 bars - it’s not because alpha is 0.05. If you’d chosen to divide the graph into increments of 0.01 (as you do later), the height of the red line would be N*0.01 despite alpha being 0.05 (but now there would be five bars in the alpha region instead of just one). So the height of the line is based on the number of divisions, not alpha.
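[The point above (that the expected height of the bars tracks the bin width, not alpha) can be checked with a short simulation. A minimal sketch in Python/NumPy, my own illustration rather than code from the lesson:]

```python
import numpy as np

rng = np.random.default_rng(1)

# Under the null hypothesis, p-values are uniformly distributed on [0, 1].
n_studies = 100_000
p_values = rng.uniform(0.0, 1.0, n_studies)

# With 20 bins (width 0.05), every bar is expected at N * 0.05 = 5000.
counts_20, _ = np.histogram(p_values, bins=20, range=(0.0, 1.0))

# With 100 bins (width 0.01), every bar is expected at N * 0.01 = 1000,
# even though alpha is still 0.05.
counts_100, _ = np.histogram(p_values, bins=100, range=(0.0, 1.0))

print(counts_20[:3])   # each bar close to 5000
print(counts_100[:3])  # each bar close to 1000

# Alpha determines how many bars fall in the rejection region, not their
# height: one bar of 20 (or five bars of 100) covers p < .05, and either
# way it holds about 5% of the simulated studies.
print(np.mean(p_values < 0.05))
```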

Does that critique make sense? I can try to explain more fully if not.

Cheers,

Alistair