Saturday, October 17, 2020

The p-value misconception eradication challenge

If you have educational material that you think will do a better job at preventing p-value misconceptions than the material in my MOOC, join the p-value misconception eradication challenge: propose an improvement to my current material, and we will test it in a new A/B test in the MOOC.

I launched a massive open online course, “Improving your statistical inferences”, in October 2016. So far around 47k students have enrolled, and the evaluations suggest it has been a useful resource for many researchers. The first week focuses on p-values: what they are, what they are not, and how to interpret them.

Arianne Herrera-Bennet was interested in whether an understanding of p-values was indeed “impervious to correction”, as some statisticians believe (Haller & Krauss, 2002, p. 1), and collected data on accuracy rates on ‘pop quizzes’ between August 2017 and 2018 to examine whether accuracy improved on the p-value misconceptions that are commonly examined in the literature. The questions were asked at the beginning of the course, after the relevant content was taught, and at the end of the course. As the figure below from the preprint shows, there was clear improvement: accuracy rates were quite high for 5 items, and reasonable for 3 items.

[Figure from the preprint: accuracy rates on the misconception items, measured at the start of the course, after the relevant content, and at the end of the course.]

From September 2018 we performed a follow-up, adding an assignment to week one for half of the students in an ongoing A/B test in the MOOC. In this new assignment we did not just explain what p-values are (as the first assignment in the module, which all students complete, already does), but also tried to specifically explain common misconceptions – that is, to explain what p-values are not. The manuscript is still in preparation, but there was additional improvement for at least some misconceptions. It seems we can develop educational material that prevents p-value misconceptions – but I am sure more can be done.

In my paper “The practical alternative to the p-value is the correctly used p-value”, to appear in Perspectives on Psychological Science, I write:

“Looking at the deluge of papers published in the last half century that point out how researchers have consistently misunderstood p-values, I am left to wonder: Where is the innovative coordinated effort to create world class educational materials that can freely be used in statistical training to prevent such misunderstandings? It is nowadays relatively straightforward to create online apps where people can simulate studies and see the behavior of p values across studies, which can easily be combined with exercises that fit the knowledge level of bachelor and master students. The second point I want to make in this article is that a dedicated attempt to develop evidence based educational material in a cross-disciplinary team of statisticians, educational scientists, cognitive psychologists, and designers seems worth the effort if we really believe young scholars should understand p values. I do not think that the effort statisticians have made to complain about p-values is matched with a similar effort to improve the way researchers use p values and hypothesis tests. We really have not tried hard enough.”
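To make the simulation idea concrete, here is a minimal sketch of that kind of exercise in Python (the sample size, effect size, and number of studies are arbitrary choices of mine, not material from the MOOC; numpy and scipy are assumed):

    # Sketch: the behavior of p values across many simulated studies.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2020)
    n_studies, n = 10_000, 50  # number of simulated studies; participants per group

    def p_values(effect_size):
        """p values from independent-samples t-tests across simulated studies."""
        group1 = rng.normal(0.0, 1.0, size=(n_studies, n))
        group2 = rng.normal(effect_size, 1.0, size=(n_studies, n))
        return stats.ttest_ind(group1, group2, axis=1).pvalue

    p_null = p_values(0.0)  # H0 true: p values are uniform on [0, 1]
    p_alt = p_values(0.5)   # true effect of d = 0.5: small p values dominate

    print(f"H0 true: {np.mean(p_null < .05):.3f} of p values below .05 (~ alpha)")
    print(f"d = 0.5: {np.mean(p_alt < .05):.3f} of p values below .05 (= power)")

Under the null, p values are uniformly distributed, so about 5% of simulated studies end up below .05 at any sample size; with a true effect of d = 0.5 and 50 participants per group, roughly 70% do, which is the power of the test.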

If we honestly feel that misconceptions of p-values are a problem, and there are early indications that good educational material can help, let’s try to do all we can to eradicate p-value misconceptions from this world.

We have collected enough data in the current A/B test. I am convinced the experimental condition adds some value to people’s understanding of p-values, so I think it would be best educational practice to stop presenting students with the control condition.

However, there might be educational material out there that does a much better job of training away misconceptions than the material I made. So instead of giving all students my own new assignment, I want to give anyone who thinks they can do an even better job the opportunity to demonstrate this. If you have educational material that you think will work even better than my current material, I will create a new experimental condition that contains your teaching material. Over time, we can see which material performs better, and work towards creating the best educational material to prevent misunderstandings of p-values that we can.

If you are interested in working on improving p-value education material, take a look at the first assignment in the module that all students do, and at the new second assignment I have created to train away misconceptions (and the answers). Then create (or adapt) educational material such that the assignment is similar in length and content. The learning goal should be to train away common p-value misconceptions – you can focus on any and all you want. If multiple people are interested, we will collectively vote on which material to test first (but people are free to combine their efforts and work together on one assignment). What I can offer is getting your material in front of the 300 to 900 students who enroll each week. Not all of them will start, and not all of them will do the assignments, but your material should reach at least several hundred learners a year, of whom around 40% have a master’s degree and 20% have a PhD – so you will be teaching fellow scientists (and beyond) to improve how they work.

I will incorporate this new assignment, and make it publicly available on my blog, as soon as it is done and agreed on by all the people who expressed interest in creating high-quality teaching material. We can evaluate the performance by looking at the accuracy rates on test items, as in the sketch below. I look forward to seeing your material, and hope this can be a small step towards an increased effort in improving statistics education. We might have a long way to go to completely eradicate p-value misconceptions, but we can start.
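For illustration, such a comparison could be a simple two-proportion (chi-square) test on accuracy per condition. The counts below are made-up placeholders, not data from the MOOC; scipy is assumed:

    # Sketch: comparing accuracy on one quiz item between A/B conditions.
    # The counts below are hypothetical placeholders, not MOOC data.
    from scipy.stats import chi2_contingency

    correct = {"control": 312, "experimental": 388}  # hypothetical correct answers
    total = {"control": 500, "experimental": 500}    # hypothetical respondents

    table = [[correct[c], total[c] - correct[c]] for c in ("control", "experimental")]
    chi2, p, dof, _ = chi2_contingency(table)

    for c in correct:
        print(f"{c}: {correct[c] / total[c]:.2%} accurate")
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")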

1 comment:

  1. Daniel-

    Thanks as always for your work. I don’t have a lesson of my own to offer, but I did have a comment on a small part of your first assignment that I think could be problematic.

    On page 2 of the posted version of lesson 1.1, you write about the first figure: “There is a horizontal red dotted line that indicates an alpha of 5% (located at a frequency of 100.000*0.05 = 5000)”. But that seems like a confusing or misleading statement. First, since the line indicates a Y value, it must be a frequency of observed outcomes for p; a line showing alpha would have to indicate an X value. And even given that the line indicates the expected frequency of outcomes, it’s the expectation *under the null hypothesis*, which is not explained here. More importantly, though, even if you do mean that the line shows the expected height of the bars under the null hypothesis, the only reason that you can use N*0.05 to predict that height is that you’ve divided the distribution into 20 bars; it’s not because alpha is 0.05. If you’d chosen to divide the graph into increments of 0.01 (as you do later), the height of the red line would be N*0.01 despite alpha being 0.05 (but now there would be five bars in the alpha region instead of just one). So the height of the line is based on the number of divisions, not alpha.
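    A quick simulation makes this concrete (a sketch assuming numpy; N and the bin counts are arbitrary):

        import numpy as np

        rng = np.random.default_rng(0)
        N = 100_000
        # Under the null, p values are uniform on [0, 1], so each of `bins`
        # equal-width bars is expected to hold N / bins outcomes; the bar
        # height tracks the number of divisions, not alpha.
        p = rng.uniform(0, 1, N)  # stand-in for p values from N null studies
        for bins in (20, 100):
            counts, _ = np.histogram(p, bins=bins, range=(0, 1))
            print(f"{bins} bins: expected bar height N/bins = {N // bins}, "
                  f"first bar = {counts[0]}, "
                  f"total below alpha = .05: {counts[: bins // 20].sum()}")

    Both binnings put about N*0.05 = 5000 simulated outcomes below .05, but the expected bar height is 5000 only with 20 bars; with 100 bars it is 1000.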

    Does that critique make sense? I can try to explain more fully if not.

    Cheers,
    Alistair
