A recent
paper in AMPPS points out that many introductory psychology textbooks incorrectly explain p-values. There are dozens, if
not hundreds, of papers that point out problems in how people understand p-values.
If we don’t do anything about it, dozens more articles like it will appear in the decades to come. So let’s do something about it.
When I made my first MOOC three years ago, I spent some time thinking about how to clearly explain what a p-value is (you can see my video here). Some
years later I realized that if you want to prevent misunderstandings of p-values,
you should also explicitly train people about what p-values are not. Now,
I think that training away misconceptions is just as important as explaining the
correct interpretation of a p-value. Based on a blog post, I made a new assignment for my MOOC. Over the past year, Arianne Herrera-Bennett (@ariannechb) performed an A/B test in my MOOC ‘Improving Your Statistical Inferences’. Half of the learners received this new assignment, explicitly aimed at training away misconceptions. The results are in her PhD thesis, which she will defend on the 27th of September 2019, but one of the main conclusions of the
study is that it is possible to substantially reduce common misconceptions
about p-values by educating people about them. This is a hopeful message.
I tried to keep the
assignment as short as possible, and therefore it is 20 pages. Let that
sink in for a moment. How much space does education about p-values take
up in your study material? How much space would you need to prevent misunderstandings?
And how often would you need to repeat the same material across the years? If
we honestly believe misunderstandings of p-values are a problem, then why don’t we educate people well enough to prevent them? The fact that people do not understand p-values is not their fault – it is ours.
In my own MOOC I needed 7 pages to explain
what p-value distributions look like, how they are a function of power,
why p-values are uniformly distributed when the null is true, and what Lindley’s
paradox is. But when I tried to clearly explain common misconceptions, I needed
a lot more words. Before you blame that poor p-value, let me tell you that I strongly believe the problem of misconceptions is not limited to p-values: probability is just not intuitive. It might always take more time to explain the ways you can misunderstand something than to teach the correct way to understand it.
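As a quick illustration of the distributional behavior mentioned above – that p-values are uniform when the null is true and pile up near zero as power increases – here is a minimal simulation sketch. It is a toy example I wrote for this post (in Python, using an independent-samples t-test purely as an illustration), not the MOOC material itself:

# A minimal sketch: simulate many two-group studies and look at the
# distribution of p-values for different true effect sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_per_group = 10_000, 50  # simulated studies, participants per group

def simulate_p_values(effect_size):
    """Return the p-values of n_sims simulated two-group studies."""
    p_values = np.empty(n_sims)
    for i in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(effect_size, 1.0, n_per_group)
        p_values[i] = stats.ttest_ind(treatment, control).pvalue
    return p_values

for d in (0.0, 0.3, 0.5):  # true effect sizes (Cohen's d)
    p = simulate_p_values(d)
    print(f"d = {d}: proportion of p-values below .05 = {np.mean(p < 0.05):.3f}")

# When d = 0, roughly 5% of p-values fall below .05 (the distribution is
# uniform); as d, and therefore power, increases, p-values pile up near zero.

A histogram of these simulated p-values makes the same point visually.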
In a recent pre-print I wrote on p-values,
I reflect on the bad job we have been doing at teaching others about p-values.
I write:
If anyone seriously believes the
misunderstanding of p-values lies at the heart of reproducibility issues in
science, why are we not investing more effort to make sure misunderstandings of
p-values are resolved before young scholars perform their first research
project? Although I am sympathetic to statisticians who think all the
information researchers need to educate themselves on this topic is already
available, as an experimental psychologist who works at a Human-Technology
Interaction department this reminds me too much of the engineer who argues all
the information to understand the copy machine is available in the user manual.
In essence, the problems we have with how p-values are used are a human factors
problem (Tryon, 2001). The challenge is to get researchers to improve the way
they work.
Looking at the deluge of papers
published in the last half century that point out how researchers have consistently
misunderstood p-values, I am left to wonder: Where is the innovative
coordinated effort to create world-class educational materials that can freely
be used in statistical training to prevent such misunderstandings? It is
nowadays relatively straightforward to create online apps where people can
simulate studies and see the behavior of p-values across studies, which can
easily be combined with exercises that fit the knowledge level of bachelor and
master students. The second point I want to make in this article is that a
dedicated attempt to develop evidence-based educational material in a
cross-disciplinary team of statisticians, educational scientists, cognitive
psychologists, and designers seems worth the effort if we really believe young
scholars should understand p-values. I do not think that the effort
statisticians have made to complain about p-values is matched with a similar
effort to improve the way researchers use p-values and hypothesis tests. We
really have not tried hard enough.
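The quote mentions simulation-based exercises; to give an impression of what such an exercise could look like, here is a rough sketch of my own (again in Python, a toy illustration rather than existing course material). It estimates how often a ‘significant’ original study is followed by a significant direct replication with the same sample size, which speaks directly to the misconception that a low p-value guarantees replication:

# A hypothetical exercise sketch: how often does a significant original
# study replicate in a direct replication with the same sample size?
import numpy as np
from scipy import stats

rng = np.random.default_rng(2019)
n_sims, n_per_group = 20_000, 50

def one_p_value(effect_size):
    """Simulate a single two-group study and return its t-test p-value."""
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(effect_size, 1.0, n_per_group)
    return stats.ttest_ind(treatment, control).pvalue

for d in (0.2, 0.5):  # true effect sizes (Cohen's d)
    originals = np.array([one_p_value(d) for _ in range(n_sims)])
    replications = np.array([one_p_value(d) for _ in range(n_sims)])
    significant = originals < 0.05
    rate = np.mean(replications[significant] < 0.05)
    print(f"d = {d}: P(replication p < .05 | original p < .05) = {rate:.2f}")

# Because the original and the replication are independent, this probability
# equals the power of the replication study, not 1 minus the original p-value.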
So how about we get serious? Let’s get together and make a dent in this decades-old problem. Let’s try hard enough.
A good place to start might be to take stock of the good ways to educate people about p-values that already exist, and then see together how we can improve them.
I have uploaded my lecture about p-values to YouTube, and
my assignment to train away misconceptions is available
online as a Google Doc (the answers and feedback are here).
This is just my current approach to
teaching p-values. I am sure there are many other approaches (and it
might turn out that watching several videos, each explaining p-values in
slightly different ways, is an even better way to educate people than having
only one video). If anyone wants to improve this material (or replace it with better material), I am willing to open up my online MOOC to anyone who wants to run an A/B test of any good idea, so you can collect data from hundreds of students each year. I’m more than happy to collect best practices in p-value education – if you have anything you think (or have empirically shown) works well, send it my way, and make it openly available.
Educators, pedagogists, statisticians, cognitive psychologists, software
engineers, and designers interested in improving educational materials should
find a place to come together. I know there are organizations that exist to
improve statistics education (but I have no good information about what they do, or which one would be best to join given my goals), and if you work for such an organization and are interested in taking p-value
education to the next level, I’m more than happy to spread this message in my
network and work with you.
If we really consider the misinterpretation
of p-values to be one of the more
serious problems underlying the lack of replicability of scientific findings,
we need to seriously reflect on whether we have done enough to prevent
misunderstandings. Treating it as a human factors problem might illuminate ways
in which statistics education and statistical software can be improved. Let’s beat swords into ploughshares, and turn papers complaining about how people
misunderstand p-values into papers that examine how we can improve education
about p-values.
Great post – thank you!
Here’s the thing: the problem really isn’t how to explain p-values better. The problem is that people generally don’t know a) what the aim of science is and b) why we would want to use p-values in furtherance of that aim.
Long story short: There can be no such thing as certain (or even probable) knowledge. Knowledge can be objective, but it will always remain relative to fundamental assumptions. That implies that we can only achieve successively better knowledge. For that, we can employ valid, deductive logic, which enables us to make choices (www.theopensociety.net/2011/08/the-power-of-logic) that can in turn be informed by (a distribution of!) p-values.
Great, thank you very much!
I think we should also improve education about confidence intervals. Do you know this paper?
http://learnbayes.org/papers/confidenceIntervalsFallacy/fundamentalError_PBR.pdf