A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Saturday, November 11, 2017

The Statisticians' Fallacy

If I ever make a follow-up to my current MOOC, I will call it ‘Improving Your Statistical Questions’. The more I learn about how people use statistics, the more I believe the main problem is not how people interpret the numbers they get from statistical tests. The real issue is which statistical questions researchers ask of their data.

Our statistics education turns a blind eye to training people how to ask a good question. After a brief explanation of what a mean is, and a pit-stop at the normal distribution, we jump through as many tests as we can fit in the number of weeks we are teaching. We are training students to perform tests, but not to ask questions.

There are many reasons for this lack of attention to training people how to ask a good question. But here I want to focus on one reason, which I’ve dubbed the Statisticians' Fallacy: statisticians telling you ‘what you really want to know’, instead of explaining how to ask one specific kind of question of your data.

Let me provide some examples of the Statisticians' Fallacy. In the next quotes, pay attention to the use of the word ‘want’. Cohen (1994) in his ‘The earth is round (p < .05)’ writes:

Colquhoun (2017) writes:

Or we can look at Cumming (2013):

Or Bayarri, Benjamin, Berger, and Sellke (2016):

Now, you might have noticed that these four statements by statisticians of ‘what we want’ are all different. One says 'we want' to know the posterior probability that our hypothesis is true, another says 'we want' to know the false positive report probability, yet another says 'we want' effect sizes and their confidence intervals, and yet another says 'we want' the strength of evidence in the data.
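To make the contrast concrete, here is a minimal Python sketch of how one set of summary statistics can feed three of these questions and return three different kinds of answers. Everything in it is an assumption for illustration: the function names are mine, the numbers are hypothetical, the p-value uses a simple z approximation, the interval uses a large-sample approximation to the standard error of Cohen's d, and the false positive report probability uses the common textbook form based on alpha, power, and an assumed prior probability that the alternative is true. It is not necessarily the exact quantity any of the quoted authors has in mind.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for all approximations below


def cohens_d(mean_1, mean_2, pooled_sd):
    """Question: how large is the standardized difference between two means?"""
    return (mean_1 - mean_2) / pooled_sd


def d_confidence_interval(d, n_1, n_2, level=0.95):
    """Question: which effect sizes are compatible with the data?

    Uses a large-sample normal approximation to the standard error of d.
    """
    se = math.sqrt((n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2)))
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


def two_sided_p_value(d, n_1, n_2):
    """Question: how surprising is this difference if the true effect is zero?

    z approximation that treats the pooled SD as known (an assumption).
    """
    z = d / math.sqrt(1 / n_1 + 1 / n_2)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def false_positive_report_probability(alpha, power, prior_h1):
    """Question: given a significant result, how likely is it a false positive?

    Depends on an assumed prior probability that the alternative is true.
    """
    false_positives = alpha * (1 - prior_h1)
    true_positives = power * prior_h1
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Hypothetical summary statistics: two groups of 50, pooled SD of 1.
    d = cohens_d(10.5, 10.0, 1.0)
    low, high = d_confidence_interval(d, 50, 50)
    p = two_sided_p_value(d, 50, 50)
    print(f"effect size d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
    print(f"two-sided p-value = {p:.4f}")
    # The same alpha and power give very different answers under different priors.
    for prior in (0.5, 0.1):
        fprp = false_positive_report_probability(0.05, 0.8, prior)
        print(f"prior P(H1) = {prior}: false positive report probability = {fprp:.2f}")
```

None of these outputs is 'the' answer. Each function answers a different question, and the last one cannot be answered at all without bringing in a prior that the data do not supply.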

Now you might want to know all these things, you might want to know some of these things, and you might want to know yet other things. I have no clue what you want to know (and after teaching thousands of researchers over the last five years, I’m pretty sure that often you don't really have a clue what you want either - you've never been trained to thoroughly ask this question). But what I think I know is that statisticians don’t know what you want to know. They might think some questions are interesting enough to ask. They might argue that certain questions follow logically from a specific philosophy of science. But the idea that there is always a single thing ‘we want’ is not true. If it were, statisticians would not have been criticizing what other statisticians say ‘we want’ for the last 50 years. Telling people 'what you want to know' instead of teaching people to ask themselves what they want to know will just get us another two decades of mindless statistics.

I am not writing this to stop statisticians from criticizing each other (I like to focus on easier goals in my life, such as world peace). But after reading many statements like the ones I’ve cited above, I have distilled my main take-home message in a bathroom tile:

There are many, often complementary, questions you can ask of your data, or when performing lines of research. Now I am not going to tell you what you want. But what I want is that we stop teaching researchers there is only a single thing they want to know. There is no room for the Statisticians' Fallacy in our education. I do not think it is useful to tell researchers what they want to know. But I think it’s a good idea to teach them about all the possible questions they can ask.

Further Reading:
Thanks to Carol Nickerson who, after reading this blog, pointed me to David Hand's Deconstructing Statistical Questions, which is an excellent article on the same topic - highly recommended.


  1. As a Stats teacher, I don't tell my students what they want to know. I do, however, tell them what kind of questions they can (and can't) get answered whenever they apply a particular technique or model. As a consultant, I typically hear the person out first (to know "what they want") and then I try to advise accordingly. It does happen that I then rephrase what one can achieve when using a specific method (for example, what you can conclude once you replace a parametric ANOVA with a nonparametric one, or even better, once you ditch frequentism for Bayesianism). I think the best advice is to educate students to phrase their questions in such a way that a translation into a suitable statistical model becomes feasible (instead of just freely phrasing questions).

  2. This is your best post so far Daniel. I look forward to the one directly following up your last sentence.

  3. Bravo! Occasionally my brighter students manage to "come up for air" after being submerged in probability distributions, hypothesis tests, and modeling techniques. Then they ask "What's it all for?" and "How do I know when to use what?" and other embarrassing questions which most textbooks carefully avoid. Now I need to collect case studies that prompt my students to think about those "possible questions to ask."

  4. >> There is no room for the Statistician’s Fallacy in our education. I do not think it is useful to tell researchers what they want to know.

    I haven't heard of anyone doing this in his/her lectures.

    >> the idea that there is always a single thing ‘we want’ is not true

    Nor do I see this idea being presented in any of the quotes. In my reading, the quoted statements are descriptive - they tell us what researchers want to do based on what we saw them doing in the past. There are numerous studies that show that researchers interpret the p value as the probability of the hypothesis (e.g., Oakes, 1986), and Cumming has done some research questioning students about their use and interpretation of p values and effect sizes.

    In addition, in some quotes, the "we" might refer to the author of the manuscript, and the statements about "what we want" might refer to goals that were set earlier in the writing. I can certainly imagine this being the case for the quote from Cumming. We should consider the context.

    >> But I think it’s a good idea to teach them about all the possible questions they can ask.

    Of course, this is a nice idea and hardly anyone would disagree. But how do we find all those possible questions, and how and what should we teach students about them? My guess is that in the end the popular answer would be to provide an introduction to the (history of) philosophy of science and perhaps some (history of) epistemology, sprinkled with some insights from Meehl, Cohen and other gurus. I doubt this improves inferences, since it does not provide a formal, transparent and replicable (and thus scientific) language for asking and discussing scientific questions and statements.

    As you may have already learned from my other comments, I think such a language already exists and should be taught. It's the language of modern causal inference.

  5. Nice post. I tried to make a similar argument by pointing out that many statistical analyses depend on exactly the same information from the data but can produce different answers because they ask different questions. Details at: