A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Saturday, October 12, 2019

Improving Your Statistical Questions

Three years after launching my first massive open online course (MOOC) ‘Improving Your Statistical Inferences’ on Coursera, today I am happy to announce a second completely free online course called ‘Improving Your Statistical Questions’. My first course is a collection of lessons about statistics and methods that we commonly use, but that I wish I had known how to use better when I was taking my first steps into empirical research. My new course is a collection of lessons about statistics and methods that we do not yet commonly use, but that I wish we would start using to improve the questions we ask. Where the first course tries to get people up to speed on commonly accepted best practices, my new course tries to educate researchers about better practices. Most of the modules cover topics in which there have been recent developments, or at least increasing awareness, over the last 5 years.

About a year ago, I wrote on this blog: If I ever make a follow up to my current MOOC, I will call it ‘Improving Your Statistical Questions’. The more I learn about how people use statistics, the more I believe the main problem is not how people interpret the numbers they get from statistical tests. The real issue is which statistical questions researchers ask of their data. If you approach a statistician to get help with the data analysis, most of their time will be spent asking you ‘but what is your question?’. I hope this course helps you to take a step back, reflect on this question, and get some practical advice on how to answer it.

There are 5 modules, with 15 videos, and 13 assignments that provide hands-on explanations of how to use the insights from the lectures in your own research. The first week discusses different questions you might want to ask. Only one of these is a hypothesis test, and I examine in detail whether you really want to test a hypothesis, or are simply going through the motions of the statistical ritual. I also discuss why NHST is often not a very risky prediction, and why range predictions are a more exciting question to ask (if you can). Module 2 focuses on falsification in practice and theory, including a lecture and some assignments on how to determine the smallest effect size of interest in the studies you perform. I also share my favorite colloquium question for whenever you have dozed off and wake up at the end to find that no one else is asking a question: you can always raise your hand to ask ‘so, what would falsify your hypothesis?’ Module 3 covers the importance of justifying error rates, a more detailed discussion of power analysis (following up on the ‘sample size justification’ lecture in MOOC1), and a lecture on the many uses of learning how to simulate data. Module 4 moves beyond single studies, and asks what you can expect from lines of research, how to perform a meta-analysis, and why the scientific literature does not look like reality (and how you can detect, and prevent contributing to, a biased literature). I was tempted to add this to MOOC1, but I am happy I didn’t, as there has been a lot of exciting work on bias detection that is now part of the lecture. The last module has three different topics I think are important: computational reproducibility, philosophy of science (this video would also have been a good first video lecture, but I don’t want to scare people away!) and maybe my favorite lecture in the MOOC, on scientific integrity in practice. All are accompanied by assignments, and the assignments are where the real learning happens.

If after this course some people feel more comfortable abandoning hypothesis testing and just describing their data, make their predictions a bit more falsifiable, design more informative studies, publish sets of studies that look a bit more like reality, and make their work more computationally reproducible, I’ll be very happy.

The content of this MOOC is based on over 40 workshops and talks I have given in the 3 years since my previous MOOC came out, testing this material on live crowds. It comes with some of the pressure a recording artist might feel for a second record when their first was somewhat successful. As my first MOOC hits 30k enrolled learners (many of whom engage with very little of the content, but still with thousands of people taking in a lot of the material), I hope the new course comes close and lives up to expectations.

I’m very grateful to Chelsea Parlett Pelleriti who checked all assignments for statistical errors or incorrect statements, and provided feedback that made every exercise in this MOOC better. If you need a statistics editor, you can find her at: https://cmparlettpelleriti.github.io/TheChatistician.html. Special thanks to Tim de Jonge who populated the Coursera environment as a student assistant, and Sascha Prudon for recording and editing the videos. Thanks to Uri Simonsohn for feedback on Assignment 2.1, Lars Penke for suggesting the SESOI example in lecture 2.2, Lisa DeBruine for co-developing Assignment 2.4, Joe Hilgard for the PET-PEESE code in assignment 4.3, Matti Heino for the GRIM test example in lecture 4.3, and Michelle Nuijten for feedback on assignment 4.4. Thanks to Seth Green, Russ Zack and Xu Fei at Code Ocean for help in using their platform to make it possible to run the R code online. I am extremely grateful to all alpha testers who provided feedback on early versions of the assignments: Daniel Dunleavy, Robert Gorsch, Emma Henderson, Martine Jansen, Niklas Johannes, Kristin Jankowsky, Cian McGinley, Robert Görsch, Chris Noone, Alex Riina, Burak Tunca, Laura Vowels, and Lara Warmelink, as well as the beta-testers who gave feedback on the material on Coursera: Johannes Breuer, Marie Delacre, Fabienne Ennigkeit, Marton L. Gy, and Sebastian Skejø. Finally, thanks to my wife for buying me six new shirts because ‘your audience has expectations’ (and for accepting how I worked through the summer holiday to complete this MOOC).

All material in the MOOC is shared under a CC-BY-NC-SA license, and you can access all of it for free (and use it in your own teaching). Improving Your Statistical Questions is available from today. I hope you enjoy it!

