A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Monday, September 19, 2016

Why scientific criticism sometimes needs to hurt

I think it was sometime near the end of 2012 when my co-authors and I received an e-mail from Greg Francis pointing out that a study we published on the relationship between physical weight and importance was ‘too good to be true’. This was a stressful event. We were extremely uncertain about what this meant, but we realized it couldn’t be good. For me, it was the first article I had ever published. What did we do wrong? How serious was this allegation? What did it imply about the original effect? How would this affect our reputation?

As a researcher who gets such severe criticism, you have to go through the 5 stages of grief. Denial (‘This doesn’t make any sense at all’), anger (‘Who is this asshole?’), negotiation (‘If he had taken into account this main effect which was non-significant, our results wouldn’t be improbable!’), depression (‘What a disaster’), until, finally, you reach acceptance (‘OK, he has somewhat of a point’).

In keeping with the times, we had indeed performed multiple comparisons without correcting, and didn’t report one study that had not revealed a significant effect (which we immediately uploaded to PsychFileDrawer).

Before Greg Francis e-mailed us, I probably had heard about statistical power, and knew about publication bias, but receiving this personal criticism forced me to kick my understanding about these issues to a new level. I started to read about the topic, and quickly understood that you can’t have exclusively significant sets of studies in scientific articles, even when there is a true effect (see Schimmack, 2012, for a good explanation). Oh, it felt unfair to be singled out, when everyone else had a file-drawer. We joked that we would from now on only submit one-study papers to avoid such criticism (the test for excessive significance can only be done on multiple study papers). And we didn’t like the tone. “Too good to be true” sounds a lot like fraud, while publication bias sounds almost as inevitable as death and taxes.
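To make that point concrete, here is a minimal sketch of the arithmetic, assuming independent studies and a true effect. The power values are hypothetical numbers chosen for illustration, and this is only the core intuition, not the exact procedure Greg Francis used in his analysis.

```python
from math import prod


def prob_all_significant(powers):
    """Probability that every study in a set yields a significant result.

    Assumes the studies are independent and the effect is real, so each
    study is significant with probability equal to its statistical power.
    """
    return prod(powers)


# Hypothetical example: four studies, each with 50% power.
four_studies = [0.5, 0.5, 0.5, 0.5]
print(prob_all_significant(four_studies))  # 0.0625

# Even with 80% power per study, five out of five is far from guaranteed.
five_studies = [0.8] * 5
print(round(prob_all_significant(five_studies), 2))  # 0.33
```

So even when the effect is real, a paper in which every one of several modestly powered studies "worked" is an unlikely outcome unless something was left in the file-drawer.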

But now that some time has passed, I think about this event quite differently. I wonder where I would be without having had this criticism. I was already thinking about ‘Slow Science’ as we tended to call it in 2010, and had written about topics such as reward structures and the importance of replication research early in 2012. But if no-one had told me explicitly and directly that I was doing things wrong, would I have been equally motivated to change the way I do science? I don’t think so. There is a difference between knowing something is important, and feeling something is important. I had the opportunity to read about these topics for years, but all of a sudden, I actually was reading about these topics. Personal criticism was, at least for me, a strong motivating force.

I shouldn’t be surprised by this as a psychologist. I know there is the value-action gap (the difference between saying something is important, and acting based on those beliefs). It makes sense that it took slightly hurtful criticism for me to really be motivated to ignore current norms in my field, and take the time and effort to reflect on what I thought would be best practices.

I’m not saying that criticism has to be hurtful. Sometimes, people who criticize others can try to be a bit more nuanced when they tell the 2726th researcher who gets massive press attention based on a set of underpowered studies with all p-values between 0.03 and 0.05 that power is ‘pretty important’ and the observed results are ‘slightly unlikely’ (although I can understand they might be sometimes a bit too frustrated to use the most nuanced language possible). But I also don’t know how anyone could have brought the news that one of my most-cited papers was probably nothing more than a fluke in a way that I would not have felt stressed, angered, and depressed, as a young untenured researcher who didn’t really understand the statistical problems well enough.

This week, a large-scale replication of one of the studies on the weight-importance effect was published. There was no effect. When I look at how my co-authors and I responded, I am grateful for having received the criticism by Greg Francis years before this large-scale replication was performed. Had a failure to replicate our work been the very first time I had been forced to think about the strength of our original research, I fear I might have been one of those scholars who respond defensively to failures to replicate their work. It is likely that we would have only made it to the ‘anger’ stage in the 5 steps towards acceptance. Without having had several years to improve our understanding of the statistical issues, we would likely have written a very different commentary. Instead, we simply responded by stating: “We have had to conclude that there is actually no reliable evidence for the effect.”

I wanted to share this for two reasons.

First, I understand the defensiveness in some researchers. Getting criticism is stressful, and reduces the pleasure in your work. You don’t want to spend time having to deal with these criticisms, or feel insecure about how well you are actually able to do good science. I’ve been there, and it sucks. Based on my pop-science understanding of the literature on grief processing, I’m willing to give you a month for every year that you have been in science to go through all 5 stages. After a forty-year career, be in denial for 8 months. Be angry for another 8. But after 3 years, I expect you’ll slowly start to accept things. Maybe you want to cooperate with a registered replication report about your own work. Or maybe, if you are still active as a researcher, you want to test some of the arguments you proposed while you were in denial or negotiating, in a pre-registered study.

The second reason I wanted to share this is much more important. As a scientific community, we are extremely ungrateful to people who express criticism. I think the way we treat people who criticize us is deeply shameful. I see people who suffer blatant social exclusion. I see people who don’t get the career options they deserve. I see people whose work is kept out of prestigious journals. Those who criticize us have nothing to gain, and everything to lose. If you can judge a society by how it treats its weakest members, psychologists don’t have a lot to be proud of in this area.

So here, I want to personally thank everyone who has taken the time to criticize my research or thoughts. I know for a fact that while it happened, I wasn’t even close to as grateful as I should have been. Even now, the eight weeks of meditation training I did two years ago will not be enough for me not to feel hurt when you criticize me. But in the long run, feel comforted that I am grateful for every criticism that forces me to have a better understanding of how to do the best science I can do.


  1. Great stuff. One of the biggest ironies of this whole circus is that psychologists seem to be behaving exactly as some of their illustrious (and less quantitatively oriented?) predecessors - up to and including Freud - might have predicted.

  2. "Even now, the eight weeks of meditation training I did two years ago will not be enough for me not to feel hurt when you criticize me."

    Maybe that's because it's not a replicable effect? ;)

    Seriously, I think you're completely right. Being criticized or corrected is never fun but it is sometimes necessary. I do think the tone some people use or have used for it is not always optimal though. We can't just say that it shouldn't matter. Scientists aren't unemotional robots and if you ignore that you are as much at fault for how the debate derails as the person you criticize.

    I think there are two issues here really. One is that people are way too dogmatic about their cherished results. The incentive structure of science, especially in biology and psychology, is partly to blame for that. I've met a number of people who made a career out of a particular theory or effect and who now spend a significant amount of their time defending this idea against its numerous critics. In at least one case, I have strong suspicions that there is a massive file drawer of results not supporting the idea. I'd love for some researcher to just objectively test some alternative explanations for the findings. I don't believe all the findings are untrue - I just think there is a better theory to explain them that nobody has formulated yet.

    The other issue is that such an emotional reaction is simply normal. This is something you have to learn to get over when you do science. I don't know exactly how yet, but I hope this is something we can train people to do. I for one appreciate nothing more than when my postdocs or students correct me and/or propose alternative mechanisms. I didn't say I enjoy it. It can get quite stressful but I much prefer them showing me when I'm wrong and helping me correct a mistake or an incorrect conclusion to having them just sit there saying Yes Sir. And I much prefer it being them who correct me than some belligerent stranger from halfway around the world who I've never heard of.

  3. "“Too good to be true” sounds a lot like fraud, while publication bias sounds almost as inevitable as death and taxes."

    Science has a knack of giving things a fancy name so they don't appear to be that bad ("publication bias", "QRP's"). I think that's a great little trick to stop people from actually thinking about what they're doing.

    To me however, "publication bias" sounds a lot like systematic withholding of evidence, and i wonder why it isn't seen as misconduct (either by researchers or journals).

    More importantly, publication bias doesn't seem inevitable to me. A format like the "Registered Reports"-format can possibly go a long way in preventing publication bias.

    Why don't scientists demand from their journals to adopt this format? Why doesn't someone start a petition or something?

    1. Publication bias can mean a lot of things. When it comes to the within-study publication bias that Francis seeks to reveal (let's ignore the problem of posthoc power for a moment) then I don't actually believe that the experiments in the file-drawer are the biggest problem. Instead, these p-values are most likely skewed due to analytical flexibility - the "Garden of Forking Paths" Andrew Gelman wrote about - and it is probably pervasive in the literature. Is this a problem? Sure. Is it misconduct? No.

    2. Thanks for your reply!

      I am following this definition of publication bias:

      "Publication bias is a type of bias occurring in published academic research. It occurs when the outcome of an experiment or research study influences the decision whether to publish (or otherwise distribute)".

      I would call within-study publication bias "selective reporting". If selective reporting is also based on the outcome of the analyses, then i would also view this as withholding of evidence, and thus perhaps even misconduct.

      More importantly, if i am not mistaken, the Registered Report-format also prevents selective reporting from happening.

    3. It's misconduct if you know what it is. We have an obligation to make sure as many people as possible know what it is.

    4. Thank you for the reply!

      Yes, "intentional wrongdoing" is one definition of misconduct. Another one is "mismanagement especially of governmental or military responsibilities". I view the current publication system, and as a result publication bias, in science as mismanagement of scientific responsibilities by journals and scientists.

      My whole point is that, as the original post nicely shows, certain unspoken/hidden rules of conduct may result in problematic behavior. As you point out, one solution for that is perhaps educating everyone. I am all for that, but in my opinion you can prevent a lot of problematic behavior by simply setting up an improved system that prevents it from happening: the Registered Report-format.

      I also think publication bias is not something that is inevitable like death. Hence, the thought of a (group of) scientist(s) to start a petition for the adoption of the Registered Report-format by journals. Scientists, and citizens, could sign this petition, and this could perhaps be used to introduce the Registered Report-format to journals for consideration.

  4. Wow...this is a fantastic and thoughtful post. I didn't know the back story of how you became who you are. In this case, what didn't kill you certainly made you stronger. Your ability to constructively deal with strong criticism seems to have spurred you on to become a leader in the field. And that, really is what has long been at the heart of science. Whether the criticism is ultimately found to be well-founded or not, good science happens when we take our critics seriously and then challenge ourselves to improve our methods, data, and ideas. Well done, and thanks for sharing. This post shows that the options in the 'replication crisis' are not 'destroy' or 'deny', but can and should be 'improve'. RC-J.

  5. Stronger sticks? Or should I say "more severe"?

  6. Dear Anonymous, I think I should be clearer. Personally, I also think this is what publication bias is: when non-significant results, or results not supporting a particular idea, aren't published. This could be because the researchers don't bother or because of the significance filter inherent in scientific publishing. The latter is definitely not misconduct. The former is dodgy.

    My point though was that what is often called publication bias isn't necessarily publication bias, and I'd wager it usually isn't. Rather, the most likely reason is analytical or methodological flexibility, and I only view this as misconduct if it is a deliberate attempt to skew the results. As Gelman's paper outlines very nicely, these things can be unconscious and quite non-deliberate. I agree that a preregistered design can minimize this but that's really a different issue.

    1. Okay I give up. I tried very hard to post this comment in reply to Anonymous above but somehow the website won't let me. So I'll leave that here.

  7. This comment has been removed by the author.

  8. There is perhaps a way out of this conundrum. I am attempting (slowly and not yet successfully) to bring about a new scientific culture in which failed predictions are highly prized. From the perspective of one who has built theories and computational models, I know how emotionally satisfying it is to be proven right, but once you're done celebrating you learn that you may not have actually learned very much (whether you do or not depends on the rarity of the observation). However, what is always informative is a failed prediction. Assuming that the test is accurate and sufficiently powered, you always learn something new when you fail to validate a prediction. It might be a boundary condition, a maladjusted parameter, or a complete refutation of the model, but something has been learned.

    So if we can move to a world in which failed predictions are celebrated, rather than hidden in the basement, perhaps we can learn to let go of our attachment to our theories and embrace instead the idea of progress. We all know at heart that models are always wrong in the end, but it's so hard to avoid becoming attached to them.

    Of course, failed predictions are not the same as inaccurate science, but I think that our responses to the two are related. Our emotional aversion to being told that we're wrong is present in both cases, and thus learning to embrace one might help us to embrace the other.

  9. Criticism is always welcome, IF done in a polite, useful, and friendly fashion. "Too good to be true" seems quite neutral in isolation (although the surrounding context might have given it its tone). I do not think researchers are bitter about criticism, but about mean comments, like "this manuscript is so bad I wanna puke", which sadly get labeled as 'criticism'. In this case, many editors haven't done their job properly, I must say. They should be the ones controlling for such deviations in criticism. An anonymous review process should not be an excuse for liberating your evil side...

  10. There is a fine line between positive and negative criticism. The former can get you "points" (e.g., a gaming system like Publons) while the latter can get you banned. Trust me, I've employed both and suffered the consequences. Science is in a terrible state of turmoil, and criticism is just a natural expression of the pressures that scientists are facing. Criticisms of the system, criticisms of competitors, criticisms of journals, editors, papers and publishers. In a way, maybe this rough patch is a necessary evil to pull science out of the mess it finds itself in. Chin up, Daniel!