I think it was sometime near the end of 2012 when my
co-authors and I received an e-mail from Greg Francis pointing out that a study we had published on the relationship between physical weight and importance was
‘too good to be true’. This was a stressful event. We were extremely uncertain
about what this meant, but we realized it couldn’t be good. For me, it was the
first article I had ever published. What did we do wrong? How serious was this
allegation? What did it imply about the original effect? How would this affect
our reputation?
As a researcher who receives such severe criticism, you have to
go through the five stages of grief. Denial (‘This doesn’t make any sense at
all’), anger (‘Who is this asshole?’), bargaining (‘If he had taken
into account this non-significant main effect, our results wouldn’t
be improbable!’), depression (‘What a disaster’), until, finally, you reach
acceptance (‘OK, he has somewhat of a point’).
In keeping with the times, we had indeed performed multiple
comparisons without correcting for them, and had not reported one study that did not
reveal a significant effect (which we immediately uploaded to
PsychFileDrawer).
Before Greg Francis e-mailed us, I had probably heard about statistical
power, and I knew about publication bias, but receiving this personal criticism
forced me to take my understanding of these issues to a new level. I started
to read about the topic, and quickly understood that you can’t expect exclusively
significant sets of studies in scientific articles, even when there is a true
effect (see Schimmack, 2012, for a good explanation). Oh, it felt unfair to be
singled out, when everyone else had a file drawer too. We joked that from
now on we would only submit one-study papers to avoid such criticism (the test for
excessive significance can only be applied to multiple-study papers). And we
didn’t like the tone. “Too good to be true” sounds a lot like fraud, while
publication bias sounds almost as inevitable as death and taxes.
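For readers who want to see the arithmetic behind that argument, here is a minimal sketch (my own illustration, not the actual test Francis used, which estimates power from the reported effect sizes rather than assuming it): if every study has to cross p < .05 on its own, the probability that a whole set of studies comes out significant is at best the product of the individual studies’ power, and that product shrinks quickly.

```python
# Rough illustration of the excess-significance logic (not Francis's actual
# test): how likely is it that *every* study in a multi-study paper reaches
# p < .05, given each study's power? The power values below are assumed
# purely for illustration.
from math import prod

def prob_all_significant(powers):
    """Probability that all studies are significant, assuming independence
    and a true effect in every study."""
    return prod(powers)

for power, n_studies in [(0.8, 5), (0.5, 5), (0.35, 4)]:
    p_all = prob_all_significant([power] * n_studies)
    print(f"power = {power:.2f}, {n_studies} studies -> "
          f"P(all significant) = {p_all:.2f}")

# Even with a respectable 80% power, five out of five significant results
# occur only about a third of the time; at the power levels more typical of
# the literature, an all-significant set quickly becomes 'too good to be true'.
```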
But now that some time has passed, I think about this event
quite differently. I wonder where I would be without having had this criticism.
I was already thinking about ‘Slow Science’ as we tended to call it in 2010,
and had written about topics such as reward structures and the importance of replication
research early in 2012. But if no one had told me explicitly and directly that
I was doing things wrong, would I have been equally motivated to change the way
I do science? I don’t think so. There is a difference between knowing something is important, and feeling something is important. I had
the opportunity to read about these topics for years, but all of a sudden, I
was actually reading about them. Personal criticism was, at least for me, a strong motivating force.
As a psychologist, I shouldn’t be surprised by this. I know
about the value-action
gap (the difference between saying something is important and acting
on that belief). It makes sense that it took slightly hurtful criticism to
really motivate me to ignore the current norms in my field, and to take the
time and effort to reflect on what I thought best practices would be.
I’m not saying that criticism has to be hurtful. Sometimes,
people who criticize others could try to be a bit more nuanced when they tell the
2726th researcher who gets massive press attention based on a set of
underpowered studies with all p-values
between 0.03 and 0.05 that power is ‘pretty important’ and the observed results
are ‘slightly unlikely’ (although I can understand they might sometimes be a
bit too frustrated to use the most nuanced language possible). But I also don’t
know how anyone could have brought the news that one of my most-cited papers
was probably nothing more than a fluke in a way that would not have left me feeling stressed,
angry, and depressed, as a young untenured researcher who didn’t yet
understand the statistical problems well enough.
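To make the ‘slightly unlikely’ part concrete, here is a small simulation sketch (the effect size and sample size are assumptions of mine, chosen only for illustration): when a true effect is studied with modest power, p-values still pile up near zero, so a set of studies in which every p-value lands between .03 and .05 is itself an improbable outcome.

```python
# Small simulation (assumptions mine, not from the post): with a true effect
# of d = 0.5 and n = 30 per group, p-values from two-sample t-tests
# concentrate near zero, so p-values that all fall just below .05 are rare.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2012)
n_per_group, true_d, n_sims = 30, 0.5, 100_000  # assumed, for illustration

group1 = rng.normal(true_d, 1.0, size=(n_sims, n_per_group))
group2 = rng.normal(0.0, 1.0, size=(n_sims, n_per_group))
p_values = stats.ttest_ind(group1, group2, axis=1).pvalue

significant = p_values < 0.05
just_barely = (p_values > 0.03) & (p_values < 0.05)
print(f"P(p < .05)               = {significant.mean():.2f}")
print(f"P(.03 < p < .05)         = {just_barely.mean():.3f}")
print(f"P(3 studies all .03-.05) = {just_barely.mean() ** 3:.5f}")
```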
This week, a large-scale replication of one of the studies
on the weight-importance effect was published. There was no effect. When I look
at how my co-authors and I responded, I am grateful for having received
Greg Francis’s criticism years before this large-scale replication was
performed. Had a failure to replicate our work been the very first time I was
forced to think about the strength of our original research, I fear I
might have been one of those scholars who respond defensively to failures to
replicate their work. We would likely have made it only to the
‘anger’ stage of the five steps towards acceptance. Without having had several years
to improve our understanding of the statistical issues, we would likely have
written a very different commentary. Instead, we simply responded by stating:
“We have had to conclude that there is actually no reliable evidence for the
effect.”
I wanted to share this for two reasons.
First, I understand the defensiveness in some researchers. Getting
criticism is stressful, and reduces the pleasure in your work. You don’t want
to spend time having to deal with these criticisms, or feel insecure about how
well you are actually able to do good science. I’ve been there, and it sucks.
Based on my pop-science understanding of the literature on grief processing,
I’m willing to give you a month for every year that you have been in science to
go through all five stages. After a forty-year career, be in denial for eight months.
Be angry for another eight. But after three years, I expect you’ll slowly start to
accept things. Maybe you want to cooperate on a Registered Replication Report
of your own work. Or maybe, if you are still active as a researcher, you
want to test some of the arguments you proposed while you were in denial or bargaining,
in a pre-registered study.
The second reason I wanted to share this is much more
important. As a scientific community, we are extremely ungrateful to people who
express criticism. I think the way we treat people who criticize us is deeply
shameful. I see people who suffer blatant social exclusion. I see people who
don’t get the career options they deserve. I see people whose work is kept out
of prestigious journals. Those who criticize us have nothing to gain, and
everything to lose. If you can judge a society by how it treats its weakest
members, psychologists don’t have a lot to be proud of in this area.
So here, I want to personally thank everyone who has taken
the time to criticize my research or thoughts. I know for a fact that, while it
happened, I wasn’t nearly as grateful as I should have been. Even now, the
eight weeks of meditation training I did two years ago will not be enough for
me not to feel hurt when you criticize me. But in the long run, feel comforted
that I am grateful for every criticism that forces me to have a better
understanding of how to do the best science I can do.
Great stuff. One of the biggest ironies of this whole circus is that psychologists seem to be behaving exactly as some of their illustrious (and less quantitatively oriented?) predecessors - up to and including Freud - might have predicted.
"Even now, the eight weeks of meditation training I did two years ago will not be enough for me not to feel hurt when you criticize me."
Maybe that's because it's not a replicable effect? ;)
Seriously, I think you're completely right. Being criticized or corrected is never fun but it is sometimes necessary. I do think the tone some people use or have used for it is not always optimal though. We can't just say that it shouldn't matter. Scientists aren't unemotional robots and if you ignore that you are as much at fault for how the debate derails as the person you criticize.
I think there are two issues here really. One is that people are way too dogmatic about their cherished results. The incentive structure of science, especially in biology and psychology, is partly to blame for that. I've met a number of people who made a career out of a particular theory or effect and who now spend a significant amount of their time defending this idea against its numerous critics. In at least one case, I have strong suspicions that there is a massive file drawer of results not supporting the idea. I'd love for some researcher to just objectively test some alternative explanations for the findings. I don't believe all the findings are untrue - I just think there is a better theory to explain them that nobody has formulated yet.
The other issue is that such an emotional reaction is simply normal. This is something you have to learn to get over when you do science. I don't know exactly how yet, but I hope this is something we can train people to do. I for one appreciate nothing more than when my postdocs or students correct me and/or propose alternative mechanisms. I didn't say I enjoy it. It can get quite stressful, but I much prefer them showing me when I'm wrong and helping me correct a mistake or an incorrect conclusion to having them just sit there saying 'Yes, Sir'. And I much prefer it being them who correct me rather than some belligerent stranger from halfway around the world who I've never heard of.
"“Too good to be true” sounds a lot like fraud, while publication bias sounds almost as inevitable as death and taxes."
Science has a knack of giving things a fancy name so they don't appear to be that bad ("publication bias", "QRPs"). I think that's a great little trick to stop people from actually thinking about what they're doing.
To me, however, "publication bias" sounds a lot like systematic withholding of evidence, and I wonder why it isn't seen as misconduct (either by researchers or journals).
More importantly, publication bias doesn't seem inevitable to me. A format like "Registered Reports" can possibly go a long way in preventing publication bias.
Why don't scientists demand that their journals adopt this format? Why doesn't someone start a petition or something?
Publication bias can mean a lot of things. When it comes to the within-study publication bias that Francis seeks to reveal (let's ignore the problem of post-hoc power for a moment), I don't actually believe that the experiments in the file drawer are the biggest problem. Instead, these p-values are most likely skewed by analytical flexibility - the "Garden of Forking Paths" Andrew Gelman wrote about - and it is probably pervasive in the literature. Is this a problem? Sure. Is it misconduct? No.
Thanks for your reply!
I am following this definition of publication bias:
"Publication bias is a type of bias occurring in published academic research. It occurs when the outcome of an experiment or research study influences the decision whether to publish (or otherwise distribute)".
I would call within-study publication bias "selective reporting". If selective reporting is also based on the outcome of the analyses, then I would also view this as withholding of evidence, and thus perhaps even misconduct.
More importantly, if I am not mistaken, the Registered Report format also prevents selective reporting from happening.
It's misconduct if you know what it is. We have an obligation to make sure as many people as possible know what it is.
Thank you for the reply!
Yes, "intentional wrongdoing" is one definition of misconduct. Another one is "mismanagement especially of governmental or military responsibilities". I view the current publication system in science, and as a result publication bias, as mismanagement of scientific responsibilities by journals and scientists.
My whole point is that, as the original post nicely shows, certain unspoken/hidden rules of conduct may result in problematic behavior. As you point out, one solution for that is perhaps educating everyone. I am all for that, but in my opinion you can prevent a lot of problematic behavior by simply setting up an improved system that prevents it from happening: the Registered Report format.
I also think publication bias is not something that is inevitable like death. Hence the idea of a (group of) scientist(s) starting a petition for journals to adopt the Registered Report format. Scientists, and citizens, could sign this petition, and it could perhaps be used to introduce the Registered Report format to journals for consideration.
Wow...this is a fantastic and thoughtful post. I didn't know the back story of how you became who you are. In this case, what didn't kill you certainly made you stronger. Your ability to constructively deal with strong criticism seems to have spurred you on to become a leader in the field. And that, really, is what has long been at the heart of science. Whether the criticism is ultimately found to be well-founded or not, good science happens when we take our critics seriously and then challenge ourselves to improve our methods, data, and ideas. Well done, and thanks for sharing. This post shows that the options in the 'replication crisis' are not 'destroy' or 'deny', but can and should be 'improve'. RC-J.
Stronger sticks? Or should I say "more severe"?
ReplyDeletehttps://errorstatistics.com/2016/09/18/the-myth-of-the-myth-of-objectivity-i/
Dear Anonymous, I think I should be clearer. Personally, I also think this is what publication bias is: when non-significant results, or results not supporting a particular idea, aren't published. This could be because the researchers don't bother or because of the significance filter inherent in scientific publishing. The latter is definitely not misconduct. The former is dodgy.
My point though was that what is often called publication bias isn't necessarily publication bias, and I'd wager it usually isn't. Rather, the most likely reason is analytical or methodological flexibility, and I only view this as misconduct if it is a deliberate attempt to skew the results. As Gelman's paper outlines very nicely, these things can be unconscious and quite non-deliberate. I agree that a preregistered design can minimize this, but that's really a different issue.
Okay I give up. I tried very hard to post this comment in reply to Anonymous above but somehow the website won't let me. So I'll leave that here.
There is perhaps a way out of this conundrum. I am attempting (slowly and not yet successfully) to bring about a new scientific culture in which failed predictions are highly prized. From the perspective of one who has built theories and computational models, I know how emotionally satisfying it is to be proven right, but once you're done celebrating you learn that you may not have actually learned very much (whether you do or not depends on the rarity of the observation). However, what is always informative is a failed prediction. Assuming that the test is accurate and sufficiently powered, you always learn something new when you fail to validate a prediction. It might be a boundary condition, a maladjusted parameter, or a complete refutation of the model, but something has been learned.
So if we can move to a world in which failed predictions are celebrated, rather than hidden in the basement, perhaps we can learn to let go of our attachment to our theories and embrace instead the idea of progress. We all know at heart that models are always wrong in the end, but it's so hard to avoid becoming attached to them.
Of course, failed predictions are not the same as inaccurate science, but I think that our responses to the two are related. Our emotional aversion to being told that we're wrong is present in both cases, and thus learning to embrace one might help us to embrace the other.
Criticism is always welcome, IF done in a polite, useful, and friendly fashion. "Too good to be true" seems quite neutral in isolation (although the surrounding context might have given it a different intonation). I do not think researchers grow bitter from criticism, but from mean comments, like "this manuscript is so bad I wanna puke", which sadly also get labelled as 'criticism'. In this case, many editors haven't done their job properly, I must say. They should be the ones controlling for such deviations in criticism. An anonymous review process should not be an excuse for liberating your evil side...
There is a fine line between positive and negative criticism. The former can get you "points" (e.g., in a gaming system like Publons) while the latter can get you banned. Trust me, I've employed both and suffered the consequences. Science is in a terrible state of turmoil, and criticism is just a natural expression of the pressures that scientists are facing. Criticism of the system, criticism of competitors, criticism of journals, editors, papers and publishers. In a way, maybe this rough patch is a necessary evil to pull science out of the mess it finds itself in. Chin up, Daniel!
ReplyDeleteThis comment has been removed by a blog administrator.
ReplyDelete