This blog post is based on a pre-print by Coles, Tiokhin, Scheel, Isager, and Lakens, “The Costs and Benefits of Replications”, submitted to Behavioral and Brain Sciences as a commentary on “Making Replication Mainstream”.
In a summary of recent discussions about the role of direct replications in psychological science, Zwaan, Etz, Lucas, and Donnellan (2017) argue that replications should be more mainstream. The debate about the importance of replication research is, at its core, driven by disagreements about the value of replication studies in a world where limited resources force us to think carefully about how best to allocate them in the pursuit of scientific knowledge. The real question, we believe, is when replication studies are worthwhile to perform.
Goldin-Meadow
stated that "it’s just too costly or unwieldy to generate hypotheses on
one sample and test them on another when, for example, we’re conducting a large
field study or testing hard-to-find participants" (2016). A similar point is made by Tackett and McShane (2018) in their commentary on ZELD: “Specifically, large-scale replications are typically only possible
when data collection is fast and not particularly costly, and thus they are,
practically speaking, constrained to certain domains of psychology (e.g.,
cognitive and social).”
Such statements imply a cost-benefit analysis, but these scholars do not quantify the costs and benefits involved. They hide their subjective expected utility (what is a large-scale replication study worth to me?) behind absolute statements: they write “is” and “are” but really mean “it is my subjective belief that”. Scientifically speaking, such statements are empty, because they are not quantifiable. What counts as “costly”? We cannot have a discussion about such an important topic if researchers do not specify their assumptions in quantifiable terms.
Some studies may be deemed valuable enough
to justify even quite substantial investments to guarantee that a replication
study is performed. For instance, because it is unlikely that anyone will build
a Large Hadron Collider to replicate the studies at CERN, there are two
detectors (ATLAS and CMS) so that independent teams can replicate each other’s
work. That is, not only do these researchers consider it important to use a very low (5 sigma) alpha level when they analyze data, they also believe it is worthwhile to let two teams independently do the same thing. As a physicist
remarks: “Replication is, in the end, the most important part of error control.
Scientists are human, they make mistakes, they are deluded, and they cheat. It
is only through attempted replication that errors, delusions, and outright fraud
can be caught.” Thus, high cost is not by itself a conclusive argument against
replication. Instead, one must make the case that the benefits do not justify
the costs. Again, I ask: what is “costly”?
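For concreteness, that 5 sigma threshold corresponds to a one-tailed alpha of roughly 3 × 10⁻⁷. A quick sketch of the arithmetic (assuming, as is the convention in particle physics, a one-tailed test under a standard normal):

```python
from scipy.stats import norm

# One-tailed alpha corresponding to the 5-sigma discovery threshold
# used in particle physics: P(Z > 5) for a standard normal Z.
alpha_5_sigma = norm.sf(5)  # survival function, 1 - CDF
print(f"5-sigma alpha: {alpha_5_sigma:.2e}")  # ~2.87e-07
```

Even at an alpha level several orders of magnitude stricter than psychology’s conventional .05, these researchers still consider independent replication worth the cost of a second detector.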
Decision theory is a formal framework that
allows researchers to decide when replication
studies are worthwhile. It requires researchers to specify their assumptions in
quantifiable terms. For example, the expected utility of a direct replication (compared to a conceptual
replication) depends on the probability that a specific theory or effect is
true. If you believe that many published findings are false, then directly
replicating prior work may be a cost-efficient way to prevent researchers from
building on unreliable findings. If you believe that psychological theories
usually make accurate predictions, then conceptual extensions may lead to more
efficient knowledge gains than direct replications. Instead of wasting time
arguing about whether direct replications are important or whether conceptual
replications are important, do the freaking math. Tell us at which probability that H0 is true you think it becomes efficient to weed out false positives from the literature through direct replications. Show us, by pre-registering
all your main analyses, that you are building on strong theories that allow you
to make correct predictions with a 92% success rate, and that you therefore do
not feel direct replications are the more efficient way to gain knowledge in
your area.
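To make this concrete, here is a minimal sketch of such a calculation. Every number in it (the utilities, costs, power, alpha, and priors) is an illustrative placeholder made up for this post, not a value from our pre-print; the point is only that once you write your assumptions down, the choice between a direct replication and a conceptual extension follows mechanically from your prior that the original finding is true.

```python
# A minimal sketch of the decision-theoretic comparison. All numbers
# (utilities, costs, power, alpha, priors) are illustrative assumptions.

def eu_direct(p_true, power=0.9, alpha=0.05,
              u_flag_false=10.0, u_confirm=2.0, cost=1.0):
    """Expected utility of a direct replication: if the original effect
    is false, a non-significant replication flags it; if it is true,
    a significant replication confirms it."""
    return (p_true * power * u_confirm
            + (1 - p_true) * (1 - alpha) * u_flag_false
            - cost)

def eu_conceptual(p_true, power=0.9, u_extend=5.0, cost=1.0):
    """Expected utility of a conceptual extension: it only pays off
    when the underlying theory actually makes accurate predictions."""
    return p_true * power * u_extend - cost

# Which option is more efficient depends on your prior that the
# original finding is true:
for p in (0.2, 0.5, 0.92):
    print(f"P(effect is true) = {p:.2f}: "
          f"direct = {eu_direct(p):5.2f}, "
          f"conceptual = {eu_conceptual(p):5.2f}")
```

With these (arbitrary) numbers, the direct replication has the higher expected utility when the prior probability that the effect is true is low, and the conceptual extension wins when that probability approaches the 92% prediction success rate mentioned above. Your numbers may well differ, but then you should say what they are.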
I am happy to see that our ideas about the importance of using decision theory to determine when replications are worth performing were independently replicated in the commentary on ZELD by Hardwicke, Tessler, Peloquin, and Frank. We have been collaboratively working on a manuscript specifying the Replication Value of replication studies for several years, and with the recent funding I received, I'm glad we can finally dedicate the time to complete this work. I look forward to scientists explicitly thinking about the utility of the research they perform. This is an important question, and I can't wait for our field to start discussing how we can quantify the utility of the research we perform. This will not be easy. But even if you never explicitly think about how to spend your resources, you are making these choices implicitly all the time, and the question is too important to give up on without even trying. In our pre-print, we illustrate how all concerns raised against replication studies essentially boil down to a discussion about their costs and benefits, and how formalizing these costs and benefits would improve the way researchers discuss this topic.
Perhaps it would be easier for people to get into the habit of justifying replications in this way if they were also in the habit of similarly justifying why they run their initial studies. I'm not convinced that this is always the case. "Because we have a grad student who thinks this is interesting and a participant pool with quotas to meet" does not necessarily meet this criterion, I would suggest.
ReplyDelete"Tell us at which probability that H0 is true you think it is efficient enough to weed out false positives from the literature through direct replications. Show us, by pre-registering all your main analyses, that you are building on strong theories that allow you to make correct predictions with a 92% success rate, and that you therefore do not feel direct replications are the more efficient way to gain knowledge in your area."
Hmm, not sure if I agree with (my interpretation of) this. I reason:
1) It seems to me that it is impossible to determine/gauge the percentage of hypotheses that (will) turn out to be "correct"/"proven".
2) More importantly, it seems to me that a) it doesn't matter what this percentage is (within reasonable boundaries), and b) it is not even desirable to try and determine/gauge this percentage, because I reason both a) and b) are irrelevant for building knowledge.
What matters in my reasoning is amassing things like optimally gathered (and thus maximally informational) data, and arguments/reasoning, which can both be used for things like theory-building and theory-testing.
I am all for "(...) scientists explicitly thinking about the utility of the research they perform", but I reason it might make more sense to think about this for all research, not just replications. In fact, I reason it is more important for "original" research, because I predict that (nearly) all else will follow automatically once things are done more optimally from the start.
The bottom line to me is that many of the things that might be wrong in psychological science seem to me to be connected, and rooted in a few basic things that need to be improved. I reason the rest of the problems will solve themselves automatically.
Here is an idea which tries to solve some of the basic things that might be connected, and which I reason will also help solve other issues. I named it after what I think is a summary of the few basic problems, which I reason can all easily be solved: "Science is dependent on scientists (old flawed model?) vs. scientists are dependent on Science (new improved model?)":
http://andrewgelman.com/2017/12/17/stranger-than-fiction/#comment-628652
I hope you will (also) focus on how to optimally perform research in general, and not just on replications. I reason that when the former is done, the rest will follow automatically. Good luck / all the best with your work on this important topic!
The gist of this reasoning, in slightly different words, in response to a different blog post on this topic: https://pedermisager.netlify.com/post/what-to-replicate/
In the blog post by Isager, various reasons are given why researchers could decide to replicate certain findings. I was wondering if you have thought about the possibility of *not* replicating, and/or giving attention to, any past work.
If we take into account your assumptions regarding resource constraints and the willingness to replicate, it might be far more fruitful (and perhaps more ethical and responsible) for researchers to not replicate any past work, but to concentrate on replicating current and future work.
I reason that all the different reasons researchers give to replicate past work might be considered equivalent from the perspective of a cumulative science. This is because all the different reasons Isager provides are, could be, or will be intertwined and influenced by each other. From the perspective of viewing psychological science as a cumulative science, it therefore possibly doesn't matter 1) which of your example reasons motivates a replication, 2) any of them could even be a reason *not* to replicate, and 3) the "starting point" of a research program (e.g., a direct replication of past work) is perhaps far less important than the entire process of that research program.
For instance, assuming the narrative of the past few years is (partly) correct that "sexy" (but probably low-quality) findings have been rewarded, it could be reasoned that these "sexy" findings will have had theoretical impact, gathered personal interest, influenced policy, and amassed many citations. If this makes any sense, all the reasons researchers give for replicating past work in your blog post may in fact be the exact reasons why they *shouldn't* want to replicate it, given resource constraints and the wish to replicate things. All this replication of past work might be giving attention to sub-optimal work, and researchers, for a second time?! Also possibly see "Replication initiatives will not salvage the trustworthiness of psychology" by J. C. Coyne (https://bmcpsychology.biomedcentral.com/articles/10.1186/s40359-016-0134-3).
Here is a link to a research (and publication) format that incorporates direct replications of "new" work, and that involves a more continuous and cumulative manner of replicating and doing research:
http://andrewgelman.com/2017/12/17/stranger-than-fiction/#comment-628652
"Goldin-Meadow stated that "it’s just too costly or unwieldy to generate hypotheses on one sample and test them on another when, for example, we’re conducting a large field study or testing hard-to-find participants" (2016). A similar comment is made by Tackett and McShane (2018) in their comment on ZELD: “Specifically, large-scale replications are typically only possible when data collection is fast and not particularly costly, and thus they are, practically speaking, constrained to certain domains of psychology (e.g., cognitive and social).”"
I always find this reasoning fascinating. To me, it doesn't make much sense, because these "hard to find" or "costly" participants are apparently only "hard to find" or "costly" for one study.
After all, chances are high that the next study will use the same "hard to find" or "costly" participants, but now for a different study for which participants and money magically appear all of a sudden.