Thursday, January 18, 2018

The Costs and Benefits of Replications


This blog post is based on a pre-print by Coles, Tiokhin, Scheel, Isager, and Lakens, “The Costs and Benefits of Replications”, submitted to Behavioral and Brain Sciences as a commentary on “Making Replication Mainstream”.

In a summary of recent discussions about the role of direct replications in psychological science, Zwaan, Etz, Lucas, and Donnellan (2017) argue that replications should be more mainstream. The debate about the importance of replication research is essentially driven by disagreements about the value of replication studies in a world where we have to think carefully about the best way to allocate limited resources in the pursuit of scientific knowledge. The real question, we believe, is when replication studies are worthwhile to perform.

Goldin-Meadow stated that "it’s just too costly or unwieldy to generate hypotheses on one sample and test them on another when, for example, we’re conducting a large field study or testing hard-to-find participants" (2016). A similar comment is made by Tackett and McShane (2018) in their comment on ZELD: “Specifically, large-scale replications are typically only possible when data collection is fast and not particularly costly, and thus they are, practically speaking, constrained to certain domains of psychology (e.g., cognitive and social).”

Such statements imply a cost-benefit analysis. But these scholars do not quantify their costs and benefits. They hide their subjective expected utility (what is a large-scale replication study worth to me?) behind absolute statements: they write “is” and “are” but really mean “it is my subjective belief that”. Their statements are empty, scientifically speaking, because they are not quantifiable. What is “costly”? We cannot have a discussion about such an important topic if researchers do not specify their assumptions in quantifiable terms.

Some studies may be deemed valuable enough to justify even quite substantial investments to guarantee that a replication study is performed. For instance, because it is unlikely that anyone will build a second Large Hadron Collider to replicate the studies at CERN, there are two detectors (ATLAS and CMS) so that independent teams can replicate each other’s work. That is, not only do these researchers consider it important to have a very low (5 sigma) alpha level when they analyze data, they also believe it is worthwhile to let two teams independently do the same thing. As a physicist remarks: “Replication is, in the end, the most important part of error control. Scientists are human, they make mistakes, they are deluded, and they cheat. It is only through attempted replication that errors, delusions, and outright fraud can be caught.” Thus, high cost is not by itself a conclusive argument against replication. Instead, one must make the case that the benefits do not justify the costs. Again, I ask: what is “costly”?
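To make the “5 sigma” threshold concrete, here is a one-line sketch (using scipy’s normal distribution, purely as an illustration; any normal CDF would do) of the alpha level it corresponds to:

```python
# The one-sided tail probability of a 5-sigma result: the alpha level
# particle physicists require before claiming a discovery.
from scipy.stats import norm

alpha_5sigma = norm.sf(5)  # survival function: P(Z > 5) for a standard normal
print(f"5-sigma alpha level: {alpha_5sigma:.1e}")  # ~2.9e-07
```

That is roughly one false positive in 3.5 million, against psychology’s conventional alpha of .05.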

Decision theory is a formal framework that allows researchers to decide when replication studies are worthwhile. It requires researchers to specify their assumptions in quantifiable terms. For example, the expected utility of a direct replication (compared to a conceptual replication) depends on the probability that a specific theory or effect is true. If you believe that many published findings are false, then directly replicating prior work may be a cost-efficient way to prevent researchers from building on unreliable findings. If you believe that psychological theories usually make accurate predictions, then conceptual extensions may lead to more efficient knowledge gains than direct replications. Instead of wasting time arguing about whether direct replications are important or whether conceptual replications are important, do the freaking math. Tell us at which probability that H0 is true you think it is efficient enough to weed out false positives from the literature through direct replications. Show us, by pre-registering all your main analyses, that you are building on strong theories that allow you to make correct predictions with a 92% success rate, and that you therefore do not feel direct replications are the more efficient way to gain knowledge in your area.
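To make this concrete, here is a minimal sketch of the kind of expected-utility comparison decision theory asks for. Every utility and cost number below is a made-up assumption, chosen only for illustration; the point is that once the numbers are written down, the choice between a direct replication and a conceptual extension follows from the probability that the original effect is real:

```python
# A toy decision-theoretic comparison of direct replication versus
# conceptual extension. All utilities and costs are hypothetical.

def eu_direct(p_true, benefit_confirm=1.0, benefit_debunk=3.0, cost=1.0):
    """Expected utility of a direct replication: if the effect is false,
    the replication weeds out a false positive (large benefit); if it is
    true, it adds confirmatory evidence (modest benefit)."""
    return p_true * benefit_confirm + (1 - p_true) * benefit_debunk - cost

def eu_conceptual(p_true, benefit_extend=4.0, benefit_dead_end=0.5, cost=1.0):
    """Expected utility of a conceptual extension: extensions pay off
    mostly when the underlying effect is real."""
    return p_true * benefit_extend + (1 - p_true) * benefit_dead_end - cost

for p in (0.2, 0.5, 0.8, 0.92):
    direct, conceptual = eu_direct(p), eu_conceptual(p)
    better = "direct replication" if direct > conceptual else "conceptual extension"
    print(f"P(effect is real) = {p:.2f}: direct = {direct:.2f}, "
          f"conceptual = {conceptual:.2f} -> {better}")
```

With these (arbitrary) numbers, direct replications win when the prior probability that the effect is real is low, and conceptual extensions win when it is high. You may disagree with the numbers, but then the disagreement is at least out in the open, in quantifiable terms.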

I am happy to see that our ideas about the importance of using decision theory to determine when replications are important enough to perform were independently replicated in this commentary on ZELD by Hardwicke, Tessler, Peloquin, and Frank. We have been collaboratively working on a manuscript to specify the Replication Value of replication studies for several years, and with the recent funding I received, I’m happy that we can finally dedicate the time to complete this work. I look forward to scientists explicitly thinking about the utility of the research they perform. This is an important question, and I can’t wait for our field to start discussing how we can quantify the utility of the research we perform. This will not be easy. But you are already making these choices implicitly every time you decide how to spend your resources, and this question is too important to give up on without even trying. In our pre-print, we illustrate how all concerns raised against replication studies basically boil down to a discussion about their costs and benefits, and how formalizing these costs and benefits would improve the way researchers discuss this topic.

4 comments:

  1. Perhaps it would be easier for people to get into justifying replications in this way if they were also in the habit of similarly justifying why they run their initial studies. I'm not convinced that this is always the case. "Because we have a grad student who thinks this is interesting and a participant pool who have quotas to meet" does not necessarily meet this criterion, I would suggest.

  2. "Tell us at which probability that H0 is true you think it is efficient enough to weed out false positives from the literature through direct replications. Show us, by pre-registering all your main analyses, that you are building on strong theories that allow you to make correct predictions with a 92% success rate, and that you therefore do not feel direct replications are the more efficient way to gain knowledge in your area."

    Hmm, not sure if I agree with (my interpretation of) this. I reason:

    1) it seems to me that it is impossible to determine/gauge the percentage of hypotheses that (will) turn out to be "correct"/"proven".

    2) more importantly, it seems to me that a) it doesn't matter what this percentage is (within reasonable boundaries), and b) it is not even desirable to try and determine/gauge this percentage, because I reason both a) and b) are irrelevant for building knowledge.

    What matters in my reasoning is amassing things like optimally gathered (and thus maximally informational) data, and arguments/reasoning, which can both be used for things like theory-building and -testing.

    I am all for "(...) scientists explicitly thinking about the utility of the research they perform", but I reason it might make more sense to think about this for all research, not just replications. In fact, I reason that it is more important for "original" research, because I predict that (nearly) all else will follow automatically once things are done more optimally from the start.

    The bottom line to me is that many of the things that might be wrong in psychological science seem to be connected, and based on a few things that need to be improved. I reason the rest of the problems will solve themselves automatically.

    Here is an idea which tries to solve some of the basic things that might be connected, and which I reason will also help solve other issues. I named it after what I think is a summary of the few basic problems, which I reason can all easily be solved: “Science is dependent on scientists (old flawed model?) v. scientists are dependent on Science (new improved model?)”:

    http://andrewgelman.com/2017/12/17/stranger-than-fiction/#comment-628652

    I hope you will (also) focus on how to optimally perform research in general, and not just on replications. I reason that when the former is done, the rest will follow automatically. Good luck / all the best with your work on this important topic!!

    Replies
    1. (The gist of) this reasoning in slightly different words, using a different blog post on this topic: https://pedermisager.netlify.com/post/what-to-replicate/

      In the blog post, Isager gives various reasons why researchers could decide to replicate certain findings. I was wondering if you have thought about the possibility of *not* replicating, and/or giving attention to, any past work.

      If we take into account your assumptions regarding resource constraints and the willingness to replicate, it might be way more fruitful (and perhaps more ethical and responsible) for researchers to not replicate any past work but concentrate on replicating current/future work.

      I reason all the different reasons researchers give to replicate past work might be considered equivalent from the perspective of a cumulative science. I reason this is because all the different reasons Isager provides are, could be, or will be intertwined and influenced by each other. I reason, from the perspective of viewing psychological science as a cumulative science, that it therefore possibly doesn't matter 1) what the reason is for replicating among your examples, 2) it could even be a reason *not* to replicate, and 3) the "starting point" in a research program (e.g. a direct replication of past work) is perhaps way less important than the entire process of that research program.

      For instance, assuming the narrative of the past few years is (partly) correct that "sexy" (but probably based on low-quality studies) findings have been rewarded, it could be reasoned that these "sexy" findings will have had theoretical impact, gathered personal interest, influenced policy, and amassed many citations. If this makes any sense, all the reasons researchers give for replicating past work in your blog post may in fact be the exact reasons why they *shouldn't* want to replicate them, given resource constraints and wanting to replicate things. All this replication of past work might be giving attention to sub-optimal work, and researchers, for a second time?! Also possibly see "Replication initiatives will not salvage the trustworthiness of psychology" by J. C. Coyne (https://bmcpsychology.biomedcentral.com/articles/10.1186/s40359-016-0134-3)

      Here is a link to a research (and publication) format that incorporates direct replications of "new" work, and that involves a more continuous and cumulative manner of replicating and doing research:

      http://andrewgelman.com/2017/12/17/stranger-than-fiction/#comment-628652

  3. "Goldin-Meadow stated that "it’s just too costly or unwieldy to generate hypotheses on one sample and test them on another when, for example, we’re conducting a large field study or testing hard-to-find participants" (2016). A similar comment is made by Tackett and McShane (2018) in their comment on ZELD: “Specifically, large-scale replications are typically only possible when data collection is fast and not particularly costly, and thus they are, practically speaking, constrained to certain domains of psychology (e.g., cognitive and social).”"

    I always find this reasoning fascinating. To me, it doesn't make much sense, because these "hard to find" or "costly" participants are apparently only "hard to find" or "costly" for one study.

    After all, chances are high that the next study will use the same "hard to find" or "costly" participants, but now for a different study for which participants and money magically appear all of a sudden.
