Researchers are often reminded that replications are a cornerstone of empirical science (e.g., Koole & Lakens, 2012). However, not every replication is equally valuable. Although most researchers will agree that a journal editor who rejects a manuscript reporting 20 high-powered direct replications of the Stroop effect (Stroop, 1935) is making the right decision, they also know that some replications are worth performing and publishing. Cumulative scientific knowledge requires a balance between original research and close replications of important findings.
The question of when a close replication of an empirical finding is of sufficient value to the scientific community to justify being performed and published is an important one for any science that operates within financial and time constraints. Some years ago, I started a project on the Replication Value. The goal was to create a quantitative and objective index of the value and importance of a close replication. The Replication Value can guide decisions about what to replicate directly, and can serve as a tool both for researchers to assess whether time and resources should be spent on replicating a finding, and for journal editors to help determine whether close replications should be considered for publication.
Developing a formula that can quantify the value of a replication is an interesting challenge. I realized I needed more knowledge of statistics before I could contribute, and even though we were working with a fairly large team, I think it is even better if more people contribute suggestions.
Now, Courtney Soderberg and Charlie Ebersole have taken over the coordination of this project, and from now on, anyone who feels like contributing to this important question can submit candidate formulas. Read more about how to contribute via the link below.
Want to demonstrate that the replication value can only be computed using Bayesian statistics? Convinced we need to rely on estimation instead? Show us the best way to quantify the value of replications, and earn authorship on what will no doubt be a nice paper in the end.
My approach
I’m not going to give away my approach completely – I don’t
want to limit the creativity of others – but I want to give some pointers to
get people started.
I think at least two components determine the Replication Value of empirical findings: the impact of the effect and the precision of the effect size estimate. Quantifying the impact of studies is notoriously difficult, but I think citation counts are an easy-to-use proxy. Based on the idea that more data yield a better estimate of the population effect size, sample size is a dominant factor in precision (Borenstein, Hedges, Higgins, & Rothstein, 2009). The larger the sample, the lower the variance of the effect size estimate, which leads to a narrower confidence interval around the effect size estimate. To quantify the precision of the effect size estimate, the confidence interval for a correlation r is calculated by first transforming r to Fisher's z:
z = 0.5 × ln((1 + r)/(1 - r))
A very good approximation of the variance of z is:
V_z = 1/(n - 3)
The confidence interval can then be calculated as usual, on the z scale:
95% CI = z ± 1.96 × √(V_z)
The values acquired through this procedure can be transformed back to r using:
r = (e^(2 × z) - 1)/(e^(2 × z) + 1)
where z is the transformed upper or lower boundary of the 95% CI.
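To make this procedure concrete, here is a minimal sketch in Python (the function name fisher_ci is mine, purely for illustration):

```python
import math

def fisher_ci(r, n):
    """95% confidence interval for a correlation r from a sample of
    size n, computed on Fisher's z scale and back-transformed to r."""
    z = 0.5 * math.log((1 + r) / (1 - r))      # Fisher's r-to-z transform
    se = math.sqrt(1 / (n - 3))                # sqrt of V_z = 1/(n - 3)
    lo_z, hi_z = z - 1.96 * se, z + 1.96 * se  # CI on the z scale
    back = lambda x: (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)
    return back(lo_z), back(hi_z)              # back-transform to r

print(fisher_ci(0.30, 100))  # r = .30, n = 100 -> roughly (0.11, 0.47)
```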
By expressing the width of the confidence interval of the effect size estimate as a percentage of the total possible width of the confidence interval, we get an index of the precision of the effect size estimate. I call this the 'spielraum', or the playing field, based on the conceptual similarity to the precision of a theoretical prediction in Meehl's (1990) work on appraising theories.
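As a sketch of how such an index could be computed, assuming 'total possible width' means the full range of r (which runs from -1 to 1, a width of 2; that reading is my assumption, not a settled definition):

```python
import math

def precision_index(r, n):
    """Width of the 95% CI for r as a proportion of the total possible
    width (assumed to be 2, the full range of r). Smaller = more precise."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    half = 1.96 * math.sqrt(1 / (n - 3))
    back = lambda x: (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)
    lo, hi = back(z - half), back(z + half)
    return (hi - lo) / 2.0  # proportion of the possible range covered

print(precision_index(0.30, 100))  # ~0.18: the CI spans ~18% of the range
```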
Now the tricky thing is how these two factors interact to determine the replication value. While I go back to solving that question, perhaps you want to propose a completely different approach. I mean, really, this is a question that requires Bayesian statistics, right? Are citation counts the worst possible way to quantify impact?
See how to contribute here: https://docs.google.com/document/d/1ufO7gwwI2rI7PnESn4wDA-pLcns46NyE7Pp-3zG3PN8/edit
I really look forward to your suggestions.
In Bayesian statistics you would calculate the Kullback-Leibler (KL) divergence between the prior and the posterior. To cite Wikipedia, the KL divergence is "a measure of the information gain in moving from a prior distribution to a posterior distribution". You can compare KL across replication studies and select the study with the highest information gain. The problem is that you can arbitrarily inflate the index by selecting an uninformative prior for your favorite study. This problem also affects your variance measure (why take the "total possible width" of the CI instead of a density that corresponds to prior knowledge? Which density expresses the prior knowledge best?). The whole idea can work, but it requires a serious effort in expressing and justifying the prior knowledge on the part of the researcher. This only underscores that without a concept of prior knowledge, concepts such as surprise, information value, or the impact of scientific work are meaningless.
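A minimal sketch of this idea, assuming the prior and posterior for an effect size are both normal (for which the KL divergence has a closed form; all names here are illustrative):

```python
import math

def kl_normal(mu_post, sd_post, mu_prior, sd_prior):
    """KL(posterior || prior) in nats for two normal distributions:
    ln(s0/s1) + (s1^2 + (m1 - m0)^2) / (2 * s0^2) - 1/2."""
    return (math.log(sd_prior / sd_post)
            + (sd_post**2 + (mu_post - mu_prior)**2) / (2 * sd_prior**2)
            - 0.5)

# A barely-updated posterior yields little information gain...
print(kl_normal(0.05, 0.90, 0.0, 1.0))    # ~0.01 nats
# ...a sharp update yields much more...
print(kl_normal(0.40, 0.10, 0.0, 1.0))    # ~1.89 nats
# ...but the same sharp posterior against an absurdly vague prior
# inflates the index, illustrating the caveat above.
print(kl_normal(0.40, 0.10, 0.0, 100.0))  # ~6.41 nats
```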
I agree it's a challenge. And a challenge that would really be worth applying your expertise to. You'll gain co-authorship, but more importantly, you'll help develop a tool that could really improve cumulative science.
Alright, if I find the time I will make a submission :)
Cool! I look forward to it and I'm sure you will bring a very important perspective to the table!
Hi Daniel,
I think that a precise estimate of the available evidence related to a specific phenomenon or line of research is not difficult to achieve, and your proposal and/or a Bayesian analogue is at hand.
On the contrary, I cannot figure out how to derive a mathematical formula for the "worth being replicated" side of the problem without considering all the sociological, applied, economic, ideological, etc., components intrinsic to scientific research.
I will try to list some of the criteria, in no hierarchical order:
- phenomena described in the main textbooks for undergraduate and graduate students, which are usually taken for granted;
- phenomena which could change some mainstream paradigms, e.g., cognitive neuroscience will replace cognitive psychology, the human mind shows quantum-like phenomena, etc.;
- applications which could reduce terrorist attacks or energy waste, or improve physical and mental health while reducing the burden on state economic resources, etc.
What do you think?
Patrizio