For roughly a decade, there has been sufficient momentum in science to not just complain about things scientists do wrong, but to actually do something about it. When social psychologists declared a replication crisis in the 1960s and 1970s, nothing much changed (Lakens, 2023). They also complained about bad methodology, flexibility in the data analysis, and a lack of generalizability and applicability, but no concrete actions to improve things emerged from this crisis.
After the
2010 crisis in psychology, scientists did make changes to how they work. Some
of these changes were principled, others less so. For example, badges were
introduced for certain open science practices, and researchers implementing
these open science practices would get a badge presented alongside their
article. This was not a principled change, but a nudge to change behavior.
There were also more principled changes. For example, if researchers say they
make error-controlled claims at a 5% alpha level, they should make error
controlled claims at a 5% alpha level, and they should not engage in research
practices that untransparently inflate the Type 1 error rate. The introduction
of a practice such as preregistration had the goal to prevent untransparently
inflating Type 1 error rates, by making any possible inflation transparent.
This is a principled change because it increases the coherence of research
practices.
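To make the idea of untransparently inflating the Type 1 error rate concrete, here is a minimal simulation sketch. It assumes one specific form of flexibility that is only an illustration: a researcher measures several independent outcomes, the null hypothesis is true for all of them, and only the smallest p value is reported. The function names, the number of outcomes, and the sample size are hypothetical choices, and a z-test with known variance is used only to keep the code dependency-free.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_statistic(rng, n):
    """z-test of 'mean = 0' on n draws from N(0, 1); the null is true by construction."""
    sample_mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
    return sample_mean * math.sqrt(n)


def type1_error_rate(n_outcomes, n_sims=5000, n=20, alpha=0.05, seed=1):
    """Share of simulated studies with at least one p < alpha among n_outcomes tests."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        p_values = [two_sided_p(z_statistic(rng, n)) for _ in range(n_outcomes)]
        # Flexible analysis: report whichever outcome 'worked'.
        if min(p_values) < alpha:
            false_positives += 1
    return false_positives / n_sims


def analytic_rate(n_outcomes, alpha=0.05):
    """Familywise error rate for independent tests: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** n_outcomes


if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k, round(type1_error_rate(k), 3), round(analytic_rate(k), 3))
```

With a single pre-specified outcome the simulated rate should stay close to the nominal 5%, and with more outcomes to choose from it should rise towards the analytic value. A preregistration that names the outcome in advance does not prevent anyone from running the extra tests, but it makes the inflation visible to peers.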
As these
changes in practices became more adopted, a large group of researchers was
confronted with requirements such as having to justify their sample size,
indicate whether they deserved an open science badge, or make explicit that a
claim was exploratory (i.e., not error controlled). As more people were
confronted with these changes, the absolute number of people critical of
these changes increased. A very reasonable question to ask as a scientist is
‘Why?’, and so people asked: ‘Why should I do this new thing?’ 
There are
two ways to respond to the question why scientific practices need to change.
The first justification is ‘because science will improve’. This is an empirical
justification. The world is currently in a certain observable state, and if we
change things about our world, it will be in a different, but better,
observable state. The second justification is ‘because it logically follows’.
This is, not surprisingly, a logical argument. There is a certain way of
working that is internally inconsistent, and there is a way of working that is
consistent. 
An
empirical justification requires evidence. A logical justification requires
agreement with a principle. If we want to justify preregistration empirically,
we need to provide evidence that it improved science. If you want to disagree
with the claim that preregistration is a good idea, you need to disagree with
the evidence. If we want to justify preregistration logically, we need
people to agree with the principle that researchers should be able to
transparently evaluate how coherently their peers are acting (e.g., they are
not saying they are making an error controlled claim, when in actuality they
did not control their error rate). 
Why evidence for better science is practically impossible. 
Although it
is always difficult to provide strong evidence for a claim, some things are
more difficult to study than others. Providing evidence that a change in
practice improves science is so difficult, it might be practically impossible.
Paul Meehl, one of the first meta-scientists, developed the idea of cliometric
meta-theory, or the empirical investigation of which theories are doing better
than others. He proposes to follow different theories for something like 50
years, and see which one leads to greater scientific progress. If we want to
provide evidence that a change in practice improves science, we need something
similar. So, the time scale we are talking about makes the empirical study of
what makes science ‘better’ difficult. 
But we also
need to collect evidence for a causal claim, which requires excluding
confounders. A good start would be to randomly assign half of the scientists to
preregister all their research for the next fifty years, and order the other
half not to. This is the second difficulty: It is practically impossible to go
beyond observational data, and observational data will always have confounds.
But even if we were able to manipulate something, the assumption that the
control condition is
not affected by the manipulation is too likely to be violated. The people who
preregister will – if they preregister well – have no flexibility in the data
analysis, and their alpha levels are controlled. But the people in the control
condition know about preregistration as well. After p-hacking their way to a p
= 0.03 in Study 1, p = 0.02 in Study 2, and p = 0.06 (marginally significant)
in Study 3, they will look at their studies and wonder if their peers will
take their set of studies seriously. Probably not. So, they develop new
techniques to publish evidence for what they want to be true – for example by
performing large studies with unreliable measures and a tiny sprinkle of confounds,
which consistently yield low p-values.
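Why would large studies with a tiny confound consistently yield low p-values? A rough sketch, under assumptions that are not in the text above: a two-group comparison analyzed with a z-test, and a confound that produces a small standardized mean difference `d` even though the effect of interest is zero. The function name and the value d = 0.05 are hypothetical, and the sketch ignores measurement unreliability entirely.

```python
import math
from statistics import NormalDist


def prob_significant(d, n_per_group, alpha=0.05):
    """Probability that a two-sided, two-group z-test gives p < alpha
    when the true standardized mean difference is d."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic for two equal groups of size n_per_group.
    noncentrality = d * math.sqrt(n_per_group / 2)
    upper_tail = std_normal.cdf(noncentrality - z_crit)
    lower_tail = std_normal.cdf(-noncentrality - z_crit)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # d = 0.05 stands in for a difference created purely by a confound.
    for n in (50, 1000, 10000, 20000):
        print(n, round(prob_significant(0.05, n), 3))
```

With no confound (d = 0) the function returns alpha, as it should. With a small confound, the probability of a significant result grows with the sample size, so a sufficiently large study will almost always 'work' without any flexibility in the data analysis.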
So after
running several studies for 50 years each, we end up with evidence that is not
particularly difficult to poke holes in. We have invested a huge amount of
effort, for what we should know from the outset will yield very little gain.
As we wrote in our recent paper “The benefits of preregistration and
Registered Reports” (Lakens et al., 2024):
It is
difficult to provide empirical support for the hypothesis that preregistration
and Registered Reports will lead to studies of higher quality. To test such a
hypothesis, scientists should be randomly assigned to a control condition where
studies are not preregistered, a condition where researchers are instructed to
preregister all their research, and a condition where researchers have to
publish all their work as a Registered Report. We would then follow the success
of theories examined in each of these three conditions in an approach Meehl
(2004) calls cliometric metatheory by empirically examining which theories
become ensconced, or sufficiently established that most scientists consider the
theory as no longer in doubt. Because such a study is not feasible, causal
claims about the effects of preregistration and Registered Reports on the
quality of research are practically out of reach.
At this
time, I do not believe there will ever be sufficiently conclusive empirical
evidence for causal claims that a change in scientific practice makes science
better. You might argue that my bar for evidence is too high: that conclusive
empirical evidence in science is rarely possible, but that we can provide
evidence from observational studies, perhaps by attempting to control for the
most important confounds and measuring decent proxies of ‘better science’ on a
shorter time scale. I think this work can be valuable, and it might convince
some people, and it might even lead to a sufficient evidence base to warrant policy
change by some organizations. After all, policies need to be set anyway, and
most of the policies in science are based on weak evidence, at best. 
A little bit of logic is worth more than two centuries of cliometric metatheory. 
Psychologists
are empirically inclined creatures, and to their detriment, they often trust
empirical data more than logical arguments. We published the nine studies on precognition
by Daryl Bem because they followed standard empirical methods and yielded
significant p values, even though one of the reviewers pointed out that
the paper should be rejected because it logically violated the laws of physics.
Psychologists too often assign more weight to a p value than to logical
consistency. 
And yet, a
little bit of logic will often yield much greater returns, with much less
effort. A logical justification of preregistration does not require empirical
evidence. It just needs to point out that it is logically coherent to
preregister. Logical propositions have premises and a conclusion: If X, then Y.
In meta-science, logical arguments are of the form ‘if we have the goal to generate
knowledge following a certain philosophy of science, then we need to follow certain
methodological procedures.’ For example, if you think it is a fun idea
to take Feyerabend seriously and believe that science progresses in a system
that cannot be captured by any rules, then anything goes. Now let’s try
a premise that is not as stupid as the one proposed by Feyerabend, and entertain
the idea that some ways of doing science are better than others. For example,
you might believe that scientists generate knowledge by making statistical
claims (e.g., ‘we reject the presence of a correlation larger than r = 0.1’)
that are not too often wrong. If this aligns with your philosophy of science, you
might think the following proposition is valid: ‘If a scientist wants to
generate knowledge by making statistical claims that are not too often wrong, then
they need to control their statistical error rates’. This puts us in Mayo’s
error-statistical philosophy. We can change the previous proposition, which was
written on the level of the individual scientist, if we believe that science is not
an individual process, but a social one. A proposition that is more in line
with a social epistemological perspective would be: “If the scientific
community wants to generate knowledge by making statistical claims that are not
too often wrong, then they need to have procedures in place to evaluate
which claims were made by statistically controlling error rates”. 
This in
itself is not a sufficient argument for preregistration, because there are many
procedures that we could rely on. For example, we can trust scientists. If they
do not say anything about flexibly analyzing their data, we can trust that they
did not flexibly analyze their data. You can also believe that science should
not be based on trust. Instead, you might believe that scientists should be
able to scrutinize claims by peers, and that they should not have to take their
word for it: Nullius in Verba. If so, then science should be transparent. You
do not need to agree with this, of course, just as you did not have to agree
with the premise that the goal of science is to generate claims that are not
too often wrong. If we include this premise, we get the following proposition:
“If the scientific community wants to generate knowledge by making
statistical claims that are not too often wrong, and if scientists
should be able to scrutinize claims by peers, then they need to have
procedures in place for peers to transparently evaluate which claims were made
by statistically controlling error rates”. 
Now we have
a logical argument for preregistration as one change in the way scientists
work, because it makes their practice more coherent. Preregistration is not the only
possible change to make science coherent. For example, we could also test all
hypotheses in the presence of the entire scientific community, for example by
live-streaming and recording all research that is being done. This would also
be a coherent improvement to how scientists work, but it would also be more
cumbersome. The hope is that preregistration, when implemented well, is a more
efficient change to make science more coherent.
Should logic or evidence be the basis of change in science? 
Which of
the two justifications for changes in scientific practice is more desirable? A
benefit of evidence is that it can convince all rational individuals, as long
as it is strong enough. But evidence can be challenged, especially when it is
weak. This is an important feature of science, but when disagreements about the
evidence base cannot be resolved, it quickly leads to ‘even the experts do
not agree about what the data show’. A benefit of logic is also that it should
convince rational individuals, as long as they agree with the premise. But not
everyone will agree with the premise. Again, this is an important feature of
science. It might be a personal preference, but I actually like disagreements
about the premises of what the goals of science are. Where disagreements about
evidence are temporarily acceptable, but in the long run undesirable,
disagreements about the goals of science are good for the diversity in
science. Or at least that is a premise I accept. 
As I see
it, the goal should not be to convince people to implement certain changes to
scientific practice per se, but to get scientists to behave in a coherent
manner, and to implement changes to their practice if this makes their practice
more coherent. Whether practices are coherent or not is unrelated to whether
you believe practices are good, or desirable. Those value judgments are part of
your decision to accept or reject a premise. You might think it is undesirable
that scientists make claims, as this will introduce all sorts of undesirable
consequences, such as confirmation bias. Then, you would choose a different
philosophy of science. That is fine, as long as you then implement research
practices that logically follow from the premises. Empirical research can guide
you towards or away from accepting certain premises. For example,
meta-scientists might describe facts that make you believe scientists are
extremely trustworthy, and transparency is not needed. Meta-scientists might
also point out ways in which research practices are not coherent with certain
premises. For example, if we believe transparency is important, but most
researchers selectively publish results, then we have identified an incoherency
that we might need to educate people about, or we need to develop ways for
researchers to resolve this incoherency (such as developing preprint servers
that allow researchers to share all results with peers). And for some changes
to science, such as the introduction of Open Science Badges, there might not be
any logical justifications (or if they exist, I have not seen them). For those
changes, empirical justifications are the only possibility. 
Conclusion
As changes
to scientific practice become more institutionalized, it is only fair that
researchers ask why these changes are needed. There are two possible
justifications: One based on empirical evidence, and one on logically coherent
procedures that follow from a premise. Psychologists might intuitively believe
that empirical evidence is the better justification for a practice. I
personally doubt it. I think logical arguments will often provide a stronger
foundation, especially when scientific evidence is practically difficult to
collect. 
 