The 20% Statistician: Where are all the competent researchers?

Sunday, February 21, 2016

Where are all the competent researchers?

In response to failed replications, some researchers argue that replication studies are especially convincing when the people who performed the replication are ‘competent’ ‘experts’.

Paul Bloom has recently remarked: “Plainly, a failure to replicate means a lot when it’s done by careful and competent experimenters, and when it’s clear that the methods are sensitive enough to find an effect if one exists. Many failures to replicate are of this sort, and these are of considerable scientific value. But I’ve read enough descriptions of failed replications to know how badly some of them are done. I’m aware as well that some attempts at replication are done by undergraduates who have never run a study before. Such replication attempts are a great way to train students to do psychological research, but when they fail to get an effect, the response of the scientific community should be: Meh.”

This mirrors the response by John Bargh after replications of the elderly priming studies yielded no significant effects: “The attitude that just anyone can have the expertise to conduct research in our area strikes me as more than a bit arrogant and condescending, as if designing and conducting these studies were mere child's play.” “Believe it or not, folks, a PhD in social psychology actually means something; the four or five years of training actually matters.”

So where is the evidence we should ‘meh’ replications by novices that show no effect? And how do we define a ‘competent’ experimenter? And can we justify the intuition that a non-significant finding by undergraduate students is ‘meh’, when we are more than willing to submit the work by the same undergraduate when the outcome is statistically significant?

One way to define a competent experimenter is simply to look at who managed to observe the effect in the past. However, this won’t do. If we look at the elderly priming literature, a p-curve analysis gives no reason to assume anything more is going on than p-hacking. Thus, merely finding a significant result in the past should not be our definition of competence. It is a good definition of an ‘expert’, where the difference between an expert and novice is the amount of expertise one has in researching a topic. But I see no reason to believe expertise and competence are perfectly correlated.

There are cases where competence matters, as Paul Meehl reminds us in his lecture series (video 2, at 46:30). He discusses a situation where Dayton Miller provided evidence in support of the ether drift, long after Einstein’s relativity theory explained it away. This is perhaps the opposite of a replication showing a null effect, but the competence of Miller, who had the reputation of being a very reliable experimenter, is clearly being taken into account by Meehl. It took until 1955 before the ‘occult result’ observed by Miller was explained by a temperature confound.

Showing that you can reliably reproduce findings is an important sign of competence – if this has been done without relying on publication bias and researchers’ degrees of freedom. This could easily be done in a single well-powered pre-registered replication study, but over the last few years, I have not seen researchers demonstrate their competence by reproducing contested findings in a pre-registered study. I definitely understand that researchers prefer to spend their time in other ways than defending their past research. At the same time, I’ve seen many researchers who spend a lot of time writing papers criticizing replications that yield null results. Personally, I would say that if you are going to invest in defending your study, and data collection doesn’t take too much time, the most convincing demonstration of competence is a pre-registered study showing the effect.

So, the idea that there are competent researchers who can reliably demonstrate the presence of effects that are not observed by others is not supported by empirical data (so far). In the extreme case of clear incompetence, there is no need for an empirical justification, as the importance of competence for observing an effect is trivially true. It might very well be true under less trivial circumstances. These circumstances are probably not experiments that take place entirely in computer cubicles, where people are guided through the experiment by a computer program. I can’t see how the expertise of experimenters has a large influence on psychological effects in these situations. This is also one of the reasons (along with the 50 participants randomly assigned to four between-subject conditions) why I don’t think the ‘experimenter bias’ explanation for the elderly priming studies by Doyen and colleagues is particularly convincing (see Lakens & Evers, 2014).
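
To make the sample-size point concrete, here is a rough sketch (not a calculation taken from any of the studies discussed) of how much statistical power a comparison between two cells has when about 12 participants sit in each cell, which is roughly what 50 participants spread over four conditions gives you. The effect size d = 0.5 is purely an illustrative assumption, the function names are hypothetical, and the normal approximation used here slightly overstates power compared to a t-test at small sample sizes.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def approx_power_two_groups(d: float, n_per_group: int,
                            z_crit: float = 1.959964) -> float:
    """Approximate two-sided power for comparing two independent group means.

    d           -- assumed standardized effect size (Cohen's d); illustrative only
    n_per_group -- number of participants in each of the two cells
    z_crit      -- critical z value (1.959964 corresponds to alpha = .05, two-sided)
    """
    # Expected z statistic under the assumed effect size.
    ncp = d * sqrt(n_per_group / 2.0)
    # Probability of landing in either rejection region.
    return normal_cdf(ncp - z_crit) + normal_cdf(-ncp - z_crit)


if __name__ == "__main__":
    # Compare a small cell size with two larger ones, all at the assumed d = 0.5.
    for n in (12, 50, 100):
        print(n, round(approx_power_two_groups(0.5, n), 2))
```

Under these assumptions, a cell size of around 12 leaves the comparison far below the conventional 80% power target, while 100 per cell comfortably exceeds it.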

In a recent pre-registered replication project re-examining the ego-depletion effect, both experts and novices performed replication studies. Although this paper is still in press, preliminary reports at conferences and on social media tell us the overall effect is not reliably different from 0. Is expertise a moderator? I have it on good authority that the answer is: No.

This last set of studies shows the importance of getting experts involved in replication efforts, since it allows us to empirically examine the idea that competence plays a big role in replication success. There are, apparently, people who will go ‘meh’ whenever non-experts perform replications. As is clear from my post, I am not convinced the correlation between expertise and competence is 1, but in light of the importance of social aspects of science, I think experts in specific research areas should get more involved in registered replication efforts of contested findings. In my book, and regardless of the outcome of such studies, performing pre-registered studies examining the robustness of your findings is a clear sign of competence.

18 comments:

Tom Stafford, February 21, 2016 at 11:28 AM
I think experimental expertise really does exist, but I take the point that we need to be skeptical about assuming it is playing a role in (non)-replications.

One approach is to ensure that all experiments build in "self-testifying" indicators of experimental competence. Positive as well as negative controls, manipulation checks and the like.
Anonymous, February 21, 2016 at 1:06 PM
Paul Bloom has recently remarked: “Plainly, a failure to replicate means a lot when it’s done by careful and competent experimenters, and when it’s clear that the methods are sensitive enough to find an effect if one exists. Many failures to replicate are of this sort, and these are of considerable scientific value. But I’ve read enough descriptions of failed replications to know how badly some of them are done. I’m aware as well that some attempts at replication are done by undergraduates who have never run a study before. Such replication attempts are a great way to train students to do psychological research, but when they fail to get an effect, the response of the scientific community should be: Meh.”

When I was a student at university, I almost never saw a professor (competent researcher?) in the lab. Isn't a lot of psychological research actually carried out by research assistants, who could be undergraduates?

If so, then I think many of the *original* experiments Bloom talks about might very well have been executed by undergraduates themselves, which makes his whole reasoning strange.
spaghetticode, February 21, 2016 at 1:45 PM
I run replication experiments with undergrad students. We use large sample sizes, and each study gets an additional positive control. As controls we've used the Retrospective Gambler's Task and the Gain/Loss Framing Effect, both of which have very well-defined effect sizes thanks to the Many Labs projects. When the sample sizes are good and the positive controls turn out as expected, we can feel assured that even undergrads can do quality and informative work (here's an example: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0140806).

It's also funny to hear researchers worry about undergraduate researchers in the context of failed replications, when no one seems to worry about undergrad researchers in the context of failed results. I run an undergrad-powered neurobiology lab, and have found that most undergrads with good training can perform electrophysiology, qPCR, and lots of other extremely time- and skill-dependent techniques. Our results with undergrads have been replicated within our own lab and across others. This is the norm in lots of the life sciences. Herding participants into cubicles to complete a psychology study (often fully on a computer) does not exactly compare in terms of the need for expertise...

Brad Wyble, February 21, 2016 at 3:43 PM
This comment of John Bargh's struck me as incorrect:
“The attitude that just anyone can have the expertise to conduct research in our area strikes me as more than a bit arrogant and condescending, as if designing and conducting these studies were mere child's play.”

The point is not that these students are "designing" the studies. The original authors did that, and that is the part that requires a PhD and years of experience. However, once you've done the hard part of designing the study, yes, it should be possible for a competent student to conduct the research and replicate the finding, except in the cases where the experiments require particular technical expertise. This is not to say that all replication attempts are done well, of course. But the claim that one needs a PhD and many years of experience to even attempt a replication doesn't ring true.

patrick, February 22, 2016 at 7:10 PM
Suppose that a competent researcher can replicate a robust psychology effect nearly every time in the laboratory. Suppose also (as is usually the case) that this effect is virtually undetectable in ecologically valid settings.

Example: Bystander effect.

Luckily, a brand-new graduate student is tasked with replicating this phenomenon. Just like in the real world, where we are actually trying to understand cognition, they read the directions wrong and make critical errors in measurement and data analysis. Thus, the effect fails to replicate. This communicates the correct thing: that the effect in question is unreliable. Unreliability means being subject to unpredictable forces like incompetent junior graduate students.

Very few graduate students, even in psychology, are unable to replicate the findings of universal gravitation even (especially!) after a few beers. The fact that these students struggle to replicate well-known effects is a valid critique of those effects. It has the valuable effect of forcing the original researcher to retrench their discussion to include caveats like "only when we do it at our lab." Those caveats are useful things to know!
Unknown, February 23, 2016 at 11:17 AM
I would venture that the issue with confounding experience with competence goes much deeper. It has long been assumed that possession of a PhD implies competence. This is a naive assumption. Our PhDs are as good as our training, and well, the fact is that there is no real agreement about what type of training experimental researchers should receive. The actual training researchers do receive is at the discretion of particular departments/supervisors. I'm all for academic freedom, but I have a hard time accepting that as motivation to not produce some type of standardization of the training required to produce competence.