This is a post-publication peer review of "Joy and rigor in behavioral science". A response by the corresponding author, Leslie John, is at the bottom of this post - make sure to read this as well.
In a recent paper, “Joy and rigor in behavioral science” (https://doi.org/10.1016/j.obhdp.2021.03.002), Hanne Collins, Ashley Whillans, and Leslie John aim to examine the behavioral and subjective consequences of performing confirmatory research (e.g., a preregistered study). In their abstract they conclude from Study 1 that “engaging in a pre-registration task impeded the discovery of an interesting but non-hypothesized result” and from Study 2 that “relative to confirmatory research, researchers found exploratory research more enjoyable, motivating, and interesting; and less anxiety-inducing, frustrating, boring, and scientific.” An enjoyable talk about this paper is available at: https://www.youtube.com/watch?v=y31G63iw2xw.
I like meta-scientific work that examines the consequences of changes in scientific practices. It is likely that new initiatives (e.g., preregistration) will have unintended negative consequences, and describing these consequences will make it possible to prevent them through, for example, education. I also think it is important to examine what makes scientists more or less happy in their job (although in this respect, my prior is very low that topics such as preregistration explain a lot of variance compared to job uncertainty, stress, and a lack of work-life balance).
However, I am less confident in the insights this study provides than the authors suggest in their abstract and conclusion. First, and perhaps somewhat ironically, the authors base their conclusions from Study 1 on exploratory analyses that I am willing to bet are a Type 1 error (or maybe a confound), and are not strong enough to be taken seriously.
In Study 1 researchers are asked to go through a hypothetical research process in a survey. Researchers collect data on whether people do yoga on a weekly basis, how happy participants are today, and the gender of participants. Across 3 conditions, the study was preregistered (see the Figure below), preregistered with a message that they could still explore, and non-preregistered. The researchers counted how many of 7 possible analysis options were selected (including an ‘other’ option). The hypothesis is that if researchers explore more in non-preregistered analyses, they would select more of these 7 analysis options to perform in the hypothetical research project.
The authors write their interest is in whether “participants in the confirmation condition viewed fewer analyses overall and were less likely to view and report the results of the gender interaction”. This first analysis seems to be a direct test of a logical prediction. The second prediction is surprising. Why would researchers care about the results of a gender interaction? It turns out that this is the analysis where the authors have hidden a significant interaction that can be discovered through exploring. Of course, the participants do not know this.
The results showed the following:
A negative binomial logistic regression (Hilbe, 2011) revealed no difference between conditions in the number of analyses participants viewed (Mexploration = 3.48, SDexploration = 2.08; Mconfirmation = 3.79, SDconfirmation = 1.99; Mhybrid = 3.67, SDhybrid = 2.19; all ps ≥ 0.45). Of particular interest, we assessed between condition differences in the propensity to view the results of an exploratory interaction using binary logistic regressions. In the confirmation condition, 53% of participants viewed the results of the interaction compared with 69% in the exploration condition, b = 0.70, SE = 0.24, p = .01.
So the main result here is clear: There is no effect of confirmatory research on the tendency to explore. This is the main conclusion from this analysis. Then, the authors do something very weird. They analyze the item that, unbeknownst to participants, would have revealed a significant interaction. This is one of 7 options participants could click on. The difference they report (p = 0.01) is not significant if the authors correct for multiple comparisons [NOTE: The authors made the preregistration public after I wrote a draft of this blog https://aspredicted.org/dg9m9.pdf and this reveals they did plan a priori to analyze this item separately – it nicely shows how preregistration allows readers to evaluate the severity of a test (Lakens, 2019), and this test was statistically more severe than I initially thought before I had access to the preregistration – I left this comment in the final version of the blog for transparency, and because I think it is a nice illustration of a benefit of preregistration]. But more importantly, there is no logic behind only testing this item. It is, from the perspective of participants, not special at all. They don’t know it will yield a significant result. Furthermore, why would we only care about exploratory analyses that yield a significant result? There are many reasons to explore, such as getting a better understanding of the data.
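To make the multiple-comparisons remark concrete, here is a minimal sketch of a Bonferroni correction. It assumes, purely for illustration, an overall alpha level of 0.05 and that each of the 7 analysis options counts as one test in the family; the choice of Bonferroni and the function name are assumptions, not something the paper reports. The p-value is used as reported (p = 0.01).

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Return the per-test alpha level under a Bonferroni correction."""
    return alpha / n_tests


# Assumptions for illustration: overall alpha of 0.05, and each of the
# 7 analysis options treated as one test in the family.
ALPHA = 0.05
N_TESTS = 7
REPORTED_P = 0.01  # p-value reported for the gender interaction item

per_test_alpha = bonferroni_alpha(ALPHA, N_TESTS)
significant_after_correction = REPORTED_P < per_test_alpha

print(f"Per-test alpha: {per_test_alpha:.4f}")
print(f"Significant after correction: {significant_after_correction}")
```

Under these assumptions the per-test threshold is about 0.007, so a reported p of 0.01 would not pass it. With a preregistered single test of this item, no such correction would be needed.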
To me, this study nicely shows a problem with exploration. You might get a significant result, but you don’t know what it means, and you don’t know if you just fooled yourself. This might be noise. It might be something about this specific item (e.g., people realize that due to the CRUD factor, exploring gender interactions without a clear theory is uninteresting, as there are many uninteresting reasons you observe a significant effect). We don’t know what drives the effect on this single item.
The authors conclude “Study 1 provides an “existence proof” that a focus on confirmation can impede exploration”. First of all, I would like it if we banned the term ‘existence proof’ following a statistical test. We did not find a black swan feather, and we didn’t dig up a bone. We observed a p-value in a test that lacked severity, and we might very well be talking about noise. If you want to make a strong claim, we know what to do: Follow up on this study, and show the effect again in a confirmatory test. Results of exploratory analysis of slightly illogical predictions are not ‘existence proofs’. They are a pattern that is worth following up on, but that we cannot make any strong claims about as it stands.
In Study 2 we get some insights into why Study 2 was not a confirmatory study replicating Study 1: Performing confirmatory studies is quite enjoyable, interesting, and motivating – but it is slightly less so than exploratory work (see the Figure below). Furthermore, confirmatory tests are more anxiety-inducing. I remember this feeling very well from when I was a PhD student. We didn’t want to do a direct replication, because what if that exploratory finding in your previous study didn’t replicate? Then you could no longer make a strong claim based on the study you had. Furthermore, doing the same thing again, but better, is simply less enjoyable, interesting, and motivating. The problem in Study 2 is not in the questions that were asked, but in the questions that were not asked.
For example, the authors did not ask ‘how enjoyable is exploratory research, when after you have written up the exploratory finding, someone writes a blogpost about how that finding does not support the strong claims you have made?’ Yet the answer to that question might get a lot more weight in the overall evaluation of the utility of performing confirmatory and exploratory research. Another relevant question is ‘How bad would you feel if someone tried to replicate your exploratory finding, but they failed, and they published an article that demonstrated your ‘existence proof’ was just a fluke’? Another relevant question is ‘How enjoyable is it to see a preregistered hypothesis support your prediction’ or ‘How enjoyable are the consequences of providing strong support for your claims for where the paper is published, or how often it is cited, and how seriously it is taken by academic peers’? The costs and benefits of confirmatory studies are multifaceted. We should look not just at the utility of performing the actions, but at the utility of the consequences. I don’t enjoy doing the dishes, but I enjoy taking that time to call friends and being able to eat from a clean plate. A complete evaluation of the joy of confirmatory research needs to ask questions about all facets that go into the utility function.
To conclude, I like articles that examine consequences of changes in scientific practice, but in this case I felt the conclusions were too far removed from the data. In the conclusion, the authors write “Like exploration, confirmation is integral to the research process, yet, more so than exploration, it seems to spur negative sentiment.” Yet, we could just as easily have concluded from the data that confirmatory and exploratory research are both enjoyable, given the means of 5.39 and 5.87 on a 7-point scale, respectively. If anything, I was surprised by how small the difference is (a d = 0.33 for how enjoyable both are). Although the authors do not interpret the size of the effects in their paper, that was quite a striking conclusion for me – I would have predicted that the difference in how enjoyable these two types of research are to perform would have been larger. The authors also conclude about Study 1 that “researchers who were randomly assigned to preregister a prediction were less likely to discover an interesting, non-hypothesized result.” I am not sure this is not just a Type 1 error, as the main analysis yielded no significant result, and my prior is very low that a manipulation that makes people less likely to explore would only make them less likely to explore the item that, unbeknownst to the participants, would reveal a statistically significant interaction, as I don’t know which mechanism could cause such a specific effect. Instead of exploring this in hypothetical research projects in a survey, I would love to see an analysis of the number of exploratory analyses in published preregistered studies. I would predict that in real research projects, researchers report all the possibly interesting significant results they can find in exploratory analyses.
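As a rough back-of-the-envelope illustration of the enjoyment difference mentioned above, the sketch below computes the raw difference between the two reported means and the standard deviation implied by d = 0.33. It assumes d was computed as the mean difference divided by a single standardizer; how the paper actually computed d is not stated here, so the implied value is only indicative.

```python
# Values as reported in the text above.
mean_confirmatory = 5.39
mean_exploratory = 5.87
reported_d = 0.33

# Raw difference in scale points on the 7-point scale.
raw_difference = mean_exploratory - mean_confirmatory

# Assumption: d = raw difference / standardizer, so the standardizer
# (an SD of some kind) can be recovered by division.
implied_sd = raw_difference / reported_d

print(f"Raw difference: {raw_difference:.2f} scale points")
print(f"Implied standardizer (SD): {implied_sd:.2f}")
```

Under that assumption, the difference is about half a scale point, against an implied standard deviation of roughly 1.45 points.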
Leslie John’s response:
- Lakens characterizes Study 1 as exploratory, but it was explicitly a confirmatory study. As specified in our pre-registration, our primary hypothesis was “H1: We will test whether participants in a confirmatory mindset (vs. control) will be less likely to seek results outside of those that would confirm their prediction (i.e., to seek the results of an interaction, as opposed to simply the predicted main effect).” Lakens is accurate in noting that participants did not know a priori that the interaction was significant, but this is precisely our point: when people feel that exploration is off-limits, informative effects can be missed. Importantly, we also stress that a researcher shouldn’t simply send a study off for publication once s/he has discovered something new (like this interaction effect) through exploration; rather “exploration followed by rigorous confirmation is integral to scientific discovery” (p. 188). As a result, Lakens’ questions such as “How bad would you feel if someone tried to replicate your exploratory finding, but they failed, and they published an article that demonstrated your ‘existence proof’ was just a fluke” is not an event that researchers would need to worry about if they follow the guidance we offer in our paper (which underscores the importance of exploration followed by confirmation).
- Lakens raises additional research questions stemming from our findings about the subjective experience of conducting exploratory versus confirmatory work. We welcome additional research to better understand the subjective experience of conducting research in the reform era. We suspect that one reason that research in the reform era can induce anxiety is the fear of post-publication critiques, and the valid concern that such critiques will misrepresent both one’s findings as well as one’s motives for conducting the research. We are therefore particularly solicitous of research that speaks to making the research process, including post-publication critique, not only rigorous but joyful.