A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Monday, June 23, 2025

Retrieving Planned Sample Sizes from AsPredicted Preregistrations

It is increasingly common for researchers to preregister their studies (Spitzer and Mueller 2023; Imai et al. 2025). As preregistration is a new practice, it is not surprising that it is not always implemented well. One challenge is that researchers do not always indicate where they deviated from their preregistration when reporting results from preregistered studies (Akker et al. 2024). Reviewers could check whether researchers adhere to the preregistration, but this requires some effort. Automation can provide a partial solution by making information more easily available, and perhaps even by performing automated checks of some parts of the preregistration.

Here we demonstrate how Papercheck, our software to perform automated checks on scientific manuscripts, can automatically retrieve the content of a preregistration. The preregistration can then be presented alongside the relevant information in the manuscript. This makes it easier for peer reviewers to compare the information.

We focus on AsPredicted preregistrations because their structured format makes it especially easy to retrieve information (but we also have code to do the same for structured OSF preregistration templates). We can easily search for AsPredicted links in all 250 open-access papers from Psychological Science; the Papercheck package conveniently includes these in XML format in the psychsci object.
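As a minimal sketch (assuming the package is loaded as papercheck, and that aspredicted_links() returns zero rows or elements when a paper contains no AsPredicted link), such a search could look like this:

# sketch: find which of the 250 Psychological Science papers link to an
# AsPredicted preregistration by applying aspredicted_links() to each paper
library(papercheck)

all_links <- lapply(psychsci, aspredicted_links)
has_aspredicted <- vapply(all_links, function(x) NROW(x) > 0, logical(1))
names(psychsci)[has_aspredicted]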

Sample Size

Recent metascientific research on preregistrations (Akker et al. 2024) has shown that the most common deviation from a preregistration in practice is that researchers do not collect the sample size they preregistered. This is not necessarily problematic, as the difference might be only a single data point. Nevertheless, researchers should discuss the deviation and evaluate whether it impacts the severity of the test (Lakens 2024).

Checking the sample size of a study against the preregistration takes some effort, as the preregistration document needs to be opened, the correct entry located, and the corresponding text in the manuscript identified. Recently, a fully automatic comparison tool (RegCheck) has been created by Jamie Cummins from the University of Bern; it relies on large language models, and users upload the manuscript and the preregistration to receive an automated comparison. We take a slightly different approach. We retrieve the preregistration from AsPredicted automatically and present users with the information about the preregistered sample size (which is straightforward given the structured approach of the AsPredicted template). We then recommend that users compare this information against the method section in the manuscript.

Preregistration Sample Size Plan

You can access the sample size plan from the results of aspredicted_retrieve() under the column name AP_sample_size.
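For completeness, here is a sketch of how the prereg object used below is created (the identifier for the lemur paper in psychsci is a placeholder here, not the paper's actual identifier):

# select the paper, find its AsPredicted links, and retrieve the preregistration
paper <- psychsci[["<lemur_paper_id>"]]  # placeholder for the paper's identifier
links <- aspredicted_links(paper)
prereg <- aspredicted_retrieve(links)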

# get the sample size section from AsPredicted
prereg_sample_size <- unique(prereg$AP_sample_size)

# use cat("> ", x = _) with #| results: 'asis' in the code chunk
# to print out results with markdown quotes
prereg_sample_size |> cat("> ", x = _)

The study will compare four lemur species: ruffed lemur, Coquerel’s sifakas, ring-tailed lemur and mongoose lemur at the Duke Lemur Center. We will test a minimum of 10 and a maximum of 15 individuals for each species based on availability and individual’s willingness to participate at the time of testing.

Paper Sample Size

Now we need to check the achieved sample size in the paper.

To facilitate this comparison, we can retrieve all paragraphs from the manuscript that contain words such as ‘sample’ or ‘participants’, in the hope that they contain the relevant text. A more advanced version of this tool could instead search for specific words used in the preregistration (a rough sketch of this idea follows below). Further below, we also show how AI can be used to identify the related text in the manuscript. We first use Papercheck’s built-in search_text() function to find sentences discussing the sample or participants. For the current paper, we see that this simple approach works.

# match "sample" or "# particip..."
regex_sample <- "\\bsample\\b|\\d+\\s+particip\\w+"

# get full paragraphs only from the method section
sample <- search_text(paper, regex_sample, 
                      section = "method", 
                      return = "paragraph")

sample$text |> cat("> ", x = _)

We tested 39 lemurs living at the Duke Lemur Center (for subject information, see Table S1 in the Supplemental Material available online). We assessed four taxonomic groups: ruffed lemurs (Varecia species, n = 10), Coquerel’s sifakas (Propithecus coquereli, n = 10), ringtailed lemurs (Lemur catta, n = 10), and mongoose lemurs (Eulemur mongoz, n = 9). Ruffed lemurs consisted of both red-ruffed and black-and-white-ruffed lemurs, but we collapsed analyses across both groups given their socioecological similarity and classification as subspecies until recently (Mittermeier et al., 2008). Our sample included all the individuals available for testing who completed the battery; two additional subjects (one sifaka and one mongoose lemur) initiated the battery but failed to reach the predetermined criterion for inclusion in several tasks or stopped participating over several days. All tests were voluntary: Lemurs were never deprived of food, had ad libitum access to water, and could stop participating at any time. The lemurs had little or no prior experience in relevant cognitive tasks such as those used here (see Table S1). All behavioral tests were approved by Duke University’s Institutional Animal Care and Use Committee .

The authors planned to test 10 mongoose lemurs, but one didn’t feel like participating. This can happen, and it does not really impact the severity of the test, but the statistical power is slightly lower than desired, and it is a deviation from the original plan - both deserve to be discussed. This Papercheck module can remind researchers that they deviated from a preregistration and should discuss the deviation, or it can help peer reviewers notice that a deviation is not discussed.
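As a purely illustrative sketch of the more advanced keyword-based matching mentioned above (not part of the current module), one could derive search terms from the preregistration text itself and pass them to search_text():

# illustrative sketch: build a regex from the longer words in the preregistration
words <- unlist(strsplit(tolower(prereg_sample_size), "[^a-z]+"))
keywords <- unique(words[nchar(words) > 6])
regex_keywords <- paste(keywords, collapse = "|")

matches <- search_text(paper, regex_keywords,
                       section = "method",
                       return = "paragraph")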

[Image: Mongoose lemurs]

Asking a Large Language Model to Compare the Paper and the Preregistration

Although Papercheck’s philosophy is that users should evaluate the information from automated checks themselves, and that AI should be optional and never the default, it can be efficient to send the preregistered sample size and the text reported in the manuscript to a large language model and have it compare the preregistration with the method section. This is more costly (both financially and ecologically), but it can work better, as the researchers might not use words like ‘sample’ or ‘participants’, and an LLM provides more flexibility to match text across two documents.

Papercheck makes it easy to extract the method section in a paper:


method_section <- search_text(paper, pattern = "*", section = c("method"), return = "section")

We can send the method section to an LLM and ask which paragraph is most closely related to the text in the preregistration. Papercheck has a custom function to send text and a query to Groq. We use Groq because of its privacy policy: it will not retain data or train on data, which is important when sending text from scientific manuscripts that may be unpublished to an LLM. Furthermore, we use an open-source model (llama-3.3-70b-versatile).
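Note that running llm() requires a Groq API key. We assume here that it is read from an environment variable (the exact variable name may differ between Papercheck versions):

# assumption: llm() reads the Groq API key from an environment variable
Sys.setenv(GROQ_API_KEY = "your-api-key")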

query_template <- "The following text is part of a scientific article. It describes a performed study. Part of this text should correspond to what researchers planned to do. Before data collection, the researchers stated they would:

%s

Your task is to retrieve the sentence(s) in the article that correspond to this plan, and evaluate based on the text in the manuscript whether researchers followed their plan with respect to the sample size. Start your answer with a 'The authors deviated from their preregistration' if there is any deviation."

# insert prereg text into template
query <- sprintf(query_template, prereg_sample_size)

# combine all relevant paragraphs
text <- paste(method_section$text, collapse = "\n\n")

# run query
llm_response <- llm(text, query, model = "llama-3.3-70b-versatile")
#> You have 499999 of 500000 requests left (reset in 172.799999ms) and 296612 of 300000 tokens left (reset in 677.6ms).

llm_response$answer |> cat("> ", x = _)

The authors deviated from their preregistration. The preregistered plan stated that they would test a minimum of 10 and a maximum of 15 individuals for each species. However, according to the text, they tested 10 ruffed lemurs, 10 Coquerel’s sifakas, 10 ring-tailed lemurs, and 9 mongoose lemurs. The number of mongoose lemurs (9) is below the minimum of 10 individuals planned for each species, indicating a deviation from the preregistered plan.

As we see, the LLM does a very good job evaluating whether the authors adhered to their preregistration in terms of the sample size. The long-run performance of this automated evaluation needs to be validated in future research - this is just a proof of principle - but it has potential for editors who want to automatically check whether authors followed their preregistration, and for meta-scientists who want to examine preregistration adherence across a large number of papers. For such meta-scientific use cases, however, the code needs to be extensively validated, and error rates should be acceptably low (i.e., comparable to human coders).
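For such large-scale use, the steps above could be wrapped into a single function and applied to every paper. The sketch below is untested at scale and does not handle edge cases (e.g., multiple preregistrations per paper or missing method sections):

# hypothetical wrapper around the workflow shown above
check_prereg_sample <- function(paper) {
  links <- aspredicted_links(paper)
  if (NROW(links) == 0) return(NULL)  # no AsPredicted link in this paper
  prereg <- aspredicted_retrieve(links)
  prereg_plan <- unique(prereg$AP_sample_size)
  methods <- search_text(paper, pattern = "*",
                         section = "method", return = "section")
  query <- sprintf(query_template, prereg_plan)
  llm(paste(methods$text, collapse = "\n\n"), query,
      model = "llama-3.3-70b-versatile")
}

# results <- lapply(psychsci, check_prereg_sample)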

Automated Checks Can Be Wrong!

The use of AI to interpret deviations is convenient, but it can’t replace human judgment. The following article, Exploring the Facets of Emotional Episodic Memory: Remembering “What,” “When,” and “Which,” also has a preregistration. A large language model will incorrectly state that the authors deviated from their preregistration. It misses that the authors explicitly say that Cohort B was not preregistered, and that falling short of the planned sample size of 60 in that cohort should therefore not be seen as a deviation from the preregistration. All flagged deviations from a preregistration should be manually checked: Papercheck is only intended to make checks of a preregistration more efficient, and in the end, people need to make the final judgment. The preregistered sample size statement is as follows:

paper <- psychsci$`0956797621991548`
links <- aspredicted_links(paper)
prereg <- aspredicted_retrieve(links)
#> Starting AsPredicted retrieval for 1 files...
#> * Retrieving info from https://aspredicted.org/p4ci6.pdf...
#> ...AsPredicted retrieval complete!

# sample size
prereg_sample_size <- unique(prereg$AP_sample_size)
prereg_sample_size |> cat("> ", x = _)

N = 60. Participants will be recruited from the undergraduate student population of the University of British Columbia, and will be compensated with course credit through the Human Subject Pool Sona system. All participants aged 18-35 will be eligible for participation, and must be fluent in English (to ensure instruction comprehension).

If we send the method section to an LLM and ask it to identify any deviations from the preregistration, we get the following response:

# LLM workflow - send the full method section to the LLM

method_section <- search_text(paper, pattern = "*", section = c("method"), return = "section")

# combine all relevant paragraphs
text <- paste(method_section$text, collapse = "\n\n")

query <- sprintf(query_template, prereg_sample_size)
llm_response <- llm(text, query, model = "llama-3.3-70b-versatile")
#> You have 499999 of 500000 requests left (reset in 172.799999ms) and 297698 of 300000 tokens left (reset in 460.4ms).

llm_response$answer |> cat("> ", x = _)

The authors deviated from their preregistration in terms of the sample size for cohort B. According to the preregistration, the researchers planned to collect data from 60 participants in each cohort. However, for cohort B, they were only able to collect data from 56 participants due to the interruption of data collection caused by the COVID-19 pandemic. The sentence that corresponds to the plan is: “Here, we sought to collect data from 60 participants in each cohort.”

Future Research

We believe that automatically retrieving information about preregistrations has the potential to reduce the workload of peer reviewers, and it might function as a reminder to authors that they should discuss deviations from the preregistration. The extent to which this works out in practice should be investigated.

We have only focused on an automated check for the preregistered sample size. Other components of a preregistration, such as exclusion criteria or the planned analysis, are also important to check. It might be more difficult to create automated checks for these components, given the great flexibility in how statistical analyses, especially, are reported. In an earlier paper we discussed the benefits of creating machine-readable hypothesis tests, and we argued that these should be considered the gold standard for a preregistration (Lakens and DeBruine 2021). Machine-readable hypothesis tests would allow researchers to automatically check whether preregistered analyses are corroborated or falsified. But we realize it will be some years before this becomes common practice.
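As a purely illustrative sketch (not the format proposed in Lakens and DeBruine 2021), a machine-readable hypothesis test could record the planned analysis and the decision rule in a structured object that software can evaluate against the reported results:

# illustrative only: a structured description of one preregistered test
hypothesis_h1 <- list(
  id        = "H1",
  test      = "t.test",                      # analysis to run on the data
  params    = list(alternative = "greater"), # preregistered test settings
  criterion = "p.value < .05"                # decision rule for corroboration
)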

There is a range of other improvements and extensions that should be developed, such as support for multi-study papers that contain multiple preregistrations, and extending this code to preregistrations on other platforms, such as the OSF. If you are interested in developing this Papercheck module further, or in performing such a validation study, do reach out to us.

References

Akker, Olmo R. van den, Marjan Bakker, Marcel A. L. M. van Assen, Charlotte R. Pennington, Leone Verweij, Mahmoud M. Elsherif, Aline Claesen, et al. 2024. “The Potential of Preregistration in Psychology: Assessing Preregistration Producibility and Preregistration-Study Consistency.” Psychological Methods, October. https://doi.org/10.1037/met0000687.
Imai, Taisuke, Séverine Toussaert, Aurélien Baillon, Anna Dreber, Seda Ertaç, Magnus Johannesson, Levent Neyse, and Marie Claire Villeval. 2025. Pre-Registration and Pre-Analysis Plans in Experimental Economics. 220. I4R Discussion Paper Series. https://www.econstor.eu/handle/10419/315047.
Lakens, Daniël. 2024. “When and How to Deviate from a Preregistration.” Collabra: Psychology 10 (1): 117094. https://doi.org/10.1525/collabra.117094.
Lakens, Daniël, and Lisa M. DeBruine. 2021. “Improving Transparency, Falsifiability, and Rigor by Making Hypothesis Tests Machine-Readable.” Advances in Methods and Practices in Psychological Science 4 (2): 2515245920970949. https://doi.org/10.1177/2515245920970949.
Spitzer, Lisa, and Stefanie Mueller. 2023. “Registered Report: Survey on Attitudes and Experiences Regarding Preregistration in Psychological Research.” PLOS ONE 18 (3): e0281086. https://doi.org/10.1371/journal.pone.0281086.
