# The 20% Statistician

A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

## Wednesday, March 2, 2016

### The difference between a confidence interval and a capture percentage

I was reworking a lecture on confidence intervals I’ll be teaching when I came across a perfect real-life example of a common error people make when interpreting confidence intervals. I hope everyone (Harvard Professors, Science editors, my bachelor students) will benefit from a clear explanation of this misinterpretation.

Let’s assume a Harvard professor and two Science editors make the following statement:
If you take 100 original studies and replicate them, then “sampling error alone should cause 5% of the replication studies to ‘fail’ by producing results that fall outside the 95% confidence interval of the original study.”*

The formal meaning of a confidence interval is that, in the long run, 95% of 95% confidence intervals will contain the true population parameter. See Kristoffer Magnusson’s excellent visualization, where you can see how 95% of the confidence intervals include the true population value. Remember that the 95% is a statement about the long-run behavior of the procedure: how often the confidence intervals from future studies will contain the true value.

A single confidence interval is not a statement about where the means of future samples will fall. The percentage of means from future samples that fall within a single confidence interval is called the capture percentage. How many future means fall within a single unbiased confidence interval depends on which interval you happened to observe, but averaged over many studies, 95% confidence intervals have an 83.4% capture percentage (Cumming & Maillardet, 2006). In other words, across a large number of unbiased original studies, 16.6% (not 5%) of replication studies will observe a parameter estimate that falls outside the 95% confidence interval of the original study. (Note that this percentage assumes equal sample sizes in the original and replication studies; if sample sizes differ, you would need to simulate the capture percentages for each study.)
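Under the simplifying assumption of a normal population with a known standard deviation (so intervals use z = 1.96 instead of a t quantile), the 83.4% figure can be derived directly. The difference between a replication mean and the original mean has standard deviation σ·sqrt(1/n_orig + 1/n_rep), while the original interval's half-width is 1.96·σ/sqrt(n_orig); the capture percentage is the probability that this difference stays inside the half-width. The sketch below is a minimal illustration of that reasoning; `capture_pct()` is a hypothetical helper, not part of the post's script, and the known-σ assumption is what makes the closed form possible.

```r
# Long-run (average) capture percentage of a confidence interval.
# Hypothetical helper. ASSUMES a normal population with KNOWN sigma,
# so the interval uses a z quantile rather than a t quantile.
capture_pct <- function(n_orig, n_rep = n_orig, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)          # 1.96 for a 95% interval
  half_width <- z * sqrt(1 / n_orig)       # CI half-width, in units of sigma
  sd_diff <- sqrt(1 / n_orig + 1 / n_rep)  # SD of (replication mean - original mean), in units of sigma
  2 * pnorm(half_width / sd_diff) - 1      # P(replication mean lands inside the original CI)
}

capture_pct(30)               # equal sample sizes: about 0.834, whatever n is
capture_pct(30, n_rep = 120)  # larger replication: higher capture percentage
capture_pct(30, n_rep = 10)   # smaller replication: lower capture percentage
```

With equal sample sizes the ratio reduces to 1.96/sqrt(2), which is why the answer does not depend on n. This is an average over many original studies; the capture percentage of any one observed interval still depends on where that interval happened to land.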

Let’s experience this through simulation. Run the entire R script available at the bottom of this post. The script simulates a single sample from a population with a true mean of 100 and a standard deviation of 15 (the mean and SD of an IQ test) and plots it. Samples drawn from this population vary, as you can see from the sample mean and standard deviation shown in the plot. The dashed black line marks the true mean of 100. The orange area shows the 95% confidence interval around the sample mean; in the long run, 95% of these orange bars will contain the dashed line. For example:

After the plotted sample, the simulation generates a large number of additional samples. It reports the percentage of confidence intervals from these samples that contain the true mean (which should be 95% in the long run). It also reports the percentage of sample means from these future studies that fall within the 95% confidence interval of the original study. This is the capture percentage. It differs from (and is typically lower than) the 95% confidence level.
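For readers who want to see both quantities side by side without the full script, here is a compact, vectorized sketch of the same idea. It is not the post's script: the variable names, the seed, and the choices n = 20 and nSims = 100000 are assumptions for illustration. Each row of the matrix is one simulated study.

```r
set.seed(2016)                 # arbitrary seed, for reproducibility only
n <- 20                        # sample size per study (assumed)
nSims <- 100000                # number of additional simulated studies
mu <- 100; sigma <- 15         # true IQ mean and SD
tcrit <- qt(0.975, df = n - 1)

# The single "original" study and its 95% confidence interval
x <- rnorm(n, mean = mu, sd = sigma)
ci <- mean(x) + c(-1, 1) * tcrit * sd(x) / sqrt(n)

# nSims further studies, one per row: their means and CI half-widths
sims <- matrix(rnorm(n * nSims, mean = mu, sd = sigma), nrow = nSims)
sim_means <- rowMeans(sims)
sim_sds <- sqrt(rowSums((sims - sim_means)^2) / (n - 1))  # row-wise sample SDs
sim_half <- tcrit * sim_sds / sqrt(n)

# Coverage: share of the simulated CIs that contain the true mean (about 95%)
coverage <- mean(abs(sim_means - mu) <= sim_half)
# Capture percentage: share of the simulated means inside the ORIGINAL CI
capture <- mean(sim_means >= ci[1] & sim_means <= ci[2])

cat(100 * coverage, "% of the 95% confidence intervals contained the true mean\n")
cat("Capture percentage for the interval from", ci[1], "to", ci[2], ":", 100 * capture, "%\n")
```

Re-running with different seeds shows the coverage staying close to 95%, while the capture percentage moves around from run to run, because it depends on where the single original interval landed.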

Q1: Run the simulations multiple times (the 100000 simulations take a few seconds). Look at the output you will get in the R console. For example: “95.077 % of the 95% confidence intervals contained the true mean” and “The capture percentage for the plotted study, or the % of values within the observed confidence interval from 88.17208 to 103.1506 is: 82.377 %”. While running the simulations multiple times, look at the confidence interval around the sample mean, and relate this to the capture percentage. Which statement is true?

A) The further the sample mean in the original study is from the true population mean, the lower the capture percentage.
B) The further the sample mean in the original study is from the true population mean, the higher the capture percentage.
C) The wider the confidence interval around the mean, the higher the capture percentage.
D) The narrower the confidence interval around the mean, the higher the capture percentage.
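In a simulation the true population values are known, so the capture percentage of any particular observed interval can also be computed directly: future sample means are normally distributed around μ with standard error σ/sqrt(n), and the capture percentage is the probability mass of that distribution between the interval's bounds. The sketch below assumes μ = 100, σ = 15 and n = 20 (the adapted script in the comments lists n = 20 as the original setting); `interval_capture()` is a hypothetical helper.

```r
# Exact capture percentage of one observed interval [lower, upper].
# Hypothetical helper. ASSUMES the true mean mu and SD sigma are known
# and that future studies have sample size n (n = 20 assumed here).
interval_capture <- function(lower, upper, mu = 100, sigma = 15, n = 20) {
  se <- sigma / sqrt(n)  # standard error of a future sample mean
  pnorm(upper, mean = mu, sd = se) - pnorm(lower, mean = mu, sd = se)
}

# The interval from the example output quoted in Q1
interval_capture(88.17208, 103.1506)
```

This returns roughly 0.826, close to the simulated 82.377%; the small gap is consistent with simulation noise. Shifting the interval away from 100, or making it wider or narrower, is a quick way to check your answer to Q1.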

Q2: Simulations in R are randomly generated, but you can make a specific simulation reproducible by setting the seed of the random number generator. Copy-paste “set.seed(123456)” into the first line of the R script and run the simulation. The sample mean should be 108 (see the picture below). This is a clear overestimate of the true population parameter. Indeed, just by chance, this simulation yielded a result that is significantly different from the null hypothesis (the mean IQ of 100), which is a Type 1 error because the null hypothesis is true. Such overestimates are common in a literature rife with publication bias. A recent large-scale replication project revealed that even for studies that replicated (according to a p < 0.05 criterion), the effect sizes in the original studies were substantially inflated. Given the true mean of 100, many sample means should fall to the left of the orange bar, and this percentage is clearly much larger than 5%. What is the capture percentage in this specific situation, where the original study yielded an upwardly biased estimate?

A) 95% (because I believe Harvard Professors and Science editors over you and your simulations!)
B) 42.2%
C) 84.3%
D) 89.2%

I always find it easier to see how statistics work when I can simulate them. I hope this example makes the difference between a confidence interval and a capture percentage clear.

* This is a hypothetical statement. Any similarity to commentaries that might be published in Science in the future is purely coincidental.

1. Nice explanation of capture percentage, clearly differentiating it from coverage percentage. AND, thanks for the link to Magnusson's mesmerizing demo.

2. Note how Nate Silver gets this wrong in regard to polling, despite his linking to a correct definition. (Some commentators attempted explanations.)
http://errorstatistics.com/2016/02/12/rubbing-off-uncertainty-confidence-and-nate-silver/

3. Nice post, although you might want to include the 95% replication capture interval (which is what you should use for this type of inference) as a comparator for the 95% CI.

Hacked into your script below:

if(!require(ggplot2)){install.packages('ggplot2')}
library(ggplot2)

n=30 #set sample size
nSims<-1000 #set number of simulations (the pairwise RCI loop below scales with nSims^2)

x<-rnorm(n = n, mean = 100, sd = 15) #create sample from normal distribution
samplemean <-mean(x)
#95% CI and 95% replication capture interval (RCI)
CIU<-samplemean+qt(0.975, df = n-1)*sd(x)*sqrt(1/n)
CIL<-samplemean-qt(0.975, df = n-1)*sd(x)*sqrt(1/n)
RCIU<-samplemean+qt(0.975, df = n-1)*sd(x)*sqrt(2/n)
RCIL<-samplemean-qt(0.975, df = n-1)*sd(x)*sqrt(2/n)

#plot data
#png(file="CI_mean.png",width=2000,height=2000, res = 300)
p <- ggplot(as.data.frame(x), aes(x)) +
  annotate("rect", xmin=CIL, xmax=CIU, ymin=0, ymax=Inf, fill="#E69F00") +
  geom_histogram(colour="black", fill="grey", aes(y=after_stat(density)), binwidth=2) +
  xlab("IQ") + ylab("number of people") + ggtitle("Data") + theme_bw(base_size=20) +
  theme(panel.grid.major.x = element_blank(), axis.text.y = element_blank(), panel.grid.minor.x = element_blank()) +
  geom_vline(xintercept=100, colour="black", linetype="dashed", linewidth=1) +
  coord_cartesian(xlim=c(50,150)) + scale_x_continuous(breaks=seq(50,150,by=10)) +
  annotate("text", x = mean(x), y = 0.02, label = paste("Mean = ",round(mean(x)),"\n","SD = ",round(sd(x)),sep=""), size=6.5)
print(p) #print() makes the plot appear when the script is sourced
#dev.off()

#Simulate Confidence Intervals
CIU_sim<-numeric(nSims)
CIL_sim<-numeric(nSims)
RCIU_sim<-numeric(nSims)
RCIL_sim<-numeric(nSims)
mean_sim<-numeric(nSims)
capture = 0
Tcrit = qt(0.975, df = n-1)
for(i in 1:nSims){ #for each simulated experiment
  x<-rnorm(n = n, mean = 100, sd = 15) #create sample from normal distribution
  sim_mean = mean(x)
  CIW = Tcrit*sd(x)*sqrt(1/n) #half-width of the 95% CI
  CIU_sim[i]<-sim_mean+CIW
  CIL_sim[i]<-sim_mean-CIW
  RCIU_sim[i]<-sim_mean+CIW*sqrt(2) #the RCI half-width is sqrt(2) times the CI half-width
  RCIL_sim[i]<-sim_mean-CIW*sqrt(2)
  mean_sim[i]<-sim_mean #store means of each sample
  for (j in seq_len(i-1)){ #compare with every earlier simulation (a study is never compared with itself)
    if(mean_sim[i]<=RCIU_sim[j]&&mean_sim[i]>=RCIL_sim[j]){
      capture=capture+1 #mean of study i falls inside the RCI of study j
    }
    if(mean_sim[j]<=RCIU_sim[i]&&mean_sim[j]>=RCIL_sim[i]){
      capture=capture+1 #mean of study j falls inside the RCI of study i
    }
  }
}

#How many simulations does the true value lie outside the 95% CI
CIU_out<-CIU_sim[CIU_sim<100]
CIL_out<-CIL_sim[CIL_sim>100]

#How many simulations does our original observed value lie outside the 95% RCI
RCIU_out<-RCIU_sim[RCIU_sim<samplemean]
RCIL_out<-RCIL_sim[RCIL_sim>samplemean]

cat((100*(1-(length(CIU_out)/nSims+length(CIL_out)/nSims))),"% of the 95% confidence intervals contained the true mean\n")
cat((100*(1-(length(RCIU_out)/nSims+length(RCIL_out)/nSims))),"% of the 95% replication capture intervals contained the observed mean\n")

#Calculate how many times the simulated mean fell within the 95% CI and the 95% RCI of the original study
mean_sim1<-mean_sim[mean_sim>CIL&mean_sim<CIU]
cat("The capture percentage for the plotted study, or the % of means from other simulations within the observed 95% confidence interval from",CIL,"to",CIU,"is:",100*length(mean_sim1)/nSims,"%\n")
mean_sim2<-mean_sim[mean_sim>RCIL&mean_sim<RCIU]
cat("The RCI capture percentage for the plotted study, or the % of means from other simulations within the observed 95% replication capture interval from",RCIL,"to",RCIU,"is:",100*length(mean_sim2)/nSims,"%\n")

#What proportion of ordered pairs of simulations had one mean inside the other's RCI
cat(100*capture/(nSims*(nSims-1)),"% of pairwise replication captures were successful from simulated 95% RCIs\n")

1. Not sure if you (Mike Atiken) are still reading this, but the code breaks for me from this point on.

RCIU_sim<-RCIU_sim[RCIU_simsamplemean]

(There isn't an object RCIU_simsamplemean, or an object mean_simRCIL or an object mean_sim2)

4. Apologies, I didn't mean to post the previous comment anonymously!

5. How is this different from the distinction between a prediction interval and a confidence interval (as is often discussed in regression)? Rob Hyndman has a post on this (http://robjhyndman.com/hyndsight/intervals/)

1. Prediction intervals are typically for a single new observation, and they are much wider than confidence intervals.

2. Well, I seem to recall that prediction intervals can be used for future sample means (of independent samples), too, treating the future sample mean as an observable. The usual multiplier becomes sqrt(1/n_new + 1/n_old) rather than sqrt(1 + 1/n_old). So, loosely, a 95% CI is really about an 84% PI.

Seymour Geisser wrote a book on this.

3. A bit late, but I came across a direct reference today:

* Kalbfleisch (1975, 1989) Probability and Statistical Inference II Example 16.3.1

6. I do not know whether there is an issue with the simulation, but with a large sample size (n = 1000) and repeating the simulation 100 times, I've found the capture percentage is higher than 84% (I found 92%).

@AntoViral (I don't know how to sign; I haven't got a URL)

This is the code, adapted from yours:

library(ggplot2)

### creation of an empty dataframe

data <- data.frame()

N <- 100

for (i in 1:N) {

### original
## n=20 #set sample size
## nSims<-100000 #set number of simulations

### modified by me

set.seed(i) ### set seed for reproducibility

n=1000 #set sample size
nSims<-1000 #set number of simulations

x<-rnorm(n = n, mean = 100, sd = 15) #create sample from normal distribution

#95%CI
CIU<-mean(x)+qt(0.975, df = n-1)*sd(x)*sqrt(1/n)
CIL<-mean(x)-qt(0.975, df = n-1)*sd(x)*sqrt(1/n)

#plot data
#png(file="CI_mean.png",width=2000,height=2000, res = 300)
#note: a ggplot created inside a for loop is only drawn if it is wrapped in print()
ggplot(as.data.frame(x), aes(x)) +
geom_rect(aes(xmin=CIL, xmax=CIU, ymin=0, ymax=Inf), fill="#E69F00") +
geom_histogram(colour="black", fill="grey", aes(y=after_stat(density)), binwidth=2) +
xlab("IQ") + ylab("number of people") + ggtitle("Data") + theme_bw(base_size=20) +
theme(panel.grid.major.x = element_blank(), axis.text.y = element_blank(), panel.grid.minor.x = element_blank()) +
geom_vline(xintercept=100, colour="black", linetype="dashed", linewidth=1) +
coord_cartesian(xlim=c(50,150)) + scale_x_continuous(breaks=c(50,60,70,80,90,100,110,120,130,140,150)) +
annotate("text", x = mean(x), y = 0.02, label = paste("Mean = ",round(mean(x)),"\n","SD = ",round(sd(x)),sep=""), size=6.5)
#dev.off()

#Simulate Confidence Intervals
CIU_sim<-numeric(nSims)
CIL_sim<-numeric(nSims)
mean_sim<-numeric(nSims)

for(j in 1:nSims){ #for each simulated experiment (j, so the outer loop's i is not shadowed)
x<-rnorm(n = n, mean = 100, sd = 15) #create sample from normal distribution
CIU_sim[j]<-mean(x)+qt(0.975, df = n-1)*sd(x)*sqrt(1/n)
CIL_sim[j]<-mean(x)-qt(0.975, df = n-1)*sd(x)*sqrt(1/n)
mean_sim[j]<-mean(x) #store means of each sample
}

#Keep only the simulations whose 95% CI missed the true value (entirely below or entirely above 100)
CIU_sim<-CIU_sim[CIU_sim<100]
CIL_sim<-CIL_sim[CIL_sim>100]

# cat((100*(1-(length(CIU_sim)/nSims+length(CIL_sim)/nSims))),"% of the 95% confidence intervals contained the true mean")

#Calculate how many of the simulated means fell within the 95% CI of the original study
mean_sim<-mean_sim[mean_sim>CIL&mean_sim<CIU]
# cat("The capture percentage for the plotted study, or the % of values within the observed confidence interval from",CIL,"to",CIU,"is:",100*length(mean_sim)/nSims,"%")

conf <- (100*(1-(length(CIU_sim)/nSims+length(CIL_sim)/nSims)))
capt <- 100*length(mean_sim)/nSims

### collect the data in a dataframe

data <- rbind(data, c(conf, capt))
names(data) <- c("95% CI", "Capture %")

}

### check the result

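#cap flags the runs whose capture percentage fell below 94.9%;
#sum(cap) counts those runs out of N = 100
cap <- ifelse(data[,2]<94.9, 1, 0)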

plot(data,pch=19)
mtext(paste0("95% confidence intervals have a ", sum(cap), "% capture percentage"))

1. Run colMeans(data) and you'll see the capture percentage is 83.4%, ON AVERAGE.

7. Hi Daniel, quick question. All of the discussion around CIs has focused on population data. I'm wondering about the implications for individual data (such as neuropsych assessment).

As a neuropsych, I was trained that the 95% CI provides a range that we can be 95% confident contains the individual's 'true' score. But is this actually the case? Is a more accurate interpretation that if we tested the patient over and over again (not accounting for practice effects), their score would fall within the 95% CI 95% of the time?

1. Hi, no, that is incorrect. It is often taught incorrectly. Confidence intervals are counterintuitive things.

2. Okay, do you mind explaining the application of CIs in this context (please)?

3. There is no special application: CIs are always what they are, as explained above. It sounds like they are misused, but I can't explain that.

8. That's a great post, but I think it misses the point that statements like the one being critiqued are not only incorrect but also care about the wrong thing. They're like early astronomy, where the earth is the centre of the universe: the researcher treats their own mean and CI as the centre of the universe. Accepting what the CI really means, and what a proper statement about it is, allows one to be correct 95% of the time. So the critical point, once the calculations are done, is that by making your CI the centre of discussion you've dramatically reduced the long-run accuracy of your statements and further reduced the useful relevance of your study.

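Comment 3's 95% replication capture interval (multiplier sqrt(2/n)) and the prediction interval for a future sample mean described in the reply to comment 5 (multiplier sqrt(1/n_new + 1/n_old)) are the same interval when both samples have the same size. Under the usual assumptions (normal data, independent samples, SD estimated from the original sample only), (new mean - original mean)/(s·sqrt(1/n_old + 1/n_new)) follows a t distribution with n_old - 1 degrees of freedom. That gives both a prediction interval and the exact long-run capture percentage of a t-based 95% CI, which is the sense in which "a 95% CI is really about an 84% PI". A minimal R sketch; `pi_future_mean()` and `ci_as_pi_level()` are hypothetical helper names, and the seed and sample size are arbitrary.

```r
# Prediction interval for the mean of a future independent sample of size n_new,
# built from an original sample x. Hypothetical helper; assumes normal data and
# uses only the original sample's SD.
pi_future_mean <- function(x, n_new = length(x), level = 0.95) {
  n_old <- length(x)
  tcrit <- qt(1 - (1 - level) / 2, df = n_old - 1)
  mean(x) + c(-1, 1) * tcrit * sd(x) * sqrt(1 / n_old + 1 / n_new)
}

# Long-run probability that a future mean (sample size n_new) falls inside a
# t-based CI from a sample of size n_old, i.e. the PI level the CI corresponds to.
# Hypothetical helper.
ci_as_pi_level <- function(n_old, n_new = n_old, level = 0.95) {
  tcrit <- qt(1 - (1 - level) / 2, df = n_old - 1)
  ratio <- sqrt(1 / n_old) / sqrt(1 / n_old + 1 / n_new)
  2 * pt(tcrit * ratio, df = n_old - 1) - 1
}

set.seed(42)                                  # arbitrary seed
x <- rnorm(20, mean = 100, sd = 15)
ci <- mean(x) + c(-1, 1) * qt(0.975, df = 19) * sd(x) / sqrt(20)
pint <- pi_future_mean(x)                     # same as the 95% RCI from comment 3
diff(pint) / diff(ci)                         # sqrt(2) when n_new equals n_old

ci_as_pi_level(20)                            # somewhat above 0.834 for n = 20
ci_as_pi_level(1e6)                           # close to 0.834 for very large n
```

With t-based intervals and small samples, the long-run capture percentage is somewhat higher than the 83.4% known-σ figure, and it approaches 83.4% as n grows.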