tag:blogger.com,1999:blog-987850932434001559.post3297976599492795506..comments2024-03-29T05:57:12.346+01:00Comments on The 20% Statistician: The difference between a confidence interval and a capture percentageDaniel Lakenshttp://www.blogger.com/profile/18143834258497875354noreply@blogger.comBlogger17125tag:blogger.com,1999:blog-987850932434001559.post-2483134980981636092021-04-07T16:26:40.911+02:002021-04-07T16:26:40.911+02:00This comment has been removed by a blog administrator.Time Hub Zonehttps://www.blogger.com/profile/09092885715599076248noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-72649662580066843862016-07-19T04:26:11.676+02:002016-07-19T04:26:11.676+02:00That's a great post but I think it misses the ...That's a great post but I think it misses the point that not only are statements like the one being critiqued incorrect but they are caring about the wrong thing. They're like early astronomy where the earth is the centre of the universe. The researcher is thinking in terms of their mean and CI as the centre of the universe. Accepting what the CI really means and what a proper statement about it is allows one to be correct 95% of the time. So, after the calculations the critical method is that making your CI the centre of discussion you've reduced your long run accuracy of statements dramatically and further reduced the useful relevance of your study.Unknownhttps://www.blogger.com/profile/00227235335343168838noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-28800035065517386182016-04-23T13:45:07.973+02:002016-04-23T13:45:07.973+02:00Not sure if you (Mike Atiken) are still reading th...Not sure if you (Mike Atiken) are still reading this, but the code breaks for me from this point on. 
<br /><br />RCIU_sim<-RCIU_sim[RCIU_simsamplemean]<br /><br />(There isn't an object RCIU_simsamplemean, or an object mean_simRCIL or an object mean_sim2)Joenoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-73676109100795450412016-04-01T06:40:19.518+02:002016-04-01T06:40:19.518+02:00There is no special application - CI are always wh...There is no special application - CI are always what they are, as explained above. It sounds like they are misused - but I can't explain that. Daniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-16304346597440006582016-03-31T23:04:25.809+02:002016-03-31T23:04:25.809+02:00Okay, do you mind explaining the application of CI...Okay, do you mind explaining the application of CI's in this context (please)?Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-44304493529157110452016-03-31T17:29:48.237+02:002016-03-31T17:29:48.237+02:00Hi, no, that is incorrect. It is often taught inco...Hi, no, that is incorrect. It is often taught incorrectly. Confidence intervals are counterintuitive things.Daniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-1014575609664509242016-03-31T10:35:47.851+02:002016-03-31T10:35:47.851+02:00Hi Daniel, quick question. All of the discussion a...Hi Daniel, quick question. All of the discussion around CI's has focused on population data. I'm wondering about the implications for individual data (such as neuropsych assessment). <br /><br />As a neuropsych I was trained that the 95% CI provides a range about which we can be 95% confident contains the individuals 'true' score. But is this actually the case??? 
Is a more accurate interpretation that if we tested the patient over and over again (not accounting for practice effects) that their score would fall within the 95% CI,95% of the time...??Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-57874465359632482692016-03-22T00:10:08.507+01:002016-03-22T00:10:08.507+01:00A bit late, but I came across a direct reference t...A bit late, but I came across a direct reference today:<br /><br /> * Kalbfleisch (1975, 1989) Probability and Statistical Inference II Example 16.3.1Bill Rhttps://www.blogger.com/profile/16756950826166022532noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-32769707684457291362016-03-06T19:36:27.224+01:002016-03-06T19:36:27.224+01:00Well, I seem to recall that prediction intervals c...Well, I seem to recall that prediction intervals can be used for future sample means (of independent samples), too, treating the future sample mean as an observable. The usual multiplier becomes sqrt(1/n_new + 1/n_old) rather than sqrt(1 + 1/n_old). So, loosely, a 95% CI is really about an 84% PI.<br /><br />Seymour Geisser wrote a book on this. Bill Rhttps://www.blogger.com/profile/16756950826166022532noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-54254620100643695952016-03-05T22:55:49.410+01:002016-03-05T22:55:49.410+01:00Sure. Normal. Add
colMeans(data)
and see it...Sure. Normal. Add <br /><br />colMeans(data) <br /><br />and see it's 83.4 - ON AVERAGEDaniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-18506090304307299862016-03-05T22:34:43.425+01:002016-03-05T22:34:43.425+01:00I do not know whether there is an issue with the s...I do not know whether there is an issue with the simulation, but with a large sample size (n =1000) and repeating the simulation 100 times, I've found the capture percentage is higher than 84% (I've found 92%)<br /><br />@AntoViral (do not know how to sign, I have'nt a URL)<br /><br />This is the code, adapted from yours:<br /><br />library(ggplot2)<br /><br />### creation of an empty dataframe <br /><br />data <- data.frame()<br /><br />N <- 100<br /><br />for (i in 1:N) { <br /><br />### original<br />## n=20 #set sample size<br />## nSims<-100000 #set number of simulations<br /><br />### modified by me<br /><br />set.seed(i) ### set seed for reproducibility<br /><br />n=1000 #set sample size<br />nSims<-1000 #set number of simulations<br /><br />x<-rnorm(n = n, mean = 100, sd = 15) #create sample from normal distribution<br /><br />#95%CI<br />CIU<-mean(x)+qt(0.975, df = n-1)*sd(x)*sqrt(1/n)<br />CIL<-mean(x)-qt(0.975, df = n-1)*sd(x)*sqrt(1/n)<br /><br />#plot data<br />#png(file="CI_mean.png",width=2000,height=2000, res = 300)<br />ggplot(as.data.frame(x), aes(x)) + <br /> geom_rect(aes(xmin=CIL, xmax=CIU, ymin=0, ymax=Inf), fill="#E69F00") +<br /> geom_histogram(colour="black", fill="grey", aes(y=..density..), binwidth=2) +<br /> xlab("IQ") + ylab("number of people") + ggtitle("Data") + theme_bw(base_size=20) + <br /> theme(panel.grid.major.x = element_blank(), axis.text.y = element_blank(), panel.grid.minor.x = element_blank()) + <br /> geom_vline(xintercept=100, colour="black", linetype="dashed", size=1) + <br /> coord_cartesian(xlim=c(50,150)) + 
scale_x_continuous(breaks=c(50,60,70,80,90,100,110,120,130,140,150)) +<br /> annotate("text", x = mean(x), y = 0.02, label = paste("Mean = ",round(mean(x)),"\n","SD = ",round(sd(x)),sep=""), size=6.5)<br />#dev.off()<br /><br />#Simulate Confidence Intervals<br />CIU_sim<-numeric(nSims)<br />CIL_sim<-numeric(nSims)<br />mean_sim<-numeric(nSims)<br /><br />for(i in 1:nSims){ #for each simulated experiment<br /> x<-rnorm(n = n, mean = 100, sd = 15) #create sample from normal distribution<br /> CIU_sim[i]<-mean(x)+qt(0.975, df = n-1)*sd(x)*sqrt(1/n)<br /> CIL_sim[i]<-mean(x)-qt(0.975, df = n-1)*sd(x)*sqrt(1/n)<br /> mean_sim[i]<-mean(x) #store means of each sample<br />}<br /><br />#Save only those simulations where the true value was inside the 95% CI<br />CIU_sim<-CIU_sim[CIU_sim<100]<br />CIL_sim<-CIL_sim[CIL_sim>100]<br /><br /># cat((100*(1-(length(CIU_sim)/nSims+length(CIL_sim)/nSims))),"% of the 95% confidence intervals contained the true mean")<br /><br />#Calculate how many times the observed mean fell within the 95% CI of the original study<br />mean_sim<-mean_sim[mean_sim>CIL&mean_sim<CIU]<br /># cat("The capture percentage for the plotted study, or the % of values within the observed confidence interval from",CIL,"to",CIU,"is:",100*length(mean_sim)/nSims,"%")<br /><br />conf <- (100*(1-(length(CIU_sim)/nSims+length(CIL_sim)/nSims)))<br />capt <- 100*length(mean_sim)/nSims<br /><br />### collect the data in a dataframe<br /><br /> data <- rbind(data, c(conf, capt))<br /> names(data) <- c("95% CI", "Capture %")<br /><br /> }<br /><br />### check the result<br /><br />head(data)<br /><br />cap <- ifelse(data[,2]<94.9, 1, 0)<br /><br />plot(data,pch=19)<br />mtext(paste0("95% confidence intervals have a ", sum(cap), "% capture percentage"))Anonymoushttps://www.blogger.com/profile/06401068208081929045noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-36130474958728569802016-03-04T14:51:23.572+01:002016-03-04T14:51:23.572+01:00Predictions 
intervals typically are for a single n...Prediction intervals typically are for a single new observation. They are much wider than confidence intervals.Daniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-4085172983154164372016-03-03T19:47:57.231+01:002016-03-03T19:47:57.231+01:00How is this different from a prediction interval v...How is this different from a prediction interval versus a confidence interval (as is often discussed in regression)? Rob Hyndman has a post on this (http://robjhyndman.com/hyndsight/intervals/)Bill Rhttps://www.blogger.com/profile/16756950826166022532noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-89867014312201404732016-03-03T11:08:12.594+01:002016-03-03T11:08:12.594+01:00Apologies - didn't mean to post previous anony...Apologies - didn't mean to post previous anonymously!Mike Aitkennoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-61200008106416707892016-03-03T11:06:50.197+01:002016-03-03T11:06:50.197+01:00Nice post - although you might want to include the...Nice post - although you might want to include the 95% replication capture intervals (what you should do for this type of inference) as a comparator for the 95% CI.<br /><br />Hacked into your script below:<br /><br /><br /><br />if(!require(ggplot2)){install.packages('ggplot2')}<br />library(ggplot2)<br /><br />n=30 #set sample size<br />nSims<-1000 #set number of simulations<br /><br />x<-rnorm(n = n, mean = 100, sd = 15) #create sample from normal distribution<br />samplemean <-mean(x)<br />#95%CI<br />CIU<-samplemean+qt(0.975, df = n-1)*sd(x)*sqrt(1/n)<br />CIL<-samplemean-qt(0.975, df = n-1)*sd(x)*sqrt(1/n)<br />RCIU<-samplemean+qt(0.975, df = n-1)*sd(x)*sqrt(2/n)<br />RCIL<-samplemean-qt(0.975, df = n-1)*sd(x)*sqrt(2/n)<br /><br />#plot data<br />#png(file="CI_mean.png",width=2000,height=2000, res = 300)<br />ggplot(as.data.frame(x), aes(x)) + <br /> 
 geom_rect(aes(xmin=CIL, xmax=CIU, ymin=0, ymax=Inf), fill="#E69F00") +<br /> geom_histogram(colour="black", fill="grey", aes(y=..density..), binwidth=2) +<br /> xlab("IQ") + ylab("number of people") + ggtitle("Data") + theme_bw(base_size=20) + <br /> theme(panel.grid.major.x = element_blank(), axis.text.y = element_blank(), panel.grid.minor.x = element_blank()) + <br /> geom_vline(xintercept=100, colour="black", linetype="dashed", size=1) + <br /> coord_cartesian(xlim=c(50,150)) + scale_x_continuous(breaks=c(50,60,70,80,90,100,110,120,130,140,150)) +<br /> annotate("text", x = mean(x), y = 0.02, label = paste("Mean = ",round(mean(x)),"\n","SD = ",round(sd(x)),sep=""), size=6.5)<br />#dev.off()<br /><br />#Simulate Confidence Intervals<br />CIU_sim<-numeric(nSims)<br />CIL_sim<-numeric(nSims)<br />RCIU_sim<-numeric(nSims)<br />RCIL_sim<-numeric(nSims)<br />mean_sim<-numeric(nSims)<br />capture = 0<br />Tcrit = qt(0.975, df = n-1)<br />for(i in 1:nSims){ #for each simulated experiment<br /> x<-rnorm(n = n, mean = 100, sd = 15) #create sample from normal distribution<br /> sim_mean = mean(x)<br /> CIW = Tcrit*sd(x)*sqrt(1/n)<br /> CIU_sim[i]<-sim_mean+CIW<br /> CIL_sim[i]<-sim_mean-CIW<br /> RCIU_sim[i]<-sim_mean+CIW*sqrt(2)<br /> RCIL_sim[i]<-sim_mean-CIW*sqrt(2)<br /> mean_sim[i]<-sim_mean #store means of each sample<br /> for (j in 1:i){<br /> if(mean_sim[i]<=RCIU_sim[j]&&mean_sim[i]>=RCIL_sim[j]){<br /> capture=capture+1<br /> }<br /> if(mean_sim[j]<=RCIU_sim[i]&&mean_sim[j]>=RCIL_sim[i]){<br /> capture=capture+1<br /> }<br /> }<br />}<br /><br /><br />#How many simulations does the true value lie outside the 95% CI<br />CIU_sim<-CIU_sim[CIU_sim<100]<br />CIL_sim<-CIL_sim[CIL_sim>100]<br /><br />#How many simulations does our original observed value lie outside the 95% RCI<br />RCIU_sim<-RCIU_sim[RCIU_sim<samplemean]<br />RCIL_sim<-RCIL_sim[RCIL_sim>samplemean]<br /><br />cat((100*(1-(length(CIU_sim)/nSims+length(CIL_sim)/nSims))),"% of the 95% confidence intervals contained the true mean")<br />
cat((100*(1-(length(RCIU_sim)/nSims+length(RCIL_sim)/nSims))),"% of the 95% replication capture intervals contained the observed mean")<br /><br />#Calculate how many times the simulated mean fell within the 95% CI of the original study<br />mean_sim1<-mean_sim[mean_sim>CIL&mean_sim<CIU]<br />mean_sim2<-mean_sim[mean_sim>RCIL&mean_sim<RCIU]<br />cat("The RCI capture percentage for the plotted study, or the % of means from other simulations within the observed 95% replication capture interval from",RCIL,"to",RCIU,"is:",100*length(mean_sim2)/nSims,"%")<br /><br />#What proportion of times did one simulation capture another within RCI<br />cat(100*(capture-nSims)/(i*(j-1)),"% of pairwise replication captures were successful from simulated 95% RCIs")<br />Unconvertedhttps://www.blogger.com/profile/05138741189083170192noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-12027154023083560552016-03-02T21:40:15.576+01:002016-03-02T21:40:15.576+01:00Note how Nate Silver gets this wrong in regard to ...Note how Nate Silver gets this wrong in regard to polling, despite his linking to a correct definition. (Some commentators attempted explanations.)<br />http://errorstatistics.com/2016/02/12/rubbing-off-uncertainty-confidence-and-nate-silver/MAYO:ERRORSTAThttps://www.blogger.com/profile/02967648219914411407noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-25084705243427625322016-03-02T09:45:31.372+01:002016-03-02T09:45:31.372+01:00Nice explanation of capture percentage, clearly di...Nice explanation of capture percentage, clearly differentiating it from coverage percentage. AND, thanks for the link to Magnusson's mesmerizing demo.Anonymousnoreply@blogger.com
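
The figures quoted in the thread (a capture percentage of about 83.4% on average, and Bill R's remark that a 95% CI is loosely an 84% prediction interval for a replication mean) can be sketched with a short derivation. This is only a large-sample approximation under assumptions that are not stated in the comments: the original and replication samples have the same size n, both are drawn independently from a normal population with mean μ and standard deviation σ, and σ is treated as known, so the normal quantile 1.96 stands in for the t quantile. The symbols \bar{x}_1 (original mean), \bar{x}_2 (replication mean), Z and Φ are introduced here for illustration.

```latex
% Difference between two independent sample means of size n:
\bar{x}_2 - \bar{x}_1 \sim \mathcal{N}\!\left(0,\ \frac{2\sigma^2}{n}\right)

% Probability that the replication mean lands inside the original 95% CI,
% \bar{x}_1 \pm 1.96\,\sigma/\sqrt{n}, averaged over original studies:
P\!\left(|\bar{x}_2 - \bar{x}_1| \le 1.96\,\frac{\sigma}{\sqrt{n}}\right)
  = P\!\left(|Z| \le \frac{1.96}{\sqrt{2}}\right)
  = 2\,\Phi(1.386) - 1 \approx 0.834

% Widening the interval by \sqrt{2} (the sqrt(2/n) multiplier in the thread)
% restores 95% capture of the replication mean:
P\!\left(|\bar{x}_2 - \bar{x}_1| \le 1.96\,\sigma\sqrt{\frac{2}{n}}\right)
  = P\left(|Z| \le 1.96\right) \approx 0.95
```

The 0.834 is an average over original studies. A single observed interval can have a capture percentage well above or below it, depending on how far its own mean happens to fall from μ, which is consistent with one seed in the thread giving 92%.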