*Power analyses require accurate estimates of the standard deviation. In this blog, I explain how to calculate confidence intervals around standard deviation estimates obtained from a sample, and show how much sample sizes in an a-priori power analysis can differ based on variation in estimates of the standard deviation.*

If we calculate a standard deviation from a sample, this value is an estimate of the true value in the population. In small samples, our estimate can be quite far off, while due to the law of large numbers, as our sample size increases, we will be measuring the standard deviation more accurately. Since the sample standard deviation is an estimate with uncertainty, we can calculate a 95% confidence interval around it.

Expressing the uncertainty in our estimate of the standard deviation can be useful. When researchers plan to simulate data, or perform an a-priori power analysis, they need accurate estimates of the standard deviation. For simulations, the standard deviation needs to be accurate because we want to generate data that will look like the real data we will eventually collect. For power analyses we often want to think about the smallest effect size of interest, which can be specified as the difference in means you care about. To perform a power analysis we also need to specify the expected standard deviation of the data. Sometimes researchers will use pilot data to get an estimate of the standard deviation. Since the estimate of the population standard deviation based on a pilot study has some uncertainty, the width of confidence intervals around the standard deviation might be a useful way to show how much variability one can expect.

Below is the R code to calculate the confidence interval around a standard deviation from a sample, but you can also use this free GraphPad calculator. The R code then calculates an effect size based on a smallest effect size of interest of half a scale point (0.5) for a scale that has a true standard deviation of 1. The 95% confidence interval for the standard deviation based on a sample of 100 observation ranges from 0.878 to 1.162. If we draw a sample of 100 observations and happen to observe a value on the lower or upper bound of the 95% CI the effect size we calculate will be a Cohen’s d of 0.5/0.878 = 0.57 or 0.5/1.162 = 0.43. This is quite a difference in the effect size we might use for a power calculation. If we enter these effect size estimates in an a-priori power analysis where we aim to get 90% power using an alpha of 0.05 we will estimate that we need either 66 participants in each group, or 115 participants in each group.

It is clear sample sizes from a-priori power anayses depend strongly on an accurate estimate of the standard deviation. Keep into account that

**estimates of the standard deviation have uncertainty**. Use**validated or existing measures for which accurate estimates of the standard deviation in your population of interest are available**, so that you can rely on a better estimate of the standard deviation in power analyses.Some people argue that

**if you have such a limited understanding of the measures you are using that you do not even know the standard deviation of the measure in your population of interest, you are not ready to use that measure to test a hypothesis.**If you are doing a power analysis but realize you have no idea what the standard deviation is, maybe you first need to spend more time validating the measures you are using.When performing simulations or power analyses the same cautionary note can be made for estimates of correlations between dependent variables. When you are estimating these values from a sample, and want to perform simulations and/or power analyses, be aware that all estimates have some uncertainty. Try to get as accurate estimates as possible from pre-existing data. If possible, be a bit more conservative in sample size calculations based on estimated parameters, just to be sure.

i'd add that planned missingness is a great way to have sufficient power given limited resources! planned missing data designs (PMDD) + FIML estimation can lead to very similar results & conclusions - assuming missingness is planned to be (completely) at random

ReplyDelete