Yesterday Mike McCullough posted an interesting question on Twitter. He had collected some data and observed p = 0.026 for his hypothesis, but he wasn't happy. Being aware that relatively high p-values do not always provide strong support for H1, he wanted to set a new goal and collect more data. With sequential analyses it's no problem to look at the data and then collect additional observations, as long as you plan the looks ahead of time so that you control the false positive rate. So the question was: which goal should you have?
Mike suggested a p-value of 0.01, which (especially with increasing sample sizes) is a good target. But others quickly suggested forgetting about those damned p-values altogether, and planning for accuracy instead. Planning for accuracy is simple: you decide upon the width of the confidence interval you'd like, and determine the sample size you need.
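To make the accuracy route concrete, here is a minimal sketch in R of the sample size needed for a given confidence interval width. The function name `n_for_width` is made up for this example, and it assumes a single mean, an SD of about 1, and a normal quantile instead of a t quantile, so the answer is a slight underestimate.

```r
# Hypothetical helper: smallest n for which the expected confidence interval
# around a single mean has a full width of at most `width`.
# Assumes the SD is known (default 1) and uses the normal approximation.
n_for_width <- function(width, sd = 1, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # e.g. about 1.96 for a 95% interval
  ceiling((2 * z * sd / width)^2)
}

# Sample sizes for three target widths
n_needed <- n_for_width(c(0.40, 0.30, 0.25))
print(n_needed)
```

A narrower interval always costs more observations, and the cost grows with the square of the improvement in width.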
I don't really understand why people pretend these two choices are any different. They always boil down to the same thing: your sample size. A larger sample size will give you more power, and thus a better chance of observing a p-value below 0.01, or 0.001. A larger sample size will also reduce the width of your confidence interval.
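A quick way to see that both goals move with the same lever is to tabulate power and interval width against sample size. This is only a sketch: the effect size of 0.3, the SD of 1, the one-sample two-sided t-test, and the sample sizes in `n` are all assumptions chosen for illustration.

```r
# Illustrative assumptions: one-sample, two-sided t-test,
# true mean difference 0.3, SD 1.
d  <- 0.3
s  <- 1
n  <- c(50, 100, 200, 400)

# Power to reach p < 0.01 at each sample size
power_by_n <- sapply(n, function(k) {
  power.t.test(n = k, delta = d, sd = s, sig.level = 0.01,
               type = "one.sample", alternative = "two.sided")$power
})

# Expected full width of the 95% confidence interval around the mean
ci_width <- 2 * qt(0.975, df = n - 1) * s / sqrt(n)

print(data.frame(n = n,
                 power = round(power_by_n, 2),
                 ci_width = round(ci_width, 2)))
```

Every row with more power is also a row with a narrower interval, which is the whole point.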
So the only difference is which calculation you base your sample size on. You either decide upon an effect size you expect, or the width of the confidence interval you desire, and calculate the sample size. One criticism of power analysis is that you often don't know the effect size (e.g., Maxwell, Kelley, & Rausch, 2008). But with sequential analyses (e.g., Lakens, 2014) you can simply collect some data, and calculate conditional power for the remaining sample based on the observed effect size.
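For readers who want to see what such a conditional power calculation could look like, here is a rough sketch using the textbook normal approximation. It is not the procedure from Lakens (2014). The function name `conditional_power`, the interim numbers, and the final alpha of 0.01 are all hypothetical, and in a real sequential design the final critical value should come from whatever alpha correction you planned.

```r
# Hypothetical helper: chance of a significant result at the final analysis,
# given the data so far and assuming the observed effect size is the true one.
# Assumes a one-sample design and a z approximation to the t-test.
conditional_power <- function(d_obs, n1, n_total, alpha_final = 0.01) {
  z1     <- d_obs * sqrt(n1)            # interim z statistic (d_obs is in SD units)
  z_crit <- qnorm(1 - alpha_final / 2)  # two-sided critical value at the final look
  n2     <- n_total - n1                # observations still to be collected
  1 - pnorm((z_crit * sqrt(n_total) - z1 * sqrt(n1) - n2 * d_obs) / sqrt(n2))
}

# Made-up interim result: d = 0.3 after 60 observations.
# How does conditional power change with the planned total sample size?
n_total <- c(100, 150, 200)
cp <- conditional_power(d_obs = 0.3, n1 = 60, n_total = n_total)
print(round(cp, 2))
```

You would then pick the smallest total sample size for which the conditional power is acceptable to you.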
I think a bigger problem is that people have no clue whatsoever when determining an appropriate width for a confidence interval. I've argued before that people have a much better feel for p-values than confidence intervals.
In the graph below, you see 60 one-sample t-tests, all examining a true effect with a mean difference of 0.3 (dark vertical line) and an SD of 1. The bottom 20 are based on a sample size of 118, the middle 20 on a sample size of 167, and the top 20 on a sample size of 238. This gives you 90% power at an alpha of 0.05, 0.01, and 0.001, respectively. Not surprisingly, as power increases, more confidence intervals exclude 0 (i.e., are significant). The higher the sample size, the further the confidence intervals stay away from 0.
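The three sample sizes can be checked with `power.t.test` from base R. This sketch assumes a one-sample, two-sided t-test, which is the setting that comes closest to the numbers quoted above; expect the answers to land within a few observations of 118, 167, and 238 rather than match them exactly.

```r
# Required sample size for 90% power to detect a mean difference of 0.3
# (SD = 1) at three alpha levels. Assumes a one-sample, two-sided t-test.
alphas <- c(0.05, 0.01, 0.001)

n_required <- sapply(alphas, function(a) {
  power.t.test(delta = 0.3, sd = 1, sig.level = a, power = 0.90,
               type = "one.sample", alternative = "two.sided")$n
})

print(data.frame(alpha = alphas, n = ceiling(n_required)))
```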
Take a look at the width of the confidence intervals. Can you see the differences? Do you have a feel for the difference between aiming for a confidence interval width of 0.40, 0.30, or 0.25 (more or less the widths in the three groups from bottom to top)? If not, but you do have a feel for the difference between aiming for p = 0.01 or p = 0.001, then go for the conditional power analysis. I would.
R script that produced the figure above: