Comments on The 20% Statistician: How a power analysis implicitly reveals the smallest effect size you care about

Re: "you are not interested in effect sizes t...

2017-09-13T22:09:18.259+02:00

Re: "you are not interested in effect sizes that will never be statistically significant"

I would add "...within a time frame that is credible" i.e. findings are most likely to be significant while power is low, well before the "estimated" duration, but these are less likely to reproduce.

Anyway, I had a similar thought, I think. I opted to address uncertainty about the effect size by showing a range of effect sizes. I think ranges are more intuitive than single numbers, especially when the likely effect size is hard to estimate. The idea is: I first see a reasonable proposal for how long the test should run and what I can detect, then I adjust the duration until the range feels achievable given what is being tested. In online analytics this works, because duration can be extended indefinitely, although quick decisions are usually needed. If my duration is fixed, I can gauge by the best-case scenario.

Here's the reverse sample size / effect size calculator I came up with: http://vladmalik.com/abstats

On top of visualizing power, I also wanted to see the impact on the false positive rate in case my hypothesis is wrong; e.g., I could gauge whether the results I am seeing are within the range predicted by pure chance.

I've no formal training in stats, so I was always curious if my approach to this has merit in other real-life scenarios. Glad to see you're doing something somewhat similar. Would love to hear your thoughts too.

I think there is still a typo (or words missing) i...

2017-07-04T01:32:24.372+02:00

I think there is still a typo (or words missing) in that passage: "Simonsohn (2015) suggested to set the smallest effect size of interest to 33% of the effect size in the original study could detect."

Thanks Daniel! Do you have any recommendations to t...

2017-06-13T18:18:03.038+02:00

Thanks Daniel! Do you have any recommendations to test equivalence in non-normal data?

It's the same idea, just different calculation...

2017-06-13T12:14:02.116+02:00

It's the same idea, just different calculations.

Dear Daniel, Thank you for your post! I'm a b...

2017-06-13T12:03:08.641+02:00

Dear Daniel,

Thank you for your post! I'm a beginner trying to understand power analysis and equivalence test/SESOI calculation. One of your assumptions is that your data has a normal distribution. What would your approach be if it does not?

I don't see a lot of benefit of incorporating ...

2017-06-10T09:32:03.807+02:00

I don't see a lot of benefit in incorporating prior information in inferences in the next decades, given the current state of knowledge. Can you point me to Ken Kelley's loss functions? I could not find any information in MBESS or on his website, but it is a topic I actively plan to work on in the next years, so I'd appreciate any info you can share.

Dear Daniel, That's what good methodologists ...

2017-06-10T01:51:13.772+02:00

Dear Daniel,

That's what good methodologists such as yourself are for, right! Promoting good stuff! Ken Kelley, for example, has implemented loss functions for use in a wide range of power-analytic situations in his package. I'm a mathematical statistician and really understand some of the limitations in human research. I also mentioned in passing that other types of error need to be seriously heeded when doing power analysis in the traditional sense for small-sized research such as that in psychology. I believe Andrew Gelman has a math-free paper on this topic which is publicly available. Anyway, the point was that a "Cohen's d" sampling density is simply a "location-scale" version of the t-distribution. You make good comments that prepare your colleagues to adopt a Bayesian approach to the estimation of the parameters of their interest. Best of luck with your work. Keep it up!

Hi, loss functions are what we need to work toward...

2017-06-09T22:37:12.922+02:00

Hi, loss functions are what we need to work towards. But our field has no clue where to start - this post shows how we can bootstrap a SESOI, and in 10 years, maybe end up with loss functions. If you can give me 20 examples of adequately developed loss functions in psychology, that would be great. If you can't (and you can't) it proves the point of my post.

Dear Daniel, Thanks for sharing your thoughts. Do...

2017-06-06T19:02:47.535+02:00

Dear Daniel,

Thanks for sharing your thoughts. Doing power analysis using effect size as its metric does not really alleviate the problem you're thoughtfully raising, namely finding the smallest effect of interest, because effect size is a simple transformation of other summary statistics (e.g., test statistics). As such, one can simply convert a critical test-statistic value to a corresponding critical effect size value. Such a context-free, hypothesis-based view of power analysis is both old and impractical. Plotting power against effect sizes is useful in conveying the message that the expression "Power of the test = some number" is basically not noteworthy. At a larger level, these revelations are instead important in moving social and behavioral research toward thinking in terms of Bayesian estimation of effect sizes. If you're interested in frequentist power analysis, much better ways of doing such power analyses are available via loss functions (frequentist decision-theoretic approaches). The traditional power-analytic approach you discuss here has criticisms that take more space, but in short, types of error other than type I and type II need to be involved in the power analysis process.
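The conversion described in this comment, from a critical test statistic to a critical effect size, can be sketched for a two-group design with equal group sizes. This is only an illustration and not the commenter's own code: the function name `critical_d` is hypothetical, and it assumes a normal approximation (z in place of t), so it slightly understates the critical d for small samples.

```python
from math import sqrt
from statistics import NormalDist


def critical_d(n_per_group: int, alpha: float = 0.05) -> float:
    """Smallest observed Cohen's d that reaches two-sided significance.

    Assumption: normal approximation to the two-sample t-test with
    equal group sizes, where the test statistic is z = d * sqrt(n / 2).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical test statistic
    return z_crit * sqrt(2 / n_per_group)         # same cut-off on the d scale


# Illustrative sample sizes (not taken from the post).
for n in (20, 50, 200):
    print(n, round(critical_d(n), 3))
```

Under these assumptions the critical d depends only on the sample size and alpha, which is the sense in which a power analysis already fixes the smallest observed effect that can be significant.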

Since you asked about where it's equivocal, be...

2017-05-13T06:45:15.454+02:00

Since you asked about where it's equivocal, between pop ES and sample ES, here's one: you say "true effect size is 0 (or the null-hypothesis is true), and when d = 0.5."
Here d = .5 appears to speak of the pop ES. On your graph it's the observed.
Another: Your first figure shows d = .5 & also that d = .3, the first I take it is a pop, the second a sample.

A separate issue I have with using these standardized pop d's is that it seems you're allowed to do the analysis without knowing the standard deviation. Is that so?

Should you ever run out of ideas for blog posts, I...

2017-05-12T09:26:38.344+02:00

Should you ever run out of ideas for blog posts, I think one where you detail how you or your collaborators arrived at an effect size or SESOI would make for interesting reading. My sense is that power analyses are often based on canned effect sizes with little regard to the specifics of the study (theory and design), so it would be useful to see some more sophisticated approaches to specifying ESs.

I find this equivocating between observed ES and p...

2017-05-12T02:07:44.306+02:00

I find this equivocating between observed ES and population ES. This is very common in psych, and it would really help if you labelled which you have in mind whenever used. Cohen had a subscript s for the observed ES. (I use difference for the observed, and discrepancy for the parametric effect size).
To take the simple one-sample test of a Normal mean, H0: mu ≤ 0 vs H1: mu > 0, the cut-off for rejection at the .025 level is a sample mean M of 1.96 SE. Are you saying the pop effect size of interest is this cut-off, 1.96 SE? That would be to take, as the pop ES of interest, one against which the test has 50% power. I'm not saying that would be bad, I'm just trying to figure out your equivocal use of effect size.
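The 50% figure in this comment can be checked with a short sketch of the one-sided, one-sample Normal test. It assumes a known standard error and expresses the population discrepancy in SE units; the function name `power_one_sided` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal


def power_one_sided(mu_in_se: float, alpha: float = 0.025) -> float:
    """Power of the test that rejects H0 when M exceeds the cut-off.

    mu_in_se is the population mean expressed in standard-error units.
    Assumption: M is Normal with known SE, so M / SE ~ Normal(mu_in_se, 1).
    """
    cutoff = Z.inv_cdf(1 - alpha)      # about 1.96 for alpha = .025
    return 1 - Z.cdf(cutoff - mu_in_se)


cutoff = Z.inv_cdf(0.975)
print(power_one_sided(0.0))     # at mu = 0 the rejection rate equals alpha
print(power_one_sided(cutoff))  # population ES equal to the cut-off
```

When the population discrepancy equals the cut-off, the sample mean falls above and below it equally often, so the power is one half.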

Daniel, I'm a biostatistician, but I occasiona...

2017-05-12T01:06:10.198+02:00

Daniel, I'm a biostatistician, but I occasionally consult for social scientists. I disagree that 99% power is inefficient. We are interested not just in detecting an effect, but in obtaining a reasonably precise estimate of the effect size. The flip side of high power is narrow confidence intervals.

As to sequential analysis, I agree that in many psych experiments the approach is useful. However, sequential analysis would not be practical in most studies I've been involved with. For example, if we need patients to be under treatment for three months, then, for many logistical reasons as well as financial ones, we really need the study to terminate after three months.

JT - it's good you don't work at our depar...

2017-05-11T23:46:34.519+02:00

JT - it's good you don't work at our department! Our ethics department would not be easily convinced by studies designed with 99% power - it's wasteful, and our resources can be spent more efficiently! You should really do sequential analyses (see Lakens, 2014, for an introduction). You are anonymous, so I don't know what you know about stats, but if you've never learned about sequential analyses, you should!

Jan, although your last paragraph was presumably a...

2017-05-11T23:39:00.106+02:00

Jan, although your last paragraph was presumably aimed at Daniel, rather than me, when applying for funding, I always base my proposed sample size on having 90% power to detect the smallest effect size of interest. This usually winds up giving me 99% power or more to detect the hypothesized true effect size.
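The relationship described in this comment can be sketched as follows: plan the sample size for 90% power at the smallest effect size of interest, then see what power that sample size gives at a larger hypothesized effect. The values d = 0.3 (SESOI) and d = 0.5 (hypothesized effect) are assumptions borrowed from the figures discussed elsewhere in this thread, the function names are hypothetical, and the normal approximation to the two-sample t-test is used.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()


def n_per_group(d: float, power: float = 0.90, alpha: float = 0.05) -> int:
    """Per-group n for a two-sided, two-sample test (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def power_two_sided(d: float, n: int, alpha: float = 0.05) -> float:
    """Power at population effect size d with n observations per group."""
    ncp = d * sqrt(n / 2)                       # expected test statistic
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(z_alpha - ncp)) + Z.cdf(-z_alpha - ncp)


n = n_per_group(0.3)                 # planned on the SESOI (assumed d = 0.3)
print(n, power_two_sided(0.3, n))    # at least 90% power for the SESOI
print(power_two_sided(0.5, n))       # power for the hypothesized d = 0.5
```

With these illustrative inputs the power at the hypothesized effect comes out above 99%, in line with the commenter's experience.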

Hi - current practice in power analysis is not ver...

2017-05-11T23:33:45.235+02:00

Hi - current practice in power analysis in psychology is not very state of the art, and the problem is that smallest effect sizes of interest effectively do not exist: they are almost never specified or used. That's why I wrote this post - to bootstrap a SESOI, building from a practice people already use (power analysis).

We've had this practice of 90% power for about 2 years. I could give examples - very often people in our group specify a SESOI, or they look at a pilot study, and then use a more conservative estimate in a power analysis. If there is large uncertainty, we recommend sequential analyses.

Thanks! Changed (and I knew that - last minute add...

2017-05-11T23:11:47.640+02:00

Thanks! Changed (and I knew that - last minute addition I didn't think through! Thanks for correcting me!).

Hi Daniël! Interesting post. Just a detail: I thin...

2017-05-11T22:50:01.420+02:00

Hi Daniël! Interesting post. Just a detail: I think Simonsohn (2015) did not suggest setting the smallest effect size of interest to 33% of the effect size in the original study, as you write. He suggested setting the smallest effect size of interest so that the original experiment had 33% power to reject the null if this ES were true. This smallest ES of interest thus does not depend on the effect size found in the original study: it only depends on the sample size. For instance, for n=20 per cell in a two-cell design, the effect size would be d=0.5, because this gives 33% power. Your approach is that the smallest ES is the effect size that gives 50% power in the original study. It makes a difference, but I think your approach is, in the end, quite close to Simonsohn's approach.
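The "33% power" effect size in this comment can be approximated by solving for the d at which the original design has the target power. The sketch below is an assumption-laden illustration, not Simonsohn's code: the function names are hypothetical, and it uses a normal approximation, which for n = 20 per cell lands a little below the t-based d = 0.5 quoted above.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def power_two_sided(d: float, n_per_cell: int, alpha: float = 0.05) -> float:
    """Power of a two-sided, two-cell test (normal approximation)."""
    ncp = d * sqrt(n_per_cell / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(z_crit - ncp)) + Z.cdf(-z_crit - ncp)


def d_for_power(target: float, n_per_cell: int, alpha: float = 0.05) -> float:
    """Bisection search for the d that gives the target power."""
    lo, hi = 0.0, 5.0                  # power increases with d on this range
    for _ in range(60):
        mid = (lo + hi) / 2
        if power_two_sided(mid, n_per_cell, alpha) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


d33 = d_for_power(0.33, n_per_cell=20)   # Simonsohn-style small telescope
d50 = d_for_power(0.50, n_per_cell=20)   # the 50%-power variant from the post
print(round(d33, 2), round(d50, 2))
```

Both values depend only on the original sample size and alpha, not on the effect size the original study observed, which is the point the commenter makes.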

That's what I was thinking - the d = 0.3, i.e....

2017-05-11T21:28:52.755+02:00

That's what I was thinking - the d = 0.3, i.e. what you call the smallest effect size of interest, is just the sample effect size, whereas the d = 0.5 you based the power calculation on is a postulated population effect size. If you have a smallest effect size of interest, wouldn't you want to treat it as a postulated population effect size and base your power computation on that?

Incidentally, your departmental policy sounds interesting (swing for 90% power). Do you have any worked out examples, i.e., of your colleagues identifying the effect size that they're investigating etc.?

If experimental psychologists are basing their sam...

2017-05-11T20:31:24.438+02:00

If experimental psychologists are basing their sample size requirements on the effect size they expect to observe, then they are making a mistake, because, for one thing, their experiment will be underpowered to detect a smaller effect size that they would still consider scientifically of interest. Sample size planning should always be based on the smallest effect size of interest.