Comments on The 20% Statistician: Always use Welch's t-test instead of Student's t-test

2023-11-04T12:53:00.816+01:00

It's Satterthwaite (with "th"):
https://en.wikipedia.org/wiki/Welch%E2%80%93Satterthwaite_equation
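For reference, the Welch–Satterthwaite equation linked above is what gives Welch's t-test its (usually fractional) degrees of freedom. A minimal Python sketch, assuming each group is a plain list of observations; the function name is ours, not from the post:

```python
import statistics


def welch_satterthwaite_df(x, y):
    """Approximate degrees of freedom for Welch's t-test.

    nu = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)),
    where v_i = s_i**2 / n_i is the squared standard error of group i's mean.
    """
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1  # sample variance (n - 1 denominator) over n
    v2 = statistics.variance(y) / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Equal sizes and equal sample variances give Student's n1 + n2 - 2 (here 6, up to rounding).
print(welch_satterthwaite_df([1, 2, 3, 4], [5, 6, 7, 8]))
```

The result always lies between min(n1, n2) - 1 and n1 + n2 - 2, which is why Welch's test gives up some degrees of freedom when one group is small.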

2021-09-07T06:13:04.805+02:00

Thanks so much for the informative blog.

2021-07-08T12:59:30.607+02:00

This comment has been removed by a blog administrator.

2020-12-16T08:29:45.875+01:00

This comment has been removed by a blog administrator.

2020-05-12T23:54:56.072+02:00

There's also a Welch's version of ANOVA. This blog post provides a short discussion: https://statisticsbyjim.com/anova/welchs-anova-compared-to-classic-one-way-anova/
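Welch's ANOVA weights each group by n_i / s_i² instead of pooling the variances. The sketch below follows the usual textbook formulas (our reconstruction, not code from the linked post) and assumes every group has at least two observations and a nonzero variance. With two groups, F reduces to the square of Welch's t and df2 to the Welch–Satterthwaite df.

```python
import statistics


def welch_anova(*groups):
    """Welch's heteroscedastic one-way ANOVA; returns (F, df1, df2).

    Each group gets weight w_i = n_i / s_i**2, so noisier groups count less;
    no pooled variance is used anywhere.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    w = [ni / statistics.variance(g) for ni, g in zip(n, groups)]
    total_w = sum(w)
    grand = sum(wi * m for wi, m in zip(w, means)) / total_w  # variance-weighted grand mean
    a = sum(wi * (m - grand) ** 2 for wi, m in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / total_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)


# Made-up example data for three groups (illustration only).
f, df1, df2 = welch_anova([4.1, 5.0, 5.9, 6.3], [6.8, 7.1, 9.9, 12.4, 8.0], [3.2, 3.9, 4.4])
print(f, df1, df2)
```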

2020-04-01T00:33:25.827+02:00

OK, now I'm getting nervous. The F statistic in one-way ANOVA uses the MSE as its denominator, and that's just pooled variance on steroids. Should we re-think the F test with more carefully designed factors?
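The commenter's point is easy to check: the MSE in the F denominator is exactly the pooled variance, and with two groups the classic F equals the square of Student's t. A small sketch with made-up data (function names are ours):

```python
import statistics


def one_way_anova_f(*groups):
    """Classic one-way ANOVA; returns (F, MSE). The MSE pools all groups."""
    all_obs = [v for g in groups for v in g]
    grand = statistics.fmean(all_obs)
    k, n_total = len(groups), len(all_obs)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    mse = ss_within / (n_total - k)  # the pooled variance across all k groups
    return (ss_between / (k - 1)) / mse, mse


def student_t(x, y):
    """Student's t with pooled variance s_p**2; returns (t, s_p**2)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return t, sp2


x, y = [1.0, 2.0, 4.0, 7.0], [2.0, 3.0, 3.5, 9.0, 10.0]
f, mse = one_way_anova_f(x, y)
t, sp2 = student_t(x, y)
print(mse, sp2)   # identical: the MSE is the pooled variance
print(f, t ** 2)  # equal up to rounding: F = t**2 for two groups
```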

2017-03-22T17:14:14.818+01:00

Hey, out of curiosity, what about in cases where you are using an ANOVA with either one or multiple predictors?

2017-02-01T14:39:16.649+01:00

Thank you for this valuable information, it is really useful.

2016-05-28T11:25:08.590+02:00

This comment has been removed by a blog administrator.

2016-05-09T16:21:48.982+02:00

Yes! Statistics posts with references and simulations! Thanks!

In your example, the data are normal and the variances differ, but the means are the same. For my data on 4 or 6 different genotype groups, the data are not normal, and both the means and the variances differ between groups. I have chosen to log-transform my data and then perform Welch's test followed by a Games-Howell post hoc test. Is it correct to transform data in this way before carrying out a Welch test, or can you not say without more information on the dataset?

My study probably has further problems: the genetic crosses we do give 4 or 6 genotypes, where only 1/8 of a population has the mutant genotype we wish to study, so small n and unequal sample sizes are inevitable. As you say, 'Student's t-test is more powerful when variances and sample sizes are unequal and the larger group has the smaller variance', but it affects the Type 1 error rate, so I am confused as to whether Welch's test would be the best option for me in such a case. Would you recommend a non-parametric test such as Kruskal-Wallis instead? But as far as I know, Kruskal-Wallis assumes identically shaped distributions, so I believe it would again not be appropriate for my data.
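On the mechanics of the approach described above: after a log transform, Games-Howell compares every pair of groups using Welch's standard error and the Welch–Satterthwaite df, then refers q = |t|·√2 to the studentized range distribution for k groups. The sketch below computes only t, q and df; the group names and numbers are made up, all values must be positive, and p-values are left out because the studentized range distribution is not in Python's standard library. It says nothing about whether a log transform suits a particular dataset; note that mean differences on the log scale correspond to ratios of geometric means on the original scale.

```python
import itertools
import math
import statistics


def games_howell_stats(groups):
    """Pairwise Games-Howell statistics on log-transformed data.

    For each pair (a, b) returns (Welch t, q = |t| * sqrt(2), Welch-Satterthwaite df).
    Converting q to a p-value needs the studentized range distribution (not done here).
    """
    logged = {name: [math.log(v) for v in vals] for name, vals in groups.items()}  # values must be > 0
    out = {}
    for a, b in itertools.combinations(logged, 2):
        x, y = logged[a], logged[b]
        v1 = statistics.variance(x) / len(x)
        v2 = statistics.variance(y) / len(y)
        t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
        out[(a, b)] = (t, abs(t) * math.sqrt(2), df)
    return out


# Hypothetical, made-up measurements for three genotype groups (not real data).
data = {
    "wild_type": [12.1, 15.3, 11.8, 14.0, 13.2, 16.5],
    "het": [10.4, 18.9, 9.7, 22.3, 14.1],
    "mutant": [4.2, 9.8, 3.1, 7.7],
}
for pair, (t, q, df) in games_howell_stats(data).items():
    print(pair, f"t={t:.2f} q={q:.2f} df={df:.1f}")
```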

I was a bit worried about having to explain Welch's test in my viva to an older generation of scientists, especially since a young medical statistician recently had no idea what I was talking about. But I understand it much better now, despite still being confused about my own data!

Thanks

Dee

2015-12-21T12:25:23.475+01:00

At first this seemed like quite a difficult topic, one that would help me solve my problem, but you explain the concepts very clearly and it now seems quite easy to understand.

2015-12-21T11:08:48.884+01:00

This comment has been removed by the author.

2015-01-29T18:29:46.403+01:00

Depends on your H0. It's true if your H0 says 'mu 1 = mu 2' (i.e. two populations with the same mean), not if it says 'x1. and x2. are drawn from the same population'. If it's the latter H0 you're interested in (a decent choice given randomisation), you could actually make the case that the t-test (as well as the Mann-Whitney or permutation tests) outperforms the Welch test in detecting that the two populations are indeed different.

2015-01-29T18:00:17.651+01:00

This comment has been removed by the author.

2015-01-29T17:32:33.131+01:00

I think these are good questions that require data. Setting the difference between means to 0 while allowing the variances to differ is the only way to examine the Type 1 error rate of a test, but does this situation occur in practice? I think it might, depending on the field you work in, but it's really an empirical question.

2015-01-29T17:13:12.054+01:00

"The R code below examines the Type 1 error rate of a hypothetical study where 38 participants were assigned to condition X, and 22 participants were assigned to condition Y. The mean score on some DV in both groups is the same (e.g., 0), so there is no effect, but the standard deviations between groups differ, with the SD in condition X being 1.11, and the SD in condition Y being 1.84."
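The R code referred to in the quote is not included in this thread; the following is an independent Python sketch of the same design (38 vs 22 participants, both means 0, SDs 1.11 and 1.84). It uses only the standard library, so p-values come from a small numerical integration of the t distribution rather than a stats package; the helper names, replication count and seed are our choices. Because the smaller group has the larger SD here, Student's test should reject more often than the nominal 5%.

```python
import math
import random
import statistics


def t_two_sided_p(t, df, steps=200):
    """P(|T| >= |t|) for a t distribution with df >= 1 (df may be fractional).

    Substituting x = sqrt(df) * tan(theta) turns P(0 <= T <= |t|) into
    C * integral from 0 to atan(|t| / sqrt(df)) of cos(theta) ** (df - 1),
    with C = Gamma((df + 1) / 2) / (sqrt(pi) * Gamma(df / 2)).
    The bounded integral is evaluated with Simpson's rule (steps must be even).
    """
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.cos(i * h) ** (df - 1)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi))
    return max(0.0, 1.0 - 2.0 * c * total * h / 3)


def student_and_welch_p(x, y):
    """Two-sided p-values (Student, Welch) for a two-sample t-test."""
    n1, n2 = len(x), len(y)
    diff = statistics.fmean(x) - statistics.fmean(y)
    s1, s2 = statistics.variance(x), statistics.variance(y)
    df_s = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / df_s  # pooled variance
    p_student = t_two_sided_p(diff / math.sqrt(sp2 * (1 / n1 + 1 / n2)), df_s)
    v1, v2 = s1 / n1, s2 / n2
    df_w = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_welch = t_two_sided_p(diff / math.sqrt(v1 + v2), df_w)
    return p_student, p_welch


def type1_rates(n1=38, n2=22, sd1=1.11, sd2=1.84, reps=3000, alpha=0.05, seed=42):
    """Share of simulated null studies (both means 0) that each test calls significant."""
    rng = random.Random(seed)
    hits_student = hits_welch = 0
    for _ in range(reps):
        x = [rng.gauss(0, sd1) for _ in range(n1)]
        y = [rng.gauss(0, sd2) for _ in range(n2)]
        p_student, p_welch = student_and_welch_p(x, y)
        hits_student += p_student < alpha
        hits_welch += p_welch < alpha
    return hits_student / reps, hits_welch / reps


print(type1_rates())  # (Student rate, Welch rate)
```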

What kind of experimental manipulation would lead to identical means but affect variance? And wouldn't the upshot of such a randomised experiment have to be that there was an effect - just not in terms of the mean.

I wholeheartedly agree that Levene's test (or, for that matter, tests for normality or covariate balance) is overused in randomised experiments, and I also think that the Welch test is a better default than the standard t test in non-randomised experiments.
But doesn't randomisation allow us to boldly assume that 'null hypothesis = no effect' comprises both 'no mean shift' and 'no change in variance', i.e., the assumption that both groups were drawn from the same population (whatever it may be)? (And use permutation tests while we're at it.)
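A minimal sketch of the permutation approach suggested here, assuming the null is "both groups come from one population": shuffle the group labels and compare whatever statistic you care about. Using a mean difference and a log variance ratio on the same made-up data shows that a scale-only effect is invisible to the first statistic but detectable with the second. Function names, the number of shuffles and the seed are our choices.

```python
import math
import random
import statistics


def permutation_p(x, y, statistic, n_perm=2000, seed=7):
    """Permutation p-value for H0: x and y are draws from one population.

    statistic(a, b) should be large when the groups differ; the p-value is the
    share of label shuffles at least as extreme as the observed split (+1 correction).
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def mean_shift(a, b):
    return abs(statistics.fmean(a) - statistics.fmean(b))


def scale_change(a, b):
    # |log variance ratio|; assumes every resampled group has nonzero variance
    return abs(math.log(statistics.variance(a) / statistics.variance(b)))


# Hypothetical data: identical means (0), very different spread.
x = [-1, 1] * 4
y = [-10, 10] * 4
print(permutation_p(x, y, mean_shift))    # 1.0: no evidence of a mean shift
print(permutation_p(x, y, scale_change))  # small: the spread differs
```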

2015-01-28T01:15:06.287+01:00

1- Thanks for answering :) I think I'm with you on Glass's delta, and I'll be interested to hear about different robust effect size measures when you get around to writing about them.

3- But what do we do if they disagree again once we replicate? Replicate again? Though it's true that they agree most of the time!

4- That's true, and it's worthwhile ;)
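Glass's delta, mentioned in point 1, standardizes the mean difference by the control group's SD alone, so unlike a pooled-SD effect size it does not assume the treatment leaves the variance unchanged. A minimal sketch; which group counts as the control is the analyst's call:

```python
import statistics


def glass_delta(treatment, control):
    """Glass's delta: mean difference divided by the control group's SD only (no pooling)."""
    return (statistics.fmean(treatment) - statistics.fmean(control)) / statistics.stdev(control)


print(glass_delta([2, 4, 6], [1, 2, 3]))  # (4 - 2) / 1 = 2.0
```

Swapping the roles of the groups changes the answer whenever their SDs differ, which is exactly the situation where the choice matters.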

2015-01-27T17:31:33.356+01:00

Daniel, thanks for your extensive answers.

2015-01-27T17:23:25.466+01:00

Hi Joost,

I provide one example, and then reference an extensive literature that has examined this issue in detail, across a vast range of parameter values, in hundreds of simulations. This is not my idea and it is not debated; I'm just explaining it. The references are there, so if you care enough about this topic to run simulations, read the literature I am summarizing.

There is a very specific set of effect size/sample size combinations where Student's t-test has slightly more power (but not enough to make it worthwhile). This is discussed in the literature, and since you cannot be certain you have equal variances in tiny samples, you should always report Welch's t-test if you do science in the real world. I honestly don't care which specific combination you can come up with while running simulations in R where the power values tip slightly in favor of Student's t.
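The point that equal variances cannot be verified in tiny samples is easy to see by simulation: draw both groups from the same normal population and count how often the sample variances still differ by more than a factor of two. The cut-offs (0.5 and 2), the sample sizes and the seed are arbitrary choices for this sketch.

```python
import random
import statistics


def share_of_lopsided_ratios(n, reps=2000, seed=3):
    """Fraction of simulated two-group studies (both groups N(0, 1), n each) whose
    sample variance ratio falls outside [0.5, 2], although the true ratio is 1."""
    rng = random.Random(seed)
    lopsided = 0
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        ratio = statistics.variance(a) / statistics.variance(b)
        lopsided += not (0.5 <= ratio <= 2)
    return lopsided / reps


for n in (5, 20, 100):
    print(n, share_of_lopsided_ratios(n))
```

For n = 5 per group the ratio follows an F(4, 4) distribution under normality, which falls outside [0.5, 2] roughly half the time, so a "clearly unequal" pair of sample variances says little about the population.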

Your latest example has pointed out a situation where I need to add 2 participants to the smallest group to compensate for the difference in power, and you are already entering the domain of underpowered studies (which I hope you are not recommending as good practice). If you want to perform even more underpowered studies, you can boost the difference between Student's t-test and Welch's t-test even more. But none of this changes the very simple fact that you should always report Welch's t-test (or, if data are not normal, robust statistics such as Yuen's method, an adaptation of Welch's method using trimmed means and winsorized variances; but I'm leaving that for a future post).
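For readers curious about Yuen's method before that future post: it replaces the means with trimmed means and the variances with winsorized variances inside Welch's formulas. The sketch below follows the common textbook form; the 20% default trimming and the function name are our assumptions, and it needs at least two observations left after trimming in each group.

```python
import math
import statistics


def yuen_t(x, y, trim=0.2):
    """Yuen's two-sample statistic on trimmed means; returns (t, df).

    With trim=0 it reduces exactly to Welch's t-test.
    """
    def parts(data):
        s = sorted(data)
        n = len(s)
        g = int(math.floor(trim * n))  # observations trimmed from each tail
        h = n - 2 * g                  # observations left after trimming
        trimmed_mean = statistics.fmean(s[g:n - g])
        # Winsorize: replace the g lowest/highest values by the nearest kept value.
        wins = [s[g]] * g + s[g:n - g] + [s[n - g - 1]] * g
        d = (n - 1) * statistics.variance(wins) / (h * (h - 1))
        return trimmed_mean, d, h

    m1, d1, h1 = parts(x)
    m2, d2, h2 = parts(y)
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df


# Made-up data: one wild outlier in the first group.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(yuen_t(a, b))          # 20% trimming: the outlier is trimmed away, t = 0
print(yuen_t(a, b, trim=0))  # no trimming: this is Welch's t-test
```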

2015-01-27T16:56:36.712+01:00

Dear Daniel,

1) One could try other combinations using larger sample sizes (e.g., n1=100, n2=10, sd1=sd2=1), and this also shows that the t test can have a considerable power advantage (85% vs. 78% for a mean shift of 1). Anyway, I am unclear why you are saying that (n1=30, n2=5) is less informative than (n1=38, n2=22). Sometimes researchers are faced with small sample sizes (and large effects). William Gosset developed the t test precisely because he wanted to obtain valid results for small samples.

2) What I meant is NOT that unequal variances are rare IF you have normal distributions. What I meant is that unequal variances are usually accompanied by non-normal distributions. That is, unequal variances usually arise for some reason, such as a floor or ceiling measurement artefact. It would be interesting to explore how the Welch and t test compare in situations other than perfectly normal distributions.
For example, suppose one assumes a 5-point Likert scale, with the following density distribution: Totally disagree = 0%, Slightly disagree = 1%, Neutral = 3%, Slightly agree = 6%, Strongly agree = 90%, that is a highly skewed distribution. Also assume n1=22, n2=100. Now sample the two vectors from the same population. The Type I error rate is now 13% for the Welch test, and 4% for the t test. Of course, this is an extreme situation, but it nicely illustrates that the Welch test can break down completely if the distributional assumptions are violated.

3) I am also not sure whether the burden of proof should be on me now, while you are the one claiming that the Welch test should “always” be used. I just think you are making quite a generalization by saying that the Welch test is always preferred, because your conclusion seems to be based on 1 simulation using 1 set of parameters and 1 type of distribution. Counterexamples are easily found….

Cheers, Joost
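The Likert scenario in point 2 can be simulated in the same spirit: both groups are drawn from the skewed distribution listed there (n1 = 22, n2 = 100), so every rejection is a Type 1 error. This is our standard-library Python sketch, not Joost's code; coding the responses 1 to 5, the seed, the replication count and the t-distribution helper are our choices, and the rates will vary somewhat with them. Iterations where both groups are constant are skipped because neither t statistic is defined there.

```python
import math
import random
import statistics


def t_two_sided_p(t, df, steps=200):
    """P(|T| >= |t|) for a t distribution with df >= 1, via x = sqrt(df) * tan(theta)
    and Simpson's rule on the bounded integral of cos(theta) ** (df - 1)."""
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.cos(i * h) ** (df - 1)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi))
    return max(0.0, 1.0 - 2.0 * c * total * h / 3)


def likert_null_rates(n1=22, n2=100, reps=2000, alpha=0.05, seed=11):
    """Type 1 error rates (Student, Welch) when both groups share one skewed Likert distribution."""
    levels = [1, 2, 3, 4, 5]  # assumed coding: 1 = totally disagree ... 5 = strongly agree
    weights = [0.00, 0.01, 0.03, 0.06, 0.90]
    rng = random.Random(seed)
    hits_student = hits_welch = used = 0
    for _ in range(reps):
        x = rng.choices(levels, weights, k=n1)
        y = rng.choices(levels, weights, k=n2)
        s1, s2 = statistics.variance(x), statistics.variance(y)
        if s1 == 0 and s2 == 0:
            continue  # both groups constant: neither t statistic is defined
        used += 1
        diff = statistics.fmean(x) - statistics.fmean(y)
        df_s = n1 + n2 - 2
        sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / df_s  # pooled variance
        hits_student += t_two_sided_p(diff / math.sqrt(sp2 * (1 / n1 + 1 / n2)), df_s) < alpha
        v1, v2 = s1 / n1, s2 / n2
        df_w = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
        hits_welch += t_two_sided_p(diff / math.sqrt(v1 + v2), df_w) < alpha
    return hits_student / used, hits_welch / used


print(likert_null_rates())  # (Student rate, Welch rate)
```

One mechanism worth noticing: with 90% of responses at the top of the scale, the small group often has zero or tiny variance, which shrinks Welch's standard error while its mean sits above the population mean.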

2015-01-27T11:21:40.496+01:00

Finally, in your example, as in the previous example, the power difference is mitigated by running one additional participant (so 6 instead of 5). Unless you have something of practical relevance to report, I'm sticking with my recommendation to always report Welch's t-test.

2015-01-27T11:09:53.086+01:00

Also, please provide those empirical references for your earlier statement that unequal variances are rare in normal distributions. Thanks.

2015-01-27T11:08:31.092+01:00

Joost, you do seem to love ridiculously small sample sizes, despite my previous statement that doing statistics on such samples tells you no more than flipping a coin. A Type 1 error rate of 3% when you want it to be 5% means the test is performing poorly, so your example is demonstrating that Welch's test is performing better; you just don't realize it.

2015-01-27T09:55:18.943+01:00

Dear Daniel,

It is possible to devise more counterexamples, even with normal distributions.

For example, with n1 = 30, n2 = 5, sd1 = 1.2, sd2 = 1, I get a Type II error rate of 5% for the t test and 10% for the Welch test (for detecting a mean shift of 2). The Type I error rate is 3% for the t test and 6% for the Welch test. So we now have a situation where the Welch t test performs twice as poorly as the t test regarding both power and false positives.

Best regards, Joost
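A back-of-the-envelope check on the examples in this exchange, plugging population SDs into both tests' formulas (an assumption: real samples add noise to the SDs, so this shows only the direction of each test's behaviour, not actual error rates). In the n1 = 30, n2 = 5 case above, the pooled standard error comes out larger than the true one, which points the same way as the conservative 3% Type 1 rate reported for Student's t, while Welch's df drops to about 6. In the n1 = 100, n2 = 10, equal-SD case from the later comment, both standard errors coincide and the df differ (108 vs roughly 11), which helps explain Student's power edge there.

```python
import math


def compare_at_population_values(n1, n2, sd1, sd2):
    """Plug population SDs into Student's and Welch's formulas.

    Ignores sampling noise in the SDs, so it only shows the direction of each
    test's bias, not actual error rates.
    """
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    pooled = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)  # expected s_p**2
    return {
        "se_student": math.sqrt(pooled * (1 / n1 + 1 / n2)),
        "df_student": n1 + n2 - 2,
        "se_welch": math.sqrt(v1 + v2),  # equals the true SE of the mean difference
        "df_welch": (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)),
    }


print(compare_at_population_values(30, 5, 1.2, 1.0))    # the n1 = 30, n2 = 5 example
print(compare_at_population_values(100, 10, 1.0, 1.0))  # the later equal-SD example
```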