Comments on The 20% Statistician: "Why Type 1 errors are more important than Type 2 errors (if you care about evidence)" by Daniel Lakens

object_of_class (2017-03-12 21:16):
My links keep getting screwed up, sorry. Try that again: Shiny (https://shiny.rstudio.com/) and Plotly (https://plot.ly/).
object_of_class (2017-03-12 21:12):
Reposted with correct HTML below.

I made it with Shiny (https://shiny.rstudio.com/Shiny) and Plotly (https://plot.ly/Plotly).

I'm a student getting my MS in statistics this spring. I saw your post linked from reddit/r/statistics (https://www.reddit.com/r/statistics/).

object_of_class (2017-03-12 21:06):
Try looking at this on the log-likelihood scale (https://object-of-class.shinyapps.io/logLRplot), since we're talking about ratios.

Daniel Lakens (2017-03-12 20:58):
This is great! Who are you? How did you make this?

Gerben (2017-01-15 14:26):
Thanks for your time and replies Daniel.

Daniel Lakens (2017-01-15 12:13):
It was the 5% that was incorrect, not the part about the sample. Due to time constraints, this is my last reply about this.

Gerben (2017-01-15 11:11):
Indeed, and that's what Sean also said. So, now things get really confusing: first you say (in your reply to Sean): "Your statement: 'So if alpha is, say, 0.05, then the Type I error rate is always 5% no matter the sample size.' is incorrect." But now you say (in a reply to me): "As explained in my MOOC, Type 1 error rate is independent of the sample size". I care about "truth" (maybe as much as you seem to care about "evidence"), but certainly both of your replies cannot be true at the same time.

Daniel Lakens (2017-01-14 20:43):
As explained in my MOOC, the Type 1 error rate is independent of the sample size.

Gerben (2017-01-14 20:17):
Oops, Neyman...

Gerben (2017-01-14 20:16):
Will do.
(I know my comment is not really about the point you are trying to make in your post, so let's leave it at this. But still very curious about how sample size is related to type I error as defined by Newman and Pearson).

Daniel Lakens (2017-01-14 18:57):
The FPR is a ratio of significant results only, not of all results. The Type 1 error rate is based on all results. Again, feel free to follow my MOOC week 3 to learn more about this, it would save me some time.
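A small worked example of the distinction Daniel is drawing here, with assumed numbers (alpha = .05, power = .80, and the null true in half of all studies; none of these figures come from the post itself):

    # Hypothetical rates, assuming alpha = .05, power = .80, P(H0) = .5.
    alpha <- 0.05; power <- 0.80; pH0 <- 0.5
    alpha                                            # Type 1 error rate, conditional on H0 being true: .05
    alpha * pH0 / (alpha * pH0 + power * (1 - pH0))  # share of *significant* results that are false: ~.059

The first number conditions on the null being true; the second conditions on having observed a significant result, which is the ratio the rest of this exchange argues about.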
Gerben (2017-01-14 18:13):
Thanks for your reply.
The false positive rate (FPR) is not the same as the Type I error rate: the FPR is the probability that you reject AND the null hypothesis is true (= alpha * base rate, or in your terms alpha * prior probability), but the Type I error rate is the probability that you reject GIVEN that the null hypothesis is true: (alpha * base rate) / base rate = alpha.
Alpha is independent of the base rate and independent of sample size, and you seem to claim that alpha is not independent of sample size.

Daniel Lakens (2017-01-14 17:50):
Hi Gerben, you need to take prior probabilities into account. If you do 1000 studies, and the null is true in 500 and there is an effect in 500, the Type 1 error rate is 2.5%. See week 3 in my MOOC for a more extensive explanation.

Gerben (2017-01-14 17:41):
Your statement that his statement is incorrect is incorrect. The probability of a Type I error is a conditional probability (conditional on the null being true). Suppose that under the null your test statistic follows a standard normal distribution, and suppose you decide on alpha = .05. So, you decide to reject the null only if the absolute value of the statistic exceeds 1.96, because if the null is true this criterion leads to 5% incorrect rejections (Type I errors). Note that regardless of sample size, 95% of the values of your statistic will fall between (approximately) -1.96 and 1.96. Or in terms of p-values: with alpha = .05, you will reject when p <= .05. Now, the distribution of the p-value is uniform if the null hypothesis is true (independent of sample size), so the probability of p < .05 is .05 regardless of sample size, and the probability of a Type I error is 5% regardless of sample size.
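A minimal R simulation of the conditional claim in the comment above (one-sample t-tests of a true null; the rejection rate stays near alpha at every sample size). Daniel's 2.5% figure, by contrast, is the unconditional quantity alpha * P(H0) = .05 * .5 from his 1000-study example:

    # Rejection rate under a true null, at several sample sizes.
    # Under H0 the p-value is uniform, so each rate should be ~ alpha = .05.
    set.seed(1)
    type1_rate <- function(n, nsim = 10000, alpha = 0.05) {
      mean(replicate(nsim, t.test(rnorm(n))$p.value) <= alpha)
    }
    sapply(c(10, 50, 250), type1_rate)
    # Daniel's unconditional figure: .05 * .5 = .025 of all 1000 studies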
Anonymous (2016-12-20 18:59):
You can create a truly interactive 3d plot using the "rgl" package. Here is a quick example which creates a plot with 25 "gumballs".

    library(rgl)
    # 25 random points in 3d, each drawn in a random colour
    dat <- data.frame(x = rnorm(25), y = rnorm(25), z = rnorm(25))
    plot3d(dat, col = sample(colours(), 25), size = 10)

You can manipulate ("turn around") the resulting plot in real time using your computer mouse, which gives you a stronger sense of the data pattern (and, I believe, improves your memory for that pattern).

Anonymous (2016-12-19 18:53):
Code adapted from the original post (note that I used conditional probabilities for the LRs, but that doesn't have any consequence here):

    # ** hyperparameters
    delta <- 0.5
    sd <- 1
    n <- 25
    pH0 <- 0.5

    # plot
    prec <- 5 # plotting precision
    png("LR_alpha.png", width = 1200, height = 400, pointsize = 18)
    par(mfrow = c(1, 3))

    # ** case 1: alpha and power independent (power held constant at an arbitrary value)
    alpha <- seq(10^-prec, .25, 10^-prec)
    power <- power.t.test(n = n, sd = sd, delta = delta, sig.level = .05)$power

    # calculate probs
    ppos <- alpha*pH0 + power*(1-pH0)
    pH1.pos <- power*(1-pH0)/ppos
    pH0.pos <- alpha*pH0/ppos

    likelihood_ratio1 <- pH1.pos/pH0.pos
    plot(likelihood_ratio1 ~ alpha, type = "l", ylab = "LR")
    grid()

    # ** case 2: power loss taken into account
    power <- sapply(alpha, function(x) power.t.test(n = n, sd = sd, delta = delta, sig.level = x)$power)

    # calculate probs
    ppos <- alpha*pH0 + power*(1-pH0)
    pH1.pos <- power*(1-pH0)/ppos
    pH0.pos <- alpha*pH0/ppos

    likelihood_ratio2 <- pH1.pos/pH0.pos
    plot(likelihood_ratio2 ~ alpha, type = "l", ylab = "power-adjusted LR")
    grid()

    # ** case 3: frequency of positive findings taken into account
    # (reuses alpha, power, and ppos from case 2)
    likelihood_ratio_ppos <- likelihood_ratio2 * ppos
    plot(likelihood_ratio_ppos ~ alpha, type = "l", ylab = "(power-adjusted LR) * P(sig)")
    grid()

    dev.off()
Anonymous (2016-12-19 18:49):
This is an interesting idea, and it gave me a good "think," thanks for that! However, I think that the "horrible seesaw metaphor" has some value here, because researchers may not always be in a situation where alpha and power are truly independent. From that perspective, it would also be necessary to take the frequency of positive findings into account if the source of evidence is indeed considered a positive finding.

(1) Regarding alpha and power

Essentially, the question in this post can be reformulated as "What do I gain from a lower alpha if the power remains the same?" (and vice versa, since the two are treated independently). However, alpha is actually a determinant of the power of a test, and the two can only be treated independently if the sample size, etc. are allowed to vary. This is the situation we have when we plan experiments, and the message of the post is a very good one (as I understand it): given a certain range of feasible sample sizes, it is best to choose the lowest possible alpha (i.e., the maximum feasible sample size) that doesn't reduce power (or not by much).

However, I think the opposite perspective is also important: when we have already conducted an experiment, the sample size and effect size are fixed. In that case, alpha and power are related: the lower the alpha (the stricter the test), the lower the power.

I made a graph that illustrates this (here: https://pbs.twimg.com/media/C0DYs44W8AEQge2.jpg). On the left is the initial finding: evidence increases as alpha decreases (as long as power remains constant). In the middle, I assumed a fixed sample (n=25, d=0.5, sd=1) and took into account that the power drops when alpha is lowered in such a case. The basic finding is the same, but the curve is a bit less steep.

(2) Frequency

There is another "puzzle piece" I would like to throw into the ring, though. In the post, evidence is defined as the ratio of the probabilities of true vs. false positives, and as such it relies on the fact that we can actually observe a positive finding. However, if the power drops as alpha decreases (again: with the sample fixed from an already conducted experiment), it also becomes less likely that we observe a positive finding at all.

In other words, this perspective can be summarized as "If we imagine a series of (fixed) experiments, do we gain anything if we conduct stricter tests (at lower alpha)?" In the end, this question has to consider two points: (1) the evidence provided by positive findings and (2) the frequency of positive findings. Both are affected by alpha, and if we correct for (i.e., multiply by) the probability of observing a positive finding, a different picture emerges. This is shown in the right graph (same link as above): the (average) evidence accumulated by positive findings becomes lower again at very low alphas. It peaks in this case around 2%. I gave the code in a separate comment; feel free to play with it. Interestingly, quite low values of alpha seem to be "optimal" from this perspective when the power is relatively high (e.g., large samples or large effect sizes).

What this means is: if we consider alpha and power separately, then alpha takes the cake. But this leaves open the sample size needed to conduct such experiments (and it may be very expensive). If we ask the question differently, "What if I already have a sample? Can a lower alpha help me now?", then the answer probably is: "It depends."

Again, nice post. I hope the additional perspective is useful. I think that both perspectives essentially lead to the same conclusion, which is: the larger the sample, the better :)
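The ~2% peak described above can also be located numerically. A short R sketch, reusing the commenter's assumed parameters (n = 25, d = 0.5, sd = 1, P(H0) = .5); the specific peak value is the commenter's result, not a claim from the post:

    # Find the alpha that maximizes (power-adjusted LR) * P(significant),
    # using the same assumed parameters as the commenter's code.
    delta <- 0.5; sd <- 1; n <- 25; pH0 <- 0.5
    alpha <- seq(0.0005, 0.25, by = 0.0005)
    power <- sapply(alpha, function(a)
      power.t.test(n = n, sd = sd, delta = delta, sig.level = a)$power)
    ppos <- alpha * pH0 + power * (1 - pH0)      # P(significant result)
    lr   <- (power * (1 - pH0)) / (alpha * pH0)  # LR for H1 vs H0, given significance
    alpha[which.max(lr * ppos)]                  # reported to peak around .02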
Daniel Lakens (2016-12-19 16:25):
Sure - but my post is about evidence.

Daniel Lakens (2016-12-19 16:25):
No, because you also need P(H1|D).

Daniel Lakens (2016-12-19 16:24):
Please read my blog post about whether the null is never true (http://daniellakens.blogspot.nl/2014/06/the-null-is-always-false-except-when-it.html). Your statement, "So if alpha is, say, 0.05, then the Type I error rate is always 5% no matter the sample size.", is incorrect.

Anonymous (2016-12-19 16:18):
Very interesting and well done post - thank you. Question: does the relative risk/benefit of a Type 1 or Type 2 error matter in deciding which is most important to control? Example: assume we are testing whether Drug A cures cancer. The Type 1 error rate is very important to control: if we make a Type 1 error, we subject people to the side effects of Drug A for no benefit, etc. Now consider testing whether Supplement B improves joint pain. B is very cheap, over the counter, and has no side effects. A Type 1 error doesn't "cost" much - little money, no side effects, etc. A Type 2 error, though, would remove a cost-effective pain control from suffering people. In this case, would an alpha of 0.1 or even 0.2 be acceptable? Assume Supplement B has no big marketing department, and therefore we can only run a small study.
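One way to make this question concrete is to compare expected costs per study. A hedged R sketch with entirely made-up numbers (a Type 2 error assumed ten times as costly as a Type 1 error, a small two-group study, d = 0.3, P(H0) = .5; none of these values come from the thread):

    # Expected cost per study, with hypothetical costs: Type 1 errors cheap,
    # Type 2 errors expensive (the Supplement B scenario). All numbers made up.
    expected_cost <- function(alpha, n, cost_t1 = 1, cost_t2 = 10,
                              delta = 0.3, pH0 = 0.5) {
      power <- power.t.test(n = n, delta = delta, sig.level = alpha)$power
      cost_t1 * alpha * pH0 + cost_t2 * (1 - power) * (1 - pH0)
    }
    sapply(c(0.05, 0.10, 0.20), expected_cost, n = 30)
    # With these (made-up) costs, the larger alphas yield lower expected cost.

Under such asymmetric costs the cost-minimizing alpha shifts upward, which is the intuition behind the commenter's question.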