Comments on The 20% Statistician: Using Decision Theory to Justify Your Alpha
Blog by Daniel Lakens

Gustav Holst, 2019-10-25:
Hi,

Not a statistician, instead a physician trying to learn statistics. A bit tired today, so perhaps I misunderstood something. I think I have some comments on this. The decision theory is interesting, though.

Here goes:
In finance it's clear whether your result is "good" or "bad", "true" or "false". You have an economic return or loss at a certain level. In science, for example (but also in medicine), you get a result either way. The "value" lies (somewhat) in whether you can trust the result or not. The "return" or "loss" could perhaps be seen as whether the applications of the results turn out to be useful in practice or not.

I'm not sure that evaluation through such implementation is, in general, the best way to go, though.

Instead I think one could start by setting the level of certainty that is needed to deem a scientific question answered or not answered. When claiming a scientific hypothesis is answered: what is the acceptable likelihood that the answer we have is a false null result, or a false positive result? (Either for a single hypothesis, or in general, for a number of them.) Here I think decision theory may have its place: what the sought-for level of certainty should be in a given situation (with given economic constraints, etc.), or in the scientific community as a whole, can probably be examined with some form of decision theory, in combination with known facts, etc.

The levels of certainty perhaps don't have to be stated in numbers. Perhaps "very highly likely" or "very, very unlikely" are good enough.

Then, when one knows that level of requested certainty, one can probably use a stepwise process to reach it. This is similar to the "stepwise diagnostic process" in medicine or psychology that I think you are familiar with, where you often use several tests in a row.
- In science this would be equivalent to several studies in a row for a given hypothesis.

There, depending on the level of prior probability, etc., I think it may be smart to go for an appropriate level of beta or alpha, to obtain the requested level of certainty for either nulls or positives in a first run, and then to examine either the positives or the nulls further, depending on which category is known to contain too many false ones.
- Perhaps similar to the Bayesian decision theory that you mention.

This could probably be tested with some sort of simulation.

I may be wrong, but I think this is a somewhat easier approach than the one you propose. (Perhaps also a bit more informative or effective. I think it's better in the long run to know that 3% of null results and 25% of positive ones are probably false, than to know that about 10% of each are false. In the first case you mostly have to test the positive ones further; in the second you more or less have to test both positives and negatives further.)

Best wishes!
Gustav Holst

Kevin McConway, 2019-07-16:

Either I've misunderstood this, or there's something wrong with it or missing from it. The decision tree in Figure 1 is fine, but the tree in Figure 2 isn't analogous to it. In Figure 1, you make the decision whether or not to invest, and then the chance nodes show all the possible outcomes (the product works, or it doesn't), and the probabilities of those are their unconditional probabilities, 0.5 and 0.5 for each. In Figure 2, you choose the alpha, but the following chance nodes don't include all the possible outcomes.
They only include the possibilities that there is a type 1 or a type 2 error, but there's another possibility: that there's no error at all and the test gives the correct outcome. Also, the probabilities assigned to the two error types are conditional. Alpha is the probability of a result in the critical region (i.e. 'significant') conditional on the null hypothesis being correct, that is, conditional on the true effect being zero, and beta is the probability of a result outside the critical region (i.e. 'not significant') conditional on the true effect being non-zero. So you can't just put them both in the same expected value calculation like that, because you would then be computing an expected value from two different probability distributions that are conditional on different things, which makes no sense (to me at least).

In the Figure 1 example there are only two states (product works or not), but in the testing example there are four:
(i) There is no true effect (null hypothesis true) and the test result is non-significant.
(ii) There is no true effect and the test result is significant.
(iii) There is a true effect (null hypothesis false) and the test result is non-significant.
(iv) There is a true effect and the test result is significant.

Or you could draw a tree with two sets of chance nodes: one set for whether the null hypothesis is true, and one, which could then be conditional on the first node, for whether the test result is significant or not. Then the probabilities for the second set would be alpha and 1 - alpha for those following "Null hypothesis true", and 1 - beta and beta for those following "Null hypothesis not true". That would work, but you still have to specify the probabilities on the first set of nodes, that is, the probability that the null hypothesis is true, and that is the prior probability that you want to avoid.
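[Editorial illustration: the four-state expected value over that two-stage tree can be sketched numerically. The payoff numbers below are invented purely for illustration; the point is that the prior probability of the null enters the calculation.]

```python
# Sketch of the two-stage decision tree described in the comment above:
# a chance node for whether the null is true (the prior), followed by a
# conditional chance node for the test outcome (alpha or 1 - beta).
# All payoff values are hypothetical.

def expected_payoff(prior_h0, alpha, beta, payoffs):
    """Expected payoff over the four joint states of a significance test."""
    p = {
        "true_negative":  prior_h0 * (1 - alpha),       # H0 true, not significant
        "false_positive": prior_h0 * alpha,             # H0 true, significant (type 1)
        "false_negative": (1 - prior_h0) * beta,        # H1 true, not significant (type 2)
        "true_positive":  (1 - prior_h0) * (1 - beta),  # H1 true, significant
    }
    # The four states are exhaustive, so the joint probabilities sum to 1.
    assert abs(sum(p.values()) - 1.0) < 1e-9
    return sum(p[state] * payoffs[state] for state in p)

# Hypothetical payoffs: correct decisions worth 1, errors cost 1.
payoffs = {"true_negative": 1, "true_positive": 1,
           "false_positive": -1, "false_negative": -1}

# With alpha and beta held fixed, the expected value still moves with the prior:
for prior in (0.2, 0.5, 0.8):
    print(prior, expected_payoff(prior, alpha=0.05, beta=0.2, payoffs=payoffs))
```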
But I don't think you can avoid it: if you put all four outcomes on the chance nodes and work out their probabilities, that involves the probability that the null is true, that is, the prior.

You might be able to take a different decision-theoretic approach that avoids using the prior probabilities, but the one you've used, with decision trees, is pretty well inevitably Bayesian, I think.

Kevin McConway

Anonymous, 2019-07-16:

Thanks Daniel, it's good to hear an informed opinion, which I see as a gentle push away from using the same significance threshold for all kinds of tests in a discipline, or even in the sciences as a whole. This has always perplexed me, as I'm mostly working in business settings where risks and rewards can be estimated with a fair degree of precision, since the number of people/situations affected by a given inference is more or less limited, unlike in the sciences.

I've actually worked on arriving at significance thresholds and sample sizes (and therefore power/minimum effect of interest) which achieve an optimal balance of risk and reward for an online controlled experiment, based on its particular circumstances. A brief description of my work can be found at http://blog.analytics-toolkit.com/2017/risk-vs-reward-ab-tests-ab-testing-risk-management/ , while a more detailed exposé will soon be released in my upcoming book, where I devote a solid 30 pages to the topic ( https://www.abtestingstats.com/ ), for anyone interested.
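[Editorial illustration: the risk/reward balancing idea in the last comment can be sketched as a toy optimisation. For a fixed design, sweep alpha, derive power from a one-sided z-test approximation, and pick the alpha that maximises expected payoff. The prior, effect size, sample size, and payoffs are all invented numbers, not taken from the linked work.]

```python
# Toy alpha optimisation: balance the expected gain from a true positive
# against the expected cost of a false positive, given an assumed prior.
from statistics import NormalDist

norm = NormalDist()  # standard normal

def power_one_sided_z(alpha, effect, n):
    """Power of a one-sided z-test for a standardised effect at sample size n."""
    z_crit = norm.inv_cdf(1 - alpha)
    return 1 - norm.cdf(z_crit - effect * (n ** 0.5))

def expected_payoff(alpha, prior_h1, effect, n, gain_tp, cost_fp):
    """Only a true positive pays off and a false positive costs; other cells are 0."""
    power = power_one_sided_z(alpha, effect, n)
    return prior_h1 * power * gain_tp - (1 - prior_h1) * alpha * cost_fp

# Sweep candidate thresholds and keep the payoff-maximising one.
alphas = [a / 1000 for a in range(1, 500)]
best = max(alphas, key=lambda a: expected_payoff(a, prior_h1=0.3, effect=0.3,
                                                 n=100, gain_tp=10, cost_fp=5))
print(f"payoff-maximising alpha ~ {best:.3f}")
```

Under these made-up numbers the optimum lands well away from 0.05, which is the comment's point: the threshold should follow from the costs, rewards, and circumstances of the particular test.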