Comments on The 20% Statistician: One-sided tests: Efficient and Underused
Blog author: Daniel Lakens. 20 comments; feed last updated 2017-09-25.

Daniel Lakens (2017-01-28, 09:16):
Thanks for your comment. This is not related to Type 1 errors. See my blog, where I discuss the topic you raise in more detail: http://daniellakens.blogspot.nl/2015/09/how-can-p-005-lead-to-wrong-conclusions.html

Lawrence Moon (2017-01-28, 09:04):
http://rsos.royalsocietypublishing.org/content/1/3/140216

Your false positive rate won't be 5 percent with an alpha of 0.05 if your experiments are underpowered. It could be as much as 30 percent.

Daniel Lakens (2017-01-28, 04:15):
Hi anonymous, I think you are completely wrong, because exploration (finding surprising data) is not hypothesis testing (testing a prediction).

Unknown (2017-01-28, 03:48):
I agree with Lee and propose that one should not use one-tailed tests in drug testing or basic biological research, even when one guesses at a direction of change a priori, or even when preregistered.
This is because a change in the opposite direction may be a vital new counterexample for challenging a hypothesis (a la Popper) that may shed light on a new mechanism or reveal an unexpected toxicity. Sample size calculations reveal that two-tailed tests are often not much more demanding in resources for typical studies in neuroscience. Try it out in G*Power!

Timothy Bates (2016-09-05, 02:41):
Our region of certainty extends in all directions around the estimate. Standard-error-based intervals reflect 1.96*SE around the estimate. Profile-based estimates can be asymmetric (and are better, which is why programs like OpenMx support them). A CI may run up against a bound in the model. Some such bounds are statistical anomalies (negative variance, for instance), but others are artificial (like bounding a mean difference to be positive) and should be interpreted as one-sided only in the sense that the author doesn't wish to explore that side of the space.

PS: This Stack Exchange thread is worth reading on profile likelihoods:
https://stats.stackexchange.com/questions/9833/constructing-confidence-intervals-based-on-profile-likelihood

Timothy Bates (2016-09-05, 02:24):
The current goal has to be to remove bogus claims from the literature. False social science is so bad that replication is as likely to find that purported treatments have significant deleterious effects as it is to show null effects. The only people, IMHO, who need to pre-register are adherents of the original study.
Giving them a one-sided option doubles the confirmatory error rate, makes their "we had >80% power" inconsistent with what most people understand power to mean (i.e., power at alpha = .05), and excuses them from concluding that their potion is actually harmful. I'd rate it as a positively harmful change.

Daniel Lakens (2016-08-07, 13:13):
Yes, because it tests your hypothesis, even if the data of the second study are not in line with your prediction.

Daniel Kislyuk (2016-08-07, 12:38):
Dear Daniel, do you think that the use of a one-sided test is still appropriate if, from the means, it is evident that in the first run of the experiment the changes follow the expected direction, and in the second run there is no change at all?

Alistair Cullum (2016-05-07, 10:19):
I think Nick's sense that something is not right here might reflect the difference between the statistical error rate and what we might call the effective error rate, by which I mean the error rate that would lead to a change in practice, for example.

Let's take your lecture quiz example, and imagine that the quizzes have no effect. Under a one-tailed test (in the direction of improving scores), 5% of the time we'd erroneously conclude that they were helpful and keep giving them, thus wasting everyone's time.
If we'd done a two-tailed test, we'd also have been wrong 5% of the time, but half of those times would be in the opposite direction, making us think quizzes actually hurt performance. In only 2.5% of cases would we think that quizzes helped and keep giving them. As you point out, the question of quizzes being counterproductive is not really relevant to the question of whether to give them; to make that decision, it's enough that they're not helpful.

So even though the rate of wrong conclusions is 5% for both types of test, the rate of wrong responses is not. With a two-tailed test, we'd have kept giving quizzes only 2.5% of the time; with a one-tailed test, we'd keep giving them 5% of the time.

The same sort of logic applies to something like a drug trial. With the more common two-sided test, we'd make an error on the side of benefit only 2.5% of the time, whereas we'd do that 5% of the time with a one-sided test. That might still be an acceptable rate, but I think we'd want to acknowledge that a major shift to one-sided tests would likely lead to an increase in the rate at which useless treatments were pursued.

Anonymous (2016-04-16, 05:32):
Hi Prof Lakens, interesting article. One-sided confidence intervals never seem to be discussed, and I wonder if this would be a useful discussion point, along with their interpretation.

Dan

Daniel Lakens (2016-03-29, 20:48):
Hi Lee, thanks for dropping by!
I wrote a follow-up post on asymmetric tests, http://daniellakens.blogspot.de/2016/03/who-do-you-love-most-your-left-tail-or.html, which achieves what you want (testing effects in two directions) while giving you power benefits for the effect you care about.

Your example makes sense if you are examining a theory that makes both these predictions. There are still many situations where theories make directional predictions, and where we can be more efficient.

Lee Jussim (2016-03-29, 14:13):
The major problem with one-tailed tests (as far as I can tell) is that the researcher then CANNOT interpret a result in the "wrong" or opposite direction as statistically significant.

When one has a single, strong, clear, pre-registered, uni-directional hypothesis, this is a non-issue.

However, my view is that some of the best social psychology (my field) tests plausible alternative hypotheses. Here is a simple example.

1. Racial stereotypes bias judgments of African American job applicants, who, all things being equal, will be judged more negatively than White applicants.

2. Racial stereotypes set up expectations which, when violated, lead to more extreme evaluations. A White job applicant with a weak background has no situational excuse; because people are aware of discrimination, an African American applicant with an identically weak background is probably more competent. Similarly, among equally strong applicants, the African American will be seen as even more impressive than the White applicant, by virtue of getting there by overcoming discrimination.
In both cases, all things being equal, on average, people will evaluate the African American applicant more positively.

Testing plausible alternative hypotheses may not be "the" answer to social psychology's troubles, but it should be in the toolbox, bigtime.

Lee Jussim

Thom Baguley (2016-03-18, 15:13):
The problem is that a one-tailed t test is a directional test, but a one-tailed F test is non-directional (being equivalent to the sum of both tail probabilities in the t test).

> pf(2.82, 1, 1000)   # about 10% in the right tail
[1] 0.9065912
> pt(1.68, 1000)      # about 5% in the right tail
[1] 0.9533652

I dislike the one-tailed/two-tailed terminology. Conceptually, we should care about whether the test is directional or not. In principle one can carve up a non-directional test into 2 or more directional hypotheses (depending on how many df the effect has).

Ben Cairns (2016-03-18, 05:29):
Pre-registration seems a prerequisite for avoiding the need to do a two-tailed test, the main point of which is to keep researchers honest where some might be tempted to report a one-sided p-value in the "other" direction. Pre-registration should probably pre-specify both the directional hypothesis and, to be safe, the intention to use a one-tailed test.

FWIW, the ANOVA F-test is one-tailed because extreme differences in means are represented by larger F-statistics, whatever the direction of the difference. In effect, two tails have been combined into one.
If your hypothesis is directional and requires a one-tailed test, it would be an error to use ANOVA to test it.

That said, it might often seem reasonable to calculate a "two-tailed" confidence interval, even alongside a one-tailed p-value. That would confuse those readers (and possibly co-authors) who cannot reconcile a 95% CI crossing 0 with a p < 0.05. I suspect that faith in the CI/p-value correspondence is more dearly held (particularly by non-statisticians) than the belief that all tests should be two-tailed.

Andrew K (2016-03-18, 04:52):
Excellent post. Many people seem to miss the forest for the trees! The goal is appropriate hypothesis testing, not getting excited (or frustrated) about a p-value falling in a particular range.

Nick Brown (2016-03-18, 04:36):
I will think more about the false-positive issue. I'm not convinced it is so simple.

On the other question, maybe an ANOVA should be a one-tailed test, but that doesn't seem to be what the software is doing. Have a look at nick.brown.free.fr/stuff/TvsF/TvsF.R (or, for SPSS users, nick.brown.free.fr/stuff/TvsF/TvsF.sav and nick.brown.free.fr/stuff/TvsF/TvsF.spv). I compare two groups with a t test and an ANOVA. t is -1.68, which would be significant with a one-tailed test; the two-tailed p is .098, so the one-tailed p would be .049 (give or take the Welch question ;-)). F is 2.82 (i.e., -1.68 squared), and the p value is the same, give or take rounding at some point.
So if the ANOVA is doing a one-tailed test, why does it give the same p value as a two-tailed t test on the same data? What am I missing here?

Daniel Lakens (2016-03-18, 01:36):
They undoubtedly are, just as p-values between 0.025 and 0.05 are more often associated with non-replicable results than they should be.

Hence the pre-registration. We solve the problem of inflated error rates, AND YOU GET TO BE 20% MORE EFFICIENT.

Win-win.

Anonymous (2016-03-18, 01:34):
The use of one-sided t-tests should be restricted to pre-registered studies. No pre-registration, no one-sided t-test.

I hate it when people use a one-sided t-test in their paper and you realize that p = 0.0264, so a two-sided t-test would not have yielded a significant effect. That would be interesting to study: are one-sided t-tests more often associated with p-values between 0.025 and 0.05 than they should be?

JJ

Daniel Lakens (2016-03-18, 00:04):
Hi Nick, you have both things wrong. An alpha of 0.05 means you have a 5% Type 1 error rate, max. Your false positive rate is 5%, max. You will say "there is something here" when there is nothing, 5% of the time, max. So one-sided testing does not increase the false positive rate.

A two-tailed ANOVA does not exist. An ANOVA is always a one-tailed test.
There is a difference in the means, or there is no difference in the means. See: http://stats.stackexchange.com/questions/67543/why-do-we-use-a-one-tailed-test-f-test-in-analysis-of-variance-anova

Nick Brown (2016-03-17, 16:33):
Yes, you get more power. But if you keep an alpha level of .05, you also increase your false positive rate, because the area of the part of the tail that causes you to say you got a "significant result" is twice as large.

Also, many statistical methods (are there any apart from t-tests and z-tests?) don't admit one-tailed tests. So you can be in the situation of using an ANOVA (two-tailed) and t tests (one-tailed) very close to each other in the same analyses. This seems like a recipe for confusion (at least).
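[Editor's note] The two threads of this exchange can be checked with a small simulation. The sketch below is illustrative only (plain Python; the group size, repetition count, and the large-sample critical values 1.645 and 1.96 are my assumptions, not anything from the discussion). It verifies that the two-group one-way ANOVA F is exactly the square of the pooled two-sample t, which is why the single right tail of F reproduces the two-tailed t probability, and it tallies how often one-sided and two-sided t tests reject under the null: both about 5%, with the two-sided errors split roughly 2.5% per direction.

```python
import random
import statistics

# Simulate many two-group "experiments" under the null (no true difference).
random.seed(1)

N = 50          # per-group sample size (arbitrary choice)
REPS = 10_000   # number of simulated null experiments

# Large-sample approximations to the 5% critical values; with df = 98
# the exact t cut-offs are slightly larger.
CRIT_ONE_SIDED, CRIT_TWO_SIDED = 1.645, 1.96

reject_one_sided = 0
reject_two_sided = 0

for _ in range(REPS):
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)

    # Pooled two-sample t statistic.
    sp2 = ((N - 1) * statistics.variance(a) +
           (N - 1) * statistics.variance(b)) / (2 * N - 2)
    t = (mean_a - mean_b) / (sp2 * 2 / N) ** 0.5

    # One-way ANOVA F for the same two groups (df_between = 1;
    # with equal n, MS_within equals the pooled variance).
    grand = (mean_a + mean_b) / 2
    ss_between = N * ((mean_a - grand) ** 2 + (mean_b - grand) ** 2)
    f = ss_between / sp2

    # F = t^2, so F's one right tail covers both tails of t.
    assert abs(f - t * t) < 1e-8

    reject_one_sided += t > CRIT_ONE_SIDED       # predicted direction only
    reject_two_sided += abs(t) > CRIT_TWO_SIDED  # either direction

print(reject_one_sided / REPS)  # close to 0.05, all in one direction
print(reject_two_sided / REPS)  # close to 0.05, ~2.5% per direction
```

Both rejection rates come out near 5%, which is Daniel's point (the one-sided test does not inflate the overall Type 1 error rate) and Alistair's point at the same time (the one-sided test makes all 5% of those errors in the favoured direction, versus about 2.5% for a two-sided test).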