Comments on The 20% Statistician: TOST equivalence testing R package (TOSTER) and spreadsheet

Enrico Glerean (24 February 2017):
Thank you!!!

Enrico

Daniel Lakens (24 February 2017):
Yes, it should be possible: either take the 90% CI approach, or use dedicated software: https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Testing_Equivalence_with_Two_Independent_Samples.pdf. I might program it into the package in the future!

Enrico Glerean (24 February 2017):
Thank you for this! It has been very useful in a paper we are finalising. Quick question: is there a TOST equivalent for Likert-type data (e.g., a sign rank test instead of a t-test)? Would it be enough to convert Likert scores to ranks?

Here is something similar I have found: http://stats.stackexchange.com/questions/52897/equivalence-tests-for-non-normal-data

Enrico Glerean, www.glerean.com

Daniel Lakens (21 January 2017):
Hi Regis, yes, I discussed this online, but there is, as far as I know, no formula for Cohen's d for a Welch's test. So we are left with the unsatisfactory situation of having to standardize in a perhaps not ideal way, but I also would not know what is better. Can you email me about this? I'd love to chat more, and given your interest in the R code, you might have good ideas on how to improve this.
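To make the "90% CI approach" mentioned in these comments concrete, here is a minimal sketch of the TOST logic for two independent means with a Welch-style (unpooled) standard error. This is an editorial illustration, not the TOSTER source: it uses a normal approximation where TOSTER uses the t distribution (so p-values are only approximate for small samples), and all sample statistics and bounds are invented.

```python
# Sketch of the TOST procedure for two independent means (Welch-style SE).
# Uses a normal approximation instead of the t distribution to stay
# dependency-free; all numbers below are made-up example data.
from statistics import NormalDist
import math

def tost_welch_z(m1, sd1, n1, m2, sd2, n2, low, high, alpha=0.05):
    """Two one-sided tests of the mean difference against raw bounds.

    Equivalence is declared when both one-sided p-values are below
    alpha, which coincides with the 90% CI (for alpha = .05) for the
    difference falling entirely inside [low, high].
    """
    z = NormalDist()
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # Welch (unpooled) SE
    diff = m1 - m2
    p_low = 1 - z.cdf((diff - low) / se)   # H0: diff <= low bound
    p_high = z.cdf((diff - high) / se)     # H0: diff >= high bound
    crit = z.inv_cdf(1 - alpha)            # one-sided critical value
    ci90 = (diff - crit * se, diff + crit * se)
    equivalent = p_low < alpha and p_high < alpha
    return equivalent, p_low, p_high, ci90

eq, p_low, p_high, ci = tost_welch_z(5.2, 1.1, 50, 5.0, 1.3, 50, -0.8, 0.8)
print(eq, p_low, p_high, ci)
```

With these invented inputs both one-sided tests are significant, and, equivalently, the 90% CI for the difference lies inside the bounds, so equivalence would be declared.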
Regis (20 January 2017):
Thanks so much for developing this package and thoroughly explaining its use in your paper! I've been using a homegrown function for the TOST, but hadn't taken into account using the Welch test when variances are unequal, and I'm appreciative of your package/paper bringing this to my attention.

I looked through the code on GitHub, and I was wondering why, when var.equal=F, you are using the root mean square standard deviation formula to set the equivalence boundaries, since I believe this assumes equal sample sizes.

Matt (8 January 2017):
Hiya Daniel,

So there's something that puzzled me over the Christmas break, and I wondered if you might want to weigh in. I noticed that on Twitter you've said a couple of supportive things about Neyman-Pearson testing, and it sounds like this is your preferred approach to significance testing?

The thing is, in my understanding of Neyman-Pearson testing (e.g., http://journals.sagepub.com/doi/abs/10.1177/0959354312465483), you *can* accept (and not just fail to reject) H0, provided that power is high and p > alpha. This stands in contrast to Fisherian NHST, where you can never accept or support H0.

In other words, it seems to me that equivalence testing is the solution to a problem that Neyman-Pearson testing doesn't actually have (well, if you have enough power, anyway). There are complications of convention, in the sense that the typical target Type 2 error rate is 20%, whereas we might want it to be lower if we want to provide convincing evidence for H0, but in principle NP alone has the tools to support a null.

So anyhow: is your understanding of NP testing different from mine? If not, are there pragmatic reasons why you think equivalence testing is necessary even for someone working in an NP framework?

Daniel Lakens (10 December 2016):
Hi, the TOST procedure is the same as using a 90% CI, but for some conceptual differences, see the pre-print.

You don't want to say values outside the 90% CI are not equivalent, because there you are testing against 0. See the pre-print for the explanation of how studies can be significant AND equivalent.

Yes, the output could be in Cohen's d, but I carefully chose not to do this.
You might not have noticed, but setting equivalence bounds in Cohen's d is actually the new thing in my implementation of equivalence tests. It's just a tiny change, but it will make the tests more intuitive for psychologists. However, some people worry about using standardized effect sizes; raw differences are easier to interpret, and arguably it should be the goal to express all effects on raw scales. So you can set the equivalence bounds in d, but I give feedback in raw scale units. If more people prefer a CI around d, though, I might provide that as well in a future update.

spaghetticode (9 December 2016):
This looks really cool... I might be biased because I enjoyed seeing some of my own work as the example (thanks).

A couple of questions from a quick read:
* How does the TOST 90% CI relate to the NHST 95% CI? Meaning, if I recalculate the NHST CI at the 90% level, would it come out similar to the TOST CI? Is there any major difference in how they are interpreted?
* In the paper, Eileen and I reported the 95% CI for the standardized effect size (Cohen's d): d = -0.03, [-0.32, 0.26]. Again, for the purposes of the TOST test, we'd need a 90% CI. But otherwise, is doing the TOST analysis similar to examining whether the boundary is within this CI? Would it be right to say that any value outside this CI is non-equivalent? I feel like that's not quite the same, but I'm not understanding how.

One top-of-the-head suggestion might be to have the package give the CIs in terms of standardized effects, since that's how the boundary conditions for non-equivalence are specified; it feels easier to then compare them back to the boundary. I'm not sure whether that's trivial to implement or not.

Oh, and I just wanted to point out a couple of things about the example study: the example study above is from what the paper labelled as Study 2, collected with MTurk participants. It was one of three replications conducted, the other two with live participants, and so the overall effect size obtained provided some pretty narrow boundaries on a plausible effect: d = 0.06, 95% CI [-0.14, 0.26]. These details are not at all essential to explaining this (cool) new R package; I just wanted to point out how we had approached estimating the boundaries for the proposed effect.

Nice work!
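The bounds-in-d, feedback-in-raw-units behaviour Daniel describes above amounts to a simple conversion step before testing. The sketch below illustrates it using the root mean square standard deviation as the standardizer (the formula Regis mentions for the var.equal=F case); it is an editorial illustration with invented numbers, not TOSTER's actual code.

```python
# Sketch: translate equivalence bounds given in Cohen's d into raw
# scale units via the root mean square SD of the two groups.
# The d bounds and SDs below are invented example values.
import math

def d_bounds_to_raw(d_low, d_high, sd1, sd2):
    """Convert Cohen's d bounds to raw-score bounds using the RMS SD."""
    sd_rms = math.sqrt((sd1**2 + sd2**2) / 2)  # root mean square SD
    return d_low * sd_rms, d_high * sd_rms

low_raw, high_raw = d_bounds_to_raw(-0.3, 0.3, 1.1, 1.3)
print(low_raw, high_raw)
```

Note that, as Regis points out in the thread, this standardizer implicitly treats the two sample sizes as equal.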