tag:blogger.com,1999:blog-987850932434001559.comments2020-08-04T07:57:23.556+02:00The 20% StatisticianDaniel Lakenshttp://www.blogger.com/profile/18143834258497875354noreply@blogger.comBlogger952125tag:blogger.com,1999:blog-987850932434001559.post-30256487129935513132020-05-11T17:28:35.279+02:002020-05-11T17:28:35.279+02:00Hello Daniel,
This is a very nice blog post (as al...Hello Daniel,<br />This is a very nice blog post (as always). I have read the article, I am using Jamovi and the TOSTER module, and I have played with the provided Excel sheet. Though I do not understand one thing. I would like to perform an equivalence test on two independent samples, where I do not know the ES to test for, but I want to use raw scores instead. Both the Jamovi module and the Excel sheet allow doing so. Let's say I want to know if the two samples differ (or are equal) on a 7-point scale where +/- 0.5 is the threshold I am interested in. I can specify the raw score as -0.5 and 0.5, but where can I specify the length of my scale? I suppose 0.5 points on a 5-point scale is different from 0.5 on 7- or 9- or 11-point scales. Or am I missing something?<br /><br />I hope you can help me...<br /><br />Víthttps://www.blogger.com/profile/11229704654709925638noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-38288543728061043442020-05-11T14:57:55.972+02:002020-05-11T14:57:55.972+02:00Here is the enlightening response from the author:...Here is the enlightening response from the author:<br />https://www.talyarkoni.org/blog/2020/05/06/induction-is-not-optional-if-youre-using-inferential-statistics-reply-to-lakens/schw.stefanhttps://www.blogger.com/profile/15604298958596656339noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-4249556460061666972020-04-01T00:33:25.827+02:002020-04-01T00:33:25.827+02:00OK, now I'm getting nervous. The F statistic ...OK, now I'm getting nervous. The F statistic in one-way ANOVA uses the MSE as its denominator, and that's just pooled variance on steroids. Should we re-think the F test with more carefully designed factors? 
TheRandomTexanhttps://www.blogger.com/profile/07562250696288742894noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-25422591131044765102020-02-06T13:29:40.561+01:002020-02-06T13:29:40.561+01:00Our reviewer's comment is this: The authors sh...Our reviewer's comment is this: The authors should provide a back-of-the-envelope assessment of what the power of the tests is given the sample size (a classic reference to look at would be Andrews, Donald W. K. 1989. “Power in Econometric Applications.” Econometrica 57(5):1059–1090). Are you familiar with this approach? It is talking about an inverse power function. Thank youMargaux Labonnehttps://www.blogger.com/profile/15290373811323434660noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-30132960131850131992020-01-24T20:18:21.252+01:002020-01-24T20:18:21.252+01:00I haven't read Yarkoni's paper (yet), so t...I haven't read Yarkoni's paper (yet), so this is not meant as a defense of his position. Rather, it's just a response to this post.<br /><br />What exactly do you mean and/or what do you take Yarkoni to mean by "alignment between theories and tests"? And what counts as "close" alignment? I ask because I don't find your argument compelling that close alignment doesn't matter in a (hypothetico-)deductive approach to theory evaluation. I assume you agree that a theory that predicts that "cleanliness reduces the severity of moral judgments" would not be well-tested (indeed would not be tested at all) by, say, measuring and comparing walking speed after priming or not priming people with age-related words. If so, then there is some degree of alignment between tests and theories required even for a deductive approach.<br /><br />It's also not clear to me exactly what role statistical testing plays for you in all this. 
I'm at least approximately on the same page as you with respect to induction (if it's <a href="https://twitter.com/annemscheel/status/1198718886436311041" rel="nofollow">plebeian induction</a> we're talking about, anyway), but statistical tests are explicitly concerned with drawing inferences about populations, not just observing what occurs in samples. That is, statistical tests are explicitly about <i>generalization</i>, at least in part.Noah Motionhttps://www.blogger.com/profile/00150446498549219747noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-18270611614105768322019-10-25T14:00:17.318+02:002019-10-25T14:00:17.318+02:00Hi
Not a statistician, instead a physician trying...Hi<br /><br />Not a statistician, instead a physician trying to learn statistics.<br />A bit tired today, so perhaps misunderstood something.<br /><br />I think I have some comments on this. Interesting about decision theory, though.<br /><br />Here goes:<br />In finance it’s clear whether your result is “good” or “bad”/”true” or “false”. You have an economic return or loss on a certain level.<br />In for example science (but also in medicine) you get a result either way.<br />The “value” lies (somewhat) in whether you can trust the result or not.<br />The “return” or “loss” could perhaps be seen as whether the applications of the results turn out to be useful in practice or not.<br /><br />I’m not sure that evaluation through such implementation in general is the best way to go, though.<br /><br />Instead I think one could start with setting a level of certainty that’s needed when it comes to deeming a scientific question answered or not answered.<br />When claiming a scientific hypothesis is answered - what is the acceptable likelihood that the answer we have is a false null result, or a false positive result? (Either for a single hypothesis, or in general, for a number of them.)<br />Here I think decision theory may have its place: What the “sought for” level of certainty should be in a given situation (with given economic constraints, etc), or in the scientific community as a whole, can probably be examined with some form of decision theory - that in combination with known facts, etc.<br /><br />The levels of certainty perhaps don’t have to be stated in numbers.<br />Perhaps “very highly likely” or “very, very unlikely” are good enough.<br /><br />Then, when one knows that level of requested certainty, one can probably use a stepwise process, to reach it.<br />This is similar to a “stepwise diagnostic process” in medicine or psychology that I think you are familiar with, where you often use several tests in a row. 
- In science being equivalent to several studies in a row for a given hypothesis.<br />There, in general, depending on level of prior probability, etc, I think it may be smart to go for an appropriate level of beta or alpha, to obtain the requested level of certainty for either nulls or positives, in a first run, and then examine either positives or nulls further, depending on which category is known to contain too many false ones.<br />- Perhaps similar to the Bayesian decision theory that you mention.<br /><br />This could probably be tested with some sort of simulation.<br /><br />I may be wrong, but I think that is a somewhat easier approach than the one you propose.<br /><br />(Perhaps also a bit more informative or effective.<br />I think it’s better in the long run to know that 3 % of null results, and 25 % of positive ones are probably false, than to know that ca. 10 % of each are false. In the first you mostly have to test the positive ones further. In the second you more or less have to test both positives and negatives further.)<br /><br />Best wishes!Gustav Holsthttps://www.blogger.com/profile/17781110218299868363noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-52304147134124726882019-10-15T19:37:26.788+02:002019-10-15T19:37:26.788+02:00Great, thank you very much!
I think we should also...Great, thank you very much!<br />I think we should also improve education about CIs.<br />Do you know this paper?<br />http://learnbayes.org/papers/confidenceIntervalsFallacy/fundamentalError_PBR.pdfRosana Ferrerohttps://www.blogger.com/profile/05891745549463934951noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-62332668192820537152019-10-11T19:41:38.519+02:002019-10-11T19:41:38.519+02:00Thank you for the wonderful package.
Is there a g...Thank you for the wonderful package.<br /><br />Is there a good way to determine equivalence margins?Anonymoushttps://www.blogger.com/profile/11008284843134095666noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-56681229644173713102019-10-11T13:09:40.125+02:002019-10-11T13:09:40.125+02:00I know that editorials are mass-produced, includin...I know that editorials are mass-produced, including similar criticisms.<br /><br />However, I don't know anyone who has shown "how can I calculate post-hoc power" in the case of Welch's t-test, especially when the sample sizes of the two groups are different.<br /><br /><br />I have asked for it to be shown as calculation code or a mathematical formula in the following community, but...<br /><br />https://stats.stackexchange.com/questions/430030/what-is-the-post-hoc-power-in-my-experiment-how-to-calculate-thisAnonymoushttps://www.blogger.com/profile/11008284843134095666noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-49611279115177915332019-10-04T10:27:21.576+02:002019-10-04T10:27:21.576+02:00Gelman nails it down: "It’s fine to estimate p...Gelman nails it down: "It’s fine to estimate power (or, more generally, statistical properties of estimates) after the data have come in—but only only only only only if you do this based on a scientifically grounded assumed effect size. 
One should not not not not not estimate the power (or other statistical properties) of a study based on the “effect size observed in that study.” <br /><br /><br />https://statmodeling.stat.columbia.edu/2018/09/24/dont-calculate-post-hoc-power-using-observed-estimate-effect-size/Sehttps://www.blogger.com/profile/07697710753267461129noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-29233456722390713382019-09-15T18:19:35.510+02:002019-09-15T18:19:35.510+02:00This resource is available for repeated measures (...This resource is available for repeated measures (and mixed) designs, though I think only for 2 way designs<br />https://www.aggieerin.com/shiny-server/tests/omegaprmss.html<br /><br />Also provides CIsawfoothttps://www.blogger.com/profile/04375839310761465743noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-69851497628740454072019-09-15T18:19:09.819+02:002019-09-15T18:19:09.819+02:00Here’s the thing: the problem really isn’t how to ...Here’s the thing: the problem really isn’t how to explain p-values better. The problem is that people generally don’t know a) what the aim of science is and b) why we would want to use p-values in furtherance of that aim.<br /><br />Long story short: There can be no such thing as certain (or even probable) knowledge. Knowledge can be objective, but it will always remain relative to fundamental assumptions. That implies that we can only achieve <i>successively better</i> knowledge. For that, we can employ valid, deductive logic, which enables us to make choices (www.theopensociety.net/2011/08/the-power-of-logic) that can in turn be informed by (a distribution of!) 
p-values.Peterhttps://www.blogger.com/profile/00208530977584493728noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-68072481689673487912019-09-15T16:55:17.492+02:002019-09-15T16:55:17.492+02:00Great post - thank you!Great post - thank you!Dave Gileshttps://www.blogger.com/profile/05389606956062019445noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-55058686878789761172019-08-20T09:14:14.095+02:002019-08-20T09:14:14.095+02:00Hi Daniel, many thanks for this resource. Would th...Hi Daniel, many thanks for this resource. Would there be any chance of converting them to a Shiny app, to really drop the barriers to use? A really nice example of similar resources that have been implemented in a Web interface is https://www.estimationstats.com/#/ Anonymoushttps://www.blogger.com/profile/06522283751759510506noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-65795679626421516372019-08-20T05:07:17.052+02:002019-08-20T05:07:17.052+02:00I'd add that planned missingness is a great wa...I'd add that planned missingness is a great way to have sufficient power given limited resources! planned missing data designs (PMDD) + FIML estimation can lead to very similar results & conclusions - assuming missingness is planned to be (completely) at randomAnonymoushttps://www.blogger.com/profile/04711853547374881776noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-28975326316811825912019-08-14T14:54:08.374+02:002019-08-14T14:54:08.374+02:00Power = people?
Thomas Schmidt, University of Kais...Power = people?<br />Thomas Schmidt, University of Kaiserslautern, Germany<br /><br />There is yet another way to improve your power: Use more trials from the participants you have. Actually power depends on two things: the number of participants in the sample and the reliability of the measurements. However, reliability directly depends on the number of trials. There are several simulation studies that show that both levels (people and trials) are about equally important in determining statistical power. There are many areas of psychology that successfully work with small groups of subjects but massive repetition of measurement -- psychophysics is a good example. In my research, I almost invariably use eight participants and control power entirely by the number of sessions. In my experience, well-trained subjects perform so much more reliably than untrained ones that they can give you high data quality even with limited resources. There is also a convenient, citable name for this approach: Smith & Little (2018) call it "small-N design".<br /><br />Apart from statistical power, there is yet another time-honoured concept that is used in engineering: measurement precision. Precision can simply be defined by setting an upper limit to the standard error of the dependent variable -- all you need is a rough idea about the standard deviation. In a recent paper, we included the following passage to justify our sample sizes (Biafora & Schmidt, 2019): <br /><br />"In multi-factor repeated-measures designs, statistical power is difficult to predict because too many terms are unknown. Instead, we control measurement precision at the level of individual participants in single conditions. We calculate precision as s/√r (Eisenhart, 1969), where s is a single participant's standard deviation in a given cell of the design and r is the number of repeated measures per cell and subject. 
With r = 120 and 240 in the priming and prime identification task, respectively, we expect a precision of about 5.5 ms in response times (assuming individual SDs around 60 ms), at most 4.6 percentage points in error rates, and at most 3.2 percentage points in prime identification accuracy (assuming the theoretical maximum SD of .5)."<br />Thomas Schmidthttps://www.blogger.com/profile/06538006415414781450noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-41348114351340535502019-08-06T10:29:56.465+02:002019-08-06T10:29:56.465+02:00Hi Daniel,
This might be a lame question, but you...Hi Daniel,<br /><br />This might be a lame question, but your answer would be of immense help. <br /><br />Can I conduct an equivalence test for a one-proportion test?<br /><br />For example, I have a binomial outcome variable from an experiment in which participants answered yes or no (example: yes=60, no = 40; N =100), where p is the proportion of people who answered yes. My hypothesis is:<br /><br />H0: p=0.5<br />H1: p>0.5.<br /><br />best, prasad<br />prasadhttps://www.blogger.com/profile/07017280705402230912noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-1108881644459718352019-07-20T01:37:33.167+02:002019-07-20T01:37:33.167+02:00Hey Daniel, great post - thanks for sharing! I hav...Hey Daniel, great post - thanks for sharing! I have a couple of suggestions for improvement and a question:<br />1) Thought you might like to know your first line of R-script for your function is missing double quotes.<br />res = optimal_alpha(power_function = [ADD_DOUBLE_QUOTES_HERE]pwr.t.test(d=0.5, n=100, sig.level = x, type='two.sample', alternative='two.sided')$power")<br /><br />2) For some reason, the balance function produces incorrect total error rates. For example, the following produces a res$tot = 8.888209e-08 but a res$alpha + res$beta = 0.9967886.<br />res = optimal_alpha(power_function = "pwr.t.test(d=0.001, n=30000, sig.level = x, type='two.sample', alternative='two.sided')$power", error = "balance")<br />res$alpha<br />res$beta<br />res$tot<br />res$beta + res$alpha<br /><br />3) You mention "If you collect large amounts of data, you should really consider lowering your alpha level." I'm not sure if I follow entirely. Assuming a sample size of 10000 where Cohen's d = 0.2, then adjusting the alpha from 0.5 to something smaller such as .0000000000000000005 has no impact on power, right? 
I'm probably missing something here, so I'd love to hear your thoughts.Anonymoushttps://www.blogger.com/profile/14576444348427501628noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-64874535164645276652019-07-16T18:44:39.663+02:002019-07-16T18:44:39.663+02:00Either I've misunderstood this, or there's...Either I've misunderstood this, or there's something wrong with it or missing from it. The decision tree in Figure 1 is fine, but the tree in Figure 2 isn't analogous to it. In Fig 1, you make the decision whether or not to invest, and then the chance nodes show all the possible outcomes - the product works, or it doesn't - and the probabilities of those are their unconditional probabilities, 0.5 and 0.5 for each. In Figure 2, you choose the alpha, but the following chance nodes don't include all the possible outcomes. They only include the possibilities that there is a type 1 or a type 2 error, but there's another possibility, that there's no error at all and the test gives the correct outcome. Also, the probabilities assigned to the two error types are conditional - alpha is the probability of a result in the critical region (i.e. 'significant') conditional on the null hypothesis being correct, that is, conditional on the true effect being zero, and beta is the probability of a result outside the critical region (i.e. 'not significant'), conditional on the true effect being non-zero, so you can't just put them both in the same expected value calculation like that, as you then find the expected value from two different probability distributions that are conditional on different things, which makes no sense (to me at least). 
<br /><br />In the Figure 1 example there are only two states (product works or not), but in the testing example there are four:<br />(i) There is no true effect (null hypothesis true) and test result non-significant.<br />(ii) There is no true effect and test result is significant.<br />(iii) There is a true effect (null hypothesis false) and test result non-significant.<br />(iv) There is a true effect and test result is significant.<br /><br />Or you could draw a tree with two sets of chance nodes, one set for whether the null hypothesis is true, and one, which could then be conditional on the first node, for whether the test result is significant or not. Then the probabilities for the second set would be alpha, 1 - alpha for those following "Null hypothesis true", and 1 - beta, beta, for those following "Null hypothesis not true". That would work, but you still have to specify the probabilities on the first set of nodes, that is, the probability of whether the null hypothesis is true, and that is the prior probability that you want to avoid. But I don't think you can avoid it - if you put in all four outcomes on the chance nodes and work out their probabilities, that involves the probability that the null is true, that is, the prior.<br /><br />You might be able to take a different decision-theoretic approach that avoids using the prior probabilities, but the one you've used, with decision trees, is pretty well inevitably Bayesian, I think.Kevin McConwayhttps://www.blogger.com/profile/13163867937943443456noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-62515136536636619722019-07-16T13:57:39.988+02:002019-07-16T13:57:39.988+02:00Thanks Daniel, it's good to hear an informed o...Thanks Daniel, it's good to hear an informed opinion which I see as a gentle push away from using the same significance threshold for all kinds of tests in a discipline, or even in sciences as a whole. 
This has always perplexed me as I'm mostly working in business settings where risks and rewards can be estimated with a fair degree of precision since the number of people/situations affected by a given inference is more or less limited, unlike the sciences.<br /><br />I've actually worked on arriving at significance thresholds and sample sizes (and therefore power/minimum effect of interest) which achieve an optimal balance of risk and reward for an online controlled experiment based on its particular circumstances. A brief description of my work can be found at http://blog.analytics-toolkit.com/2017/risk-vs-reward-ab-tests-ab-testing-risk-management/ while a more detailed exposé will soon be released in my upcoming book where I devote a solid 30 pages to the topic ( https://www.abtestingstats.com/ ), for anyone interested.Anonymoushttps://www.blogger.com/profile/15010168795141940884noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-80025694200887734322019-05-14T20:28:46.107+02:002019-05-14T20:28:46.107+02:00Hi Daniel, how can this standardization of p-value...Hi Daniel, how can this standardization of p-values based on sample size be coupled with the multiple-testing adjustment by Bonferroni or BH? Dodgerhttps://www.blogger.com/profile/00995569797695061139noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-68019325365475811122019-05-10T03:15:50.064+02:002019-05-10T03:15:50.064+02:00I am taking your course on Coursera. This article ...I am taking your course on Coursera. This article summarizes many ideas that you put on the course. 
Thank you.Walter Lorenzohttps://www.blogger.com/profile/07629680040548146565noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-45302767731762833652019-05-04T05:28:02.383+02:002019-05-04T05:28:02.383+02:00The p value should be there, just to validate meth...The p value should be there, just to validate methodological correctness, to bring uniformity to research work, and to strengthen justifications for the findings with respect to the individual terms of the work, but not to support the hypothesis as a universal fact. Of course, we can encourage reporting power and effect size, because there are many studies where power is compromised. What I liked about Trafimow’s article is that it exposes the dishonest attempts of researchers to get their papers published in journals based on p values with unrealistic elements like exceptionally low n (as small as 3), skewed distributions, non-homogeneity, etc. BASP might have grown fatigued with such papers. That is why they wrote "we encourage the use of larger sample sizes than is typical in much psychology research, because as the sample size increases, descriptive statistics become increasingly stable and sampling error is less of a problem" (from Trafimow & Marks, 2015, doi:10.1080/01973533.2015.1012991). Honest and judicious use of p or CIs is always welcome.www.surjyasaikia.inhttps://www.blogger.com/profile/09172588142202332799noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-59951569656772874422019-04-10T00:22:45.369+02:002019-04-10T00:22:45.369+02:00It's not super-clear that Cohen wasn't. Me...It's not super-clear that Cohen wasn't. 
Meehl, after all, didn't talk much about experimental randomized interventions, and he was called on it by Oakes (https://www.gwern.net/docs/statistics/1975-oakes.pdf) who gave as a counter-example the now-forgotten OEO 'performance contracting' school reform experiment (https://www.gwern.net/docs/sociology/1972-page.pdf) where despite randomization of dozens of schools with ~33k students, not a single null could be rejected.gwernhttps://www.blogger.com/profile/18349479103216755952noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-53469083120658883772019-03-14T18:21:37.760+01:002019-03-14T18:21:37.760+01:00Hi, thank you very much for this page, this is ver...Hi, thank you very much for this page, this is very helpful!<br /><br />I used the SPSS script to calculate the CIs for eta squared in a MANOVA.<br /><br />However, in some cases, mostly for the main effects in the MANOVA, I obtained an eta squared that was not covered by the CI: For instance I had F (34, 508) = 1.72, partial η2 =.103, 90% CI = [.012; .086]. <br /><br />Is it possible that the multivariate design causes the problem here? And would you have any suggestions on how to fix this?<br /><br />Thanks a lot and best regards,<br />TabeaTabeahttps://www.blogger.com/profile/05619778141992487767noreply@blogger.com