tag:blogger.com,1999:blog-987850932434001559.comments2017-07-23T11:35:23.583-07:00The 20% StatisticianDaniel Lakensnoreply@blogger.comBlogger877125tag:blogger.com,1999:blog-987850932434001559.post-42444099083503821372017-07-23T11:35:23.583-07:002017-07-23T11:35:23.583-07:00Another crew with some pretty big science reform heavy hitters (Nosek, Wagenmakers, Ioannidis, Vazire, and many others) is now recommending .005 for most exploratory analyses. https://osf.io/preprints/psyarxiv/mky9j/<br /><br />I saw that you tweeted on this, but this post focuses exclusively on .001, so I am returning to the .005 proposal in this comment.<br /><br />This is obviously a considerably less severe tradeoff than .001. Also, in the U.S., your driving metaphor makes things ... interesting. People routinely drive 5-10mph over the speed limit, and the cops will rarely ticket you unless you are at least 10mph over. So, if you want to keep people's speeds, say, under 35mph, you would post a speed limit of 25mph.<br /><br />Metaphorically, then, given the evidence that scientific "speeding" (or, to use Schimmack's term, "doping") is pretty widespread (p-hacking, the garden of forking paths, suboptimal statistics), one could plausibly argue that a lower "speed limit" (a lower p-value, with .005 being proposed by this new crew) is necessary to keep the "true" speed down to the more reasonable .01 or .05.<br /><br />This is an argument from scientific human behavior, not stats or methods per se; or, at least, it is at the intersection of the psychology of scientific behavior and stats/methods. It is surely useful to know what the ideal statistical solution is, if any, but proposing solutions that also address the frailty of scientists' behaviors may not be ridiculous and just might be even more productive.<br /><br />You are more stat-oriented than I am, so I am guessing you will stick to the stats.
I am more of a social psychologist than a stats guy (I teach grad stats, have pubbed some fairly sophisticated analyses, latent variable modeling, Bayesian analysis, etc., and have a paper or two on how data is routinely misinterpreted, but I am not a hardcore stats guy). Anyway, on balance, it seems to me that reform that attempts to address the psychology of stats use and interpretation has some merit.<br /><br />What do you think?<br /><br />(I think I signed up as "I'dratherbeplayingtennis," but I played already today, so)...<br /><br />Best,<br /><br />Lee Jussim<br />I'dratherbeplayingtennishttps://www.blogger.com/profile/15172903475376032782noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-81537956276929040262017-07-18T00:11:48.848-07:002017-07-18T00:11:48.848-07:00Great stuff Danny. Helpful topics for my notes. Thank youcheap transcription rateshttp://clickfortranscription.com/cheap-transcription-services.phpnoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-62080966976765755132017-07-17T17:41:12.334-07:002017-07-17T17:41:12.334-07:00The Nature link is down for maintenance; sci-hub is up. There's some kind of lesson.
With a huge effect size.<br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-56222232585569950642017-07-17T06:29:33.093-07:002017-07-17T06:29:33.093-07:00I have never used Jamovi as a statistical tool, but after reading this article on equivalence testing in jamovi I have learned a lot, and I will be downloading the program so that I can learn how to use it, especially during my free time when I have finished offering <a href="https://www.literaturereviewhelp.com/9-custom-writing/739-experts-lit-review-help" rel="nofollow">Critical Literature Review Writing Help</a> to college and university students who are writing their final year projects. Albert Smithhttps://www.blogger.com/profile/03762693089912796201noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-30785927034794069172017-07-12T05:13:58.794-07:002017-07-12T05:13:58.794-07:00You suggest changing<br />(1) (F - 1)/(F + (df_error + 1)/df_effect)<br />into<br />(2) (F - 1)/(F + N/df_effect - 1)<br /><br />Although your formula is not incorrect, Daniel's isn't either. To be more precise: both are equivalent.<br /><br />The difference between (1) and (2) lies in<br />(1) (df_error + 1)/df_effect<br />and<br />(2) N/df_effect - 1<br /><br />In the designs studied in this blog, N = df_total + 1 = df_effect + df_error + 1.<br />Thus,<br />N/df_effect - 1<br /> = df_effect/df_effect + (df_error + 1)/df_effect - 1<br /> = 1 + (df_error + 1)/df_effect - 1<br /> = (df_error + 1)/df_effect<br /><br />Thus, your solution coincides with Daniel's.<br />Casper Albershttps://www.blogger.com/profile/05364304504311348392noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-428943820547088182017-07-05T01:16:24.543-07:002017-07-05T01:16:24.543-07:00I agree; that seems to be the explanation that the Glöckner article (cited in the OP) considers as well, and finds evidence for via simulation.Rjanaghttps://www.blogger.com/profile/16729914660093729850noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-59753559171125004312017-07-03T16:32:24.372-07:002017-07-03T16:32:24.372-07:00I think there is still a typo (or words missing) in that passage: "Simonsohn (2015) suggested to set the smallest effect size of interest to 33% of the effect size in the original study could detect."Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-5774682184441213432017-07-03T15:00:35.747-07:002017-07-03T15:00:35.747-07:00Why spoze it's hunger? And why spoze hunger is a psychological effect? I don't know a thing about how common this alleged pattern is, nor have I read the source, but I doubt the order of cases is randomly selected.
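[Editorial aside on Casper Albers's algebra above: the identity between forms (1) and (2) is easy to confirm numerically. A minimal pure-Python sketch; the function names and the test triples of (F, df_effect, df_error) are mine, purely for illustration.]

```python
import math

def epsilon_sq_v1(F, df_effect, df_error):
    # Form (1): (F - 1)/(F + (df_error + 1)/df_effect)
    return (F - 1) / (F + (df_error + 1) / df_effect)

def epsilon_sq_v2(F, df_effect, df_error):
    # Form (2): (F - 1)/(F + N/df_effect - 1),
    # with N = df_total + 1 = df_effect + df_error + 1
    N = df_effect + df_error + 1
    return (F - 1) / (F + N / df_effect - 1)

# The two forms agree for any F, in any design where N = df_total + 1
for F, df_effect, df_error in [(4.5, 1, 98), (2.3, 3, 56), (10.0, 2, 27)]:
    assert math.isclose(epsilon_sq_v1(F, df_effect, df_error),
                        epsilon_sq_v2(F, df_effect, df_error))
```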
If I grade 2 or 3 sections of a class, I start with those most likely to be best/easiest. Deborah Mayohttps://www.blogger.com/profile/06527423269272136310noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-47616544677762472352017-07-03T08:43:31.407-07:002017-07-03T08:43:31.407-07:00Your comment "our society would have organized itself around this incredibly strong effect" is important here. A lot of (social) psychology seems to be about chasing effects that are invisible to the naked eye, and were unknown to Plato or Shakespeare, yet apparently emerge the size of an elephant with a sufficiently clever protocol. But hey, when a Nobel Prize winner writes that "The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true. More important, you must accept that they are true about you", who are ordinary mortals to deny the existence of the elephant?<br /><br />As for how this result came about: Stephen Senn had a guest post on Deborah Mayo's blog this last weekend (https://errorstatistics.com/2017/07/01/s-senn-fishing-for-fakes-with-fisher-guest-post/) from which I learned this phrase: "every statistician should always ask ‘how did I get to see what I see?’". It's very tempting to view all data as equal --- R or SPSS doesn't care where the numbers come from --- but it's very dangerous.Nick Brownhttps://www.blogger.com/profile/18266307287741345798noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-86433696831611133202017-06-22T23:45:52.438-07:002017-06-22T23:45:52.438-07:00(1) Effect size is useful irrespective of whether the study is experimental or not, as it applies to the result and not the methodology of research. (2) Effect size is useful for accepting positive results. (3) I do not have experience to comment on its role in negative results.
Post-hoc power analysis of a negative result usually produces very low power when sample size is modest (note that I did not say small). In our genetics case-control study we found a negative result with a sample of 100 per group. The result may not change even if we repeat it with, say, 1000 cases, but how do we establish this statistically? In another project, we found a negative result with the first 30 samples but a positive result after analyzing 100 samples. Sharath Bhttps://www.blogger.com/profile/06026361535400425559noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-8150639474334191522017-06-22T07:35:31.663-07:002017-06-22T07:35:31.663-07:00Dear Daniel, Thank you very much for your effort here, a very constructive post. Two quick things.<br /><br />First, when something is really unknown, one probably would prefer to run a "door-to-door" search to find it using some initial clue (Bayesian Inference) rather than take a null position and wait for some null-falsifying evidence to reject that null position (Frequentist Inference).<br /><br /><br />Second, inference is important **only after** correct probability modeling. A HUGE share of social and behavioral research uses measurement tools that are either dichotomously scored or on a Likert scale. Such research findings can only be accurately modeled using discrete probability models (e.g., negative binomial, hypergeometric), taking into account the over-dispersion that is almost always present in this type of research data.<br /><br />I think **after** we accurately model actual research data with an accurate probability model, the issue of inference **reasonably** just starts.<br /><br />I very much look forward to the day when two things in social and behavioral sciences happen.
(A) we don't use t-tests and (M)AN(C)OVAs and LMs when really the measurement tools we see in social & behavioral research cry out loud for Generalized Linear Models and Discrete Probability Modeling, and (B) efforts to make an inference happen only after (A) is met.Annynomoushttps://www.blogger.com/profile/06559730075316700000noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-40742720035023779792017-06-21T10:51:21.465-07:002017-06-21T10:51:21.465-07:00I hope you are correct and that equivalence testing gains popularity. I fear that most practicing scientists have too strong an incentive to continue with "nil hypothesis" testing - it is easy to do, requires almost no understanding of what is actually being done, and it substantially increases the chances of getting a paper published. I appreciate your work in pushing for a much more philosophically sound alternative.Benhttps://www.blogger.com/profile/08481083767625560025noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-72197814422263225852017-06-20T21:54:38.189-07:002017-06-20T21:54:38.189-07:00It will become much easier, and we will see more of it, now that people are starting to use equivalence testing: http://journals.sagepub.com/doi/full/10.1177/1948550617697177Daniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-46319533738897664662017-06-20T21:53:30.762-07:002017-06-20T21:53:30.762-07:00There is a difference between accepting model assumptions and including belief in your model. You can believe there is a truth out there - but since your belief is not relevant for it, scientific realism suggests there is no rationale to include it in a statistical test.
Daniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-14669330818540044642017-06-20T15:51:57.369-07:002017-06-20T15:51:57.369-07:00Regarding Meehl, you write:<br /><br />"Meehl believes accepting or rejecting predictions is a sound procedure, as long as you test risky predictions in procedures with low error rates"<br /><br />I agree, but I also take Meehl's position as meaning that nearly all "significant" results are useless, given sufficient power. The error rates will be low but the results will (perhaps ironically) tell you less and less the more power you have. From the abstract to "Theory Testing in Psychology" (1967):<br /><br />"Because physical theories typically predict numerical values, an improvement in experimental precision reduces the tolerance range and hence increases corroborability. In most psychological research, improved power of a statistical design leads to a prior probability approaching 1/2 of finding a significant difference in the theoretically predicted direction. Hence the corroboration yielded by "success" is very weak, and becomes weaker with increased precision. "Statistical significance" plays a logical role in psychology precisely the reverse of its role in physics..."<br /><br />So yes, Meehl would agree with the goal of error control, but I read the above quote as saying that you can't get error control AND the testing of risky predictions using a procedure that attempts to reject a special case of "not the hypothesis" instead of attempting to directly reject the hypothesis. Do you see many cases of NHST being used to test risky predictions, in which "reject Ho" means "reject my scientific hypothesis"?Ben Prytherchnoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-3011550964593040932017-06-20T15:51:23.089-07:002017-06-20T15:51:23.089-07:00Hi Daniel, I enjoy your blog and I appreciate you emphasizing the importance of philosophy in evaluating statistical inferences.
You state that:<br /><br />"From a scientific realism perspective, Bayes Factors or Bayesian posteriors do not provide an answer to the main question of interest, which is the verisimilitude of scientific theories."<br /><br />I'm sure you've heard the similar Bayesian critique of frequentist methods, which is that p-values and decisions about statistical significance don't answer the question we are usually interested in. From talking to my non-statistician friends about how they interpret statistical results, I've found that they all want the p-value to be the probability that their results were due to chance, so that they can interpret a small p-value as the probability their research hypothesis is incorrect. This was Cohen's critique in "The Earth is Round (P<0.05)":<br /><br />"What's wrong with NHST? Well, among many other things it does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!"<br /><br />I've found that my students in introductory statistics also instinctively want to interpret the p-value as the probability of the null. This could be because they are just being introduced to NHST and the logic is somewhat convoluted and so they initially go with the simpler (and incorrect) interpretation of statistical significance. I suspect that it is also because the incorrect interpretation of statistical significance makes the most intuitive sense, and answers the question that is of most interest to them.<br /><br />Of course, the clever students eventually learn the model, and understand the logic of rules such as "we treat population parameters as having fixed but unknown values, and so therefore we cannot make probabilistic statements about these values. It is only our data that are random, not the truth." 
But usually learning this is a struggle.<br /><br />I know you qualified your statement with "from a scientific realism perspective" - does treating probability as epistemological rather than ontological mean having to rule out or suspend scientific realism? It seems to me you can both treat probability as referring to a state of knowledge *and* believe that there is a truth out there that is ultimately beyond our reach, even as we constantly strive to improve our understanding of it. I don't see the conflict here. For example, I'm allowed to put a "normally distributed random error" term in a model even though I know that what I'm treating as "error" is really governed, at least in part, by other deterministic forces. In this sense, "normal random error" is a substitute for uncertainty; I know that I can't model everything and make perfect predictions, and so I'm going to pretend that "normal random error" explains all of the observed variation that my model fails to predict. It's certainly fine to call this a frequency. It's also fine to call it a model of uncertainty, without having to give up on objective reality.<br />Ben Prytherchnoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-3596006042877528862017-06-20T05:27:39.458-07:002017-06-20T05:27:39.458-07:00Very nice blog! I have a question!<br />I have two independent groups. I have looked at the means of these two groups and ran t-tests to detect significant differences between them, but there were not any to be found. I have not done anything experimental; it is an observational/comparative/cross-sectional (don't know what to call it) study. Now I am asked to run a post hoc power analysis (a power analysis wasn't done beforehand because it is a new field and there was a lack of data) to see if it was even possible for me to detect any reasonable differences with my number of observations.<br />Does this make sense? How could I do this? Is effect size even necessary in studies that are not experimental?<br />/Thom, frustrated BSc studentUnknownhttps://www.blogger.com/profile/04624870937595191453noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-91120498397081282622017-06-20T02:30:56.977-07:002017-06-20T02:30:56.977-07:00It's a pleasure to read these posts where the contrast of methods and philosophy of science is underscored. The Meehl objection to 'NHST everywhere' in psychology is a weak version of that of Gelman (no such things as 'null effect' or 'null HP', why are you testing against it?) and very similar to that of Gigerenzer in one of his recent talks (https://www.youtube.com/watch?v=4VSqfRnxvV8&t=1910s): NHST is perfectly OK and may add a lot to the theory, as long as you are pitting two proper alternative explanations against each other (his example relates to the use of heuristics in accurate decision-making: instead of pitting heuristic A against H0, you should pit heuristic A against heuristic B and check which is more accurate). This gives incremental theoretical value to statistically significant results.<br />My position here is this: I agree with Meehl and Gigerenzer (not with Gelman).
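[Editorial aside on Thom's question above: what he describes is usually called a sensitivity analysis — computing the smallest effect the test could plausibly have detected given the sample size — rather than "observed power" computed from the obtained result. A minimal sketch under a normal approximation; the function name and sample sizes are illustrative, and a tool such as G*Power gives the exact t-based answer, which will differ slightly.]

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized mean difference (Cohen's d) an independent
    two-sample, two-sided t-test can detect with the desired power,
    using the normal approximation to the t distribution."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical z, ~1.96 for alpha = .05
    z_power = z.inv_cdf(power)          # ~0.84 for 80% power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)

# With 50 observations per group, only effects around d = 0.56 or larger
# are detectable with 80% power; smaller true effects would usually be missed.
d = min_detectable_d(50)
```

If the smallest effect one would care about is larger than this minimum detectable d, the non-significant result is at least informative; if it is smaller, the study could not have distinguished it from zero either way.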
But, Feyerabend makes an extreme point which we should be mindful of: there is no 'one method' to do science, and thus I'll remain open to NHST against a 'pure H0', while maybe asking for a higher burden of proof there than I would in NHST of 'explanation 1 vs explanation 2'.Ignazio Zianonoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-82602027273249801792017-06-19T13:13:23.512-07:002017-06-19T13:13:23.512-07:00I typically don't reply to anonymous comments.Daniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-59397620070966338922017-06-19T13:00:38.415-07:002017-06-19T13:00:38.415-07:00It seems quite a stretch to note that Meehl accepted N-P type testing under certain conditions and then go on to argue that his writings support the idea that "error control is the most important goal in science."Unknownhttps://www.blogger.com/profile/00227235335343168838noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-10139953143078579962017-06-19T11:59:36.027-07:002017-06-19T11:59:36.027-07:00No, belief and truth-likeness are not the same. Note that the problem is not the relative likelihood (likelihoods are fine and can be used); the problem is the prior.Daniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-57395344448303485902017-06-19T11:58:35.453-07:002017-06-19T11:58:35.453-07:00Hi Robert, thanks for your comments (even though I'm pretty sure I didn't understand the second paragraph, but I'll google). I guess you are right that if the outcome of the frequentist and Bayesian decision procedures is the same, there is only a philosophical difference, but not one in practice.
I think Bayesian updating can be combined with a decision threshold as long as the frequentist error rates are OK (if I understand your main point!).Daniel Lakenshttps://www.blogger.com/profile/18143834258497875354noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-70696809389484111912017-06-19T11:02:51.394-07:002017-06-19T11:02:51.394-07:00This is a well-written, dense blog post. It seems to be a quite concise summary of your position. Thanks for writing it.<br /><br />Well, you read van Fraassen and Feyerabend and still believe in scientific realism. So no need to recapitulate their arguments, I guess. If you want more food for thought though, maybe try Adorno's Negative Dialectics for a very dense text on incommensurability.<br /><br />One of your other points is whether Bayesian posteriors can map the verisimilitude of scientific theories. This is an intriguing question. I'd argue that if reality exists in a verisimilitude fashion, then only as Dirac or Kronecker delta functions. Consider that it is questionable whether any prior (but the oracle prior) can ever converge to such a function in finite time, or finite iterations of experiments. Even more so if we assume that the delta function is non-stationary, or if the objective scientific experiment generating the evidence is non-reproducible (e.g. prediction of an election result, or similar). Therefore it could be that there is a set of statements about reality which might never be captured by Bayesian updating. In that regard, I fully agree with you that verisimilitude needs a leap of faith, maybe using thresholding, at which point we treat a belief function as a delta function. But there exist many ways this could be incorporated.<br /><br />Consider that even hard Bayesians would accept that Trump won the election as inevitable fact, i.e. their posterior is 1 on Trump and 0 on Hillary.
So I might not really understand your line of reasoning against Bayesian updating here. Hm. Maybe you are more wondering whether a Bayesian may use thresholding also for probabilistic statements, for which we could still perform reproducible experiments to gain further evidence? Robert Bauerhttps://www.blogger.com/profile/11478797098322656213noreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-22587740044365149172017-06-19T04:27:06.462-07:002017-06-19T04:27:06.462-07:00But I'm still puzzled: even in 1935, is it obvious that Wald had had the time to read the Neyman-Pearson papers? And if he had had the time, why doesn't Popper quote them in Logik der Forschung? Perhaps more importantly, I'm unsure whether the modification suggested by Wald is really fundamental; and if it isn't, we might think that Popper had already built up his own ideas independently of Neyman-Pearson ;) (it's just a detail, I'll admit it!)Aurélien Allardnoreply@blogger.comtag:blogger.com,1999:blog-987850932434001559.post-4846666591069102072017-06-19T03:28:22.677-07:002017-06-19T03:28:22.677-07:00"From a scientific realism perspective, Bayes Factors or Bayesian posteriors do not provide an answer to the main question of interest, which is the verisimilitude of scientific theories. Belief can be used to decide which questions to examine, but it can not be used to determine the truth-likeness of a theory."<br /><br />If Bayes factors tell you the plausibility of one hypothesis over another, then doesn't that also imply that they tell you something about the truthlikeness or verisimilitude of the hypothesis, relative to the other (i.e., the one with greater plausibility is closer to the truth based on the observable data)?farid1323https://www.blogger.com/profile/14777633298107288619noreply@blogger.com