Comments on The 20% Statistician: The probability of p-values as a function of the statistical power of a test
Daniel Lakens

Daniel Lakens (2015-06-01):
This was not created in R - you can do the calculations yourself here: http://rpsychologist.com/d3/pdist/

Unknown (2015-06-01):
Hi,
Thanks for this great post. Could you provide an R script to reproduce those figures?

Matus (2014-06-04):
The "first principles" is the stuff that allows us to predict whether the painting will still hang tomorrow. That's why I consider them essential.

The material on my blog is very raw in terms of time invested in revision (and a bit too Python-heavy), so I don't blame you if it didn't catch your interest :) Though it often features re-analyses of real data sets, so I would consider it applied. The blog is useful for me for putting down ideas that I may wish to organize, simplify, and publish at some later point. That's also my motivation for engaging in discussions - it helps me formulate my ideas and see what my position is and how consistent it is. I don't think I will convince anyone, nor that many people read this.

Daniel Lakens (2014-06-03):
To be clear, I hope you will have a huge influence on how people do statistics. And I don't think the 1000 hits on my blog will make much of a difference.
I'm just trying to explain what I think is important to consider if you want to teach people better statistics.

Daniel Lakens (2014-06-03):
Hi Matus, your example nicely illustrates how unimportant statistics is in the life of a researcher (and rightly so). Researchers should first spend (years of) their time understanding the literature and making theoretically guided predictions. Then they need to learn how to create questionnaires, program experiments, collect physiological data, and acquire the other skills necessary to gather data. Only then do they need a sufficient understanding of statistics to draw some reasonable inference, while understanding that at the very best their single study is input for a future meta-analysis.

Given this situation, researchers are mainly interested in statistics as an applied science. I really don't care about 'first principles' - I want a decent hammer to hit a nail. It should have a good handle and get the job done. Sometimes, if I don't have time, I'll use another tool to hammer in a nail. At the end of the day, as long as my paintings don't fall down, I'm happy.

If you want to be a theoretical statistician, you don't need to worry about researchers. If you want your work to be applied, you need to understand how researchers work. I read some of your blog posts, and I will never understand them in my life. I could, if I had the time, but I don't have the time, and the returns are too slim to warrant the investment. My solutions take 3 minutes to understand, and they make the tool people are already using better.
If you think you can do better, give it a try, and let history decide who had a bigger influence.

Matus (2014-06-03):
Regarding the pragmatism: I agree that this is partly a problem of stats people - some of whom have little experience with actual psychological data. But partly this is because the problem with psychology lies in the inappropriate use of hypothesis testing and poor experimental design. If your experiment and data are rubbish, no amount of statistics will save them. Then it is no surprise that statisticians don't want to help you with your data.

As an example, consider the latest case of psychologists "chasing noise" on the Gelman channel: http://tinyurl.com/lca27s2

Matus (2014-06-03):
Gigerenzer argues that Bayesian stats is more intuitive and therefore appealing, while frequentist stats represents the objective scientific standard. From the conflict of these two arises the hybrid approach. Your attempt is a hybrid approach - it tries to combine the intuitive appeal of Bayes with the objectivity of the frequentists.
Why do I think this won't work? I have no proof. But as Gigerenzer's work illustrates, it has failed to work many times before.

I'm positive towards work that attempts to bridge Bayesian and frequentist stats, e.g., Berger's likelihood principle or Jaynes's/Gelman's objective Bayes. But such work needs to build from first principles. A lot of work, however, offers only ad-hoc patches to some acute problems. This is the hybrid approach.
These hacks are only local, lead to inconsistencies, and work only for some data/experimental designs. I perceive your proposal as such a hack. I prefer to stick with orthodox p-values, or with Bayes, or with some intermediate approach that builds on a sound foundation.

Daniel Lakens (2014-06-02):
Matus, I've read it, but I don't see the relevance to my blog post. My blog post is work in progress to fix some of the issues described in the Gigerenzer paper (and in many papers before and after it). If you can be more specific about what you mean, let me know, but I think you're wrong.

Daniel Lakens (2014-06-02):
Hi, thanks for the reference - I'll read it. As I said, my main focus is on making things a little better. I agree with most of what you said, but I'm a very pragmatic guy. See this post on my old blog for a criticism of statisticians who are not pragmatic: https://sites.google.com/site/lakens2/blog/thelimitedpracticalsignificanceofstatisticians

Matus (2014-06-02):
sorry, that's this one:
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311-339).

Matus (2014-06-02):
Hi Daniel,
I'm not sure how one would test the performance of the various methods. The quality of the post-hoc estimate depends on what kind of estimator you use. I didn't express myself precisely: when I wrote "post-hoc estimate of the effect size" I meant the sample average, which is biased. (This follows from basic sampling theory.) Then you can consider the performance of the various frequentist estimators and the various priors. Ultimately, some priors/estimators will perform better under certain conditions. Unfortunately, when working with real data it is difficult to say precisely which conditions hold and which prior/estimator should be used. In this respect all data analysis is subjective. Bayesians would probably argue that Bayesian analysis at least makes this subjective element explicit.

As to which methods psych researchers are most likely to accept and report in their papers, this is easy to say. Three conditions need to be fulfilled: 1) the researchers need to be told by editors to use the method; 2) the method is implemented in SPSS; 3) the method does not require any thinking - it should have the usual SPSS workflow: load data, select a method from the menu, click through the method's wizard to obtain the output, and copy some values from the output into the manuscript. It should be clear that this has nothing to do with statistics or data analysis, and hence I tend to ignore researchers' taste - at least when we talk about psychologists who lack the math and computer science background required to understand and appreciate the methods they use.

This kind of work - balancing between mathematical requirements and intuitive appeal, and between the frequentist and Bayesian approaches - has a long history in psych research, but in a certain sense it always leads to more trouble.
If you're familiar with Gigerenzer's 89 paper on the ego, superego, and id in statistics - your attempt looks to me like another chapter in this psychoanalytic saga.

Joe (2014-06-01):
It's true that this is a roundabout way of approximating Bayesian model comparison through p-values. However, I think this post does a good job of illustrating the relationship between p-values and Bayes factors for those who are more comfortable with the frequentist framework. It shows how easy it is to set a prior on an effect size by making the judgment as to whether you're powered at 50% or 99%.

This post also does a good job of pointing out that p = .04 at large N is really not very informative at all! Whether you assume low power or high power, you're never much more than 2:1 odds one way or the other. Felix Schonbrodt makes a similar point with his Shiny app here: http://www.nicebread.de/interactive-exploration-of-a-priors-impact/

Always glad to read your stuff, Daniel.

Daniel Lakens (2014-05-30):
Hi Matus, is there any work directly comparing post-hoc estimates of effect sizes vs. priors? Which are easier to get right, and which have less bias? That would be an interesting question. I'm very pragmatic when it comes to statistics - perhaps people should use Bayesian inference (if they can formulate a sensible alternative hypothesis, which is not always the case), but they don't. Many excellent researchers are doing their best to convince people to use Bayesian inference.
I'm more interested in how to amend the worst problems in the use of p-values. The idea of bringing p-values and Bayesian inference closer together is not a dead end: it's a very practical and logical solution to some of the problems in the way people use statistics. The question is whether people would be more likely to adjust their significance level as a function of sample size than to switch to Bayesian inference. I think that's likely. And even if they don't, people still need to understand and evaluate p-values in the literature, so this blog post seemed worthwhile.

Matus (2014-05-30):
This idea is a dead end. The power calculations depend on the effect size, which is unknown. One approach is to use the post-hoc estimate of the effect size. In general this estimator over-estimates the magnitude of the effect size, so we need some kind of correction. In Bayesian stats this is handled by a prior that locates most probability mass around zero. (Another approach is to use a hierarchical prior.) Another option is to derive the effect size (distribution) from the literature. Or, as a third option, you derive the effect size distribution from some domain/topic-specific theoretical considerations. In any case, all of this has been attempted and is routinely done in the Bayesian literature when researchers try to justify their priors.
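[Editorial aside: the overestimation Matus describes is easy to check by simulation. A minimal sketch, not part of the original comment - the true effect size, group size, and significance cutoff below are made-up illustrative values:]

```python
import random
import statistics

random.seed(1)
true_d, n, sims = 0.2, 20, 4000  # assumed: true Cohen's d = 0.2, n = 20 per group
estimates = []
for _ in range(sims):
    a = [random.gauss(0.0, 1.0) for _ in range(n)]     # control group
    b = [random.gauss(true_d, 1.0) for _ in range(n)]  # treatment group
    sd_pooled = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    estimates.append((statistics.mean(b) - statistics.mean(a)) / sd_pooled)

# average *magnitude* of the sample effect size exceeds the true 0.2
mean_abs_d = statistics.mean(abs(d) for d in estimates)
# |d| needed for p < .05 with n = 20 per group is roughly 0.64;
# averaging only the "significant" estimates inflates the bias further
sig_mean = statistics.mean(abs(d) for d in estimates if abs(d) > 0.64)
print(mean_abs_d)
print(sig_mean)
```

With these numbers the mean |d| lands around 0.3 rather than 0.2, and conditioning on significance pushes it well past the critical value - the selection effect behind "chasing noise".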
The question then is: shouldn't we use the Bayesian approach right away, instead of attempting to patch p-values so that they emulate Bayesian inference?

johann (2014-05-30):
Ok, yes, I see: p-rep is simply a function of p, so it carries no additional information. I misremembered, because I did remember that the motivation of p-rep was to give a sense of how likely the same observed effect would be to replicate if it were indeed the true effect. But that does not take power into account and simply takes the empirical effect at face value as an estimate of the true effect, so it is useless for the present purpose. Yes.
But Neyman-Pearson testing, I am sure, operates along these lines - I never read an original, but I was taught inferential statistics that way by Willi Hager. I believe he has only German literature on it, mainly books, but here are two online articles: http://www.dgps.de/fachgruppen/methoden/mpr-online/issue9/art1/hager.pdf and http://www.dgps.de/fachgruppen/methoden/mpr-online/issue11/art2/article.html

Daniel Lakens (2014-05-30):
Hi Johann, is there work on this?
I don't think the p-rep value did this (it was directly related to the p-value), but just out of interest, I'd like to read up on p-values for alternative hypotheses (or p*) as you describe.

johann (2014-05-30):
I also think that the same idea was at the heart of the p-rep measure that was popularized and then quickly abandoned a few years ago.

johann (2014-05-30):
In addition to reporting simply a p-value, also report a p* value, which indicates the likelihood of observing a test statistic of the magnitude found under the alternative hypothesis (to be specified in advance). If the study is underpowered, power will be low, and hence the likelihood of observing a t-value (for example) with p = .049 is not much larger than the likelihood of observing that same t/p value pair under H1. You may have found a significant effect, but with much uncertainty as to which distribution that value comes from. p/p* will be closer to zero in a study with higher power, however. If the study is adequately powered for the specified effect, then p* will exceed p to a higher degree, and p/p* will become smaller.

If, however, the pre-specified effect is very large, then a test statistic with p = .049 - as you point out - might still be relatively more likely under the null than under the alternative hypothesis with its specified huge effect size.
Then p/p* will become > 1, and the result - albeit associated with a small p-value and huge power - still speaks in favor of the null rather than the alternative, but with much uncertainty. Hence, the p-value should be much smaller (and with that, the likelihood of the test statistic occurring under the alternative hypothesis becomes larger again), so that p/p* again decreases.

Ideally, p/p* should be zero (no chance of a significant finding being a false positive, but every chance of finding the true effect).

I believe this is, however, the same idea as that behind using the Bayes factor instead of simple p-values that are based on the assumption that the null is true. It is also represented in classical Neyman-Pearson testing, where you are not only to look at the p-value but also to specify a target effect size beforehand and, after the results are in, inspect the observed effect size for consistency with the target effect size you planned with (not necessarily whether it is larger or smaller, but whether it is in the vicinity of the planned effect). Only if the empirical effect is approximately the size of the planned effect is the p-value a strong indication that the alternative is true. If the observed effect is smaller than planned, or much larger, then the alpha and beta errors are out of control, and either not much can be said anymore about what the study tells you (e.g., the p-value is very small, but p* is much smaller than p, on the left side of the noncentral distribution - with a much larger planned effect), or you run the risk that even though the test statistic's likelihood/p-value under the null is quite small, its likelihood under the alternative is also small (this time on the right side of the noncentral distribution).
Then the beta error is also large and power quite low, so that finding something significant is arguably much more a lucky coincidence than a stable phenomenon you would bet a lot of money on finding again with a similarly small sample.

Daniel Lakens (2014-05-30):
Hi Tony, thanks, that's a great comment! Yes, this is directly related to why none of the original studies with p-values between .02 and .05 have replicated (so far) in the Reproducibility Project. Although absolutely none replicating is still a little unexpected, and can't be accounted for by p-value distributions alone - but that's another story. Replicating high p-values is an excellent way to guarantee the robustness of your work. There have been many suggestions to lower significance levels (e.g., to .005, by Johnson, 2013), but I think it's difficult to set a single standard for all research areas. Nevertheless, we should be slightly more critical about conclusions about the alternative hypothesis after a single p = .04 (and we can be a little (but not too much!) less critical about a single lower p-value).

Tony (2014-05-30):
Here's a naive thought.
Given what we have learned from the Reproducibility Project - that original findings with p-values between .02 and .05 replicated extremely rarely - would it be crazy to move away from .05 to a value such as .01?

I've begun to do this myself in my own work (e.g., I recently observed an anticipated effect with a p-value of .04; my next step will be to try to replicate this effect, and make some procedural changes to hopefully make it larger, before thinking about trying to publish it).

Of course, this naive proposal assumes no p-hacking (otherwise, there would be a rash of findings just below p = .01 that would not replicate).
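[Editorial aside: johann's p/p* idea above can be made concrete with a small sketch. This is my construction, not from the thread - it approximates the test statistic as normal (rather than noncentral t), and the function name, planned effect sizes, and sample sizes are made-up examples. It compares the density of a "just significant" statistic under H0 with its density under an H1 shifted by the planned effect:]

```python
from statistics import NormalDist

std = NormalDist()

def h0_h1_density_ratio(z_obs, d_planned, n_per_group):
    """Density of the observed z under H0 divided by its density under H1.

    Normal approximation to a two-sample test: under H1 the statistic's
    mean shifts by delta = d * sqrt(n/2). Ratios > 1 favor H0, < 1 favor H1.
    """
    delta = d_planned * (n_per_group / 2) ** 0.5
    return std.pdf(z_obs) / NormalDist(mu=delta).pdf(z_obs)

z = 1.97  # "just significant": two-sided p is about .049
moderate = h0_h1_density_ratio(z, 0.5, 64)  # assumed: planned d = 0.5, decent power
huge = h0_h1_density_ratio(z, 1.5, 50)      # assumed: huge planned effect
print(moderate)  # < 1: the observed z is more likely under H1
print(huge)      # > 1: p = .049 is more likely under H0 than under this H1
```

This reproduces johann's key claims: with a sensibly powered design the just-significant result favors H1, but with a huge pre-specified effect the same p = .049 actually favors the null.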