Comments on The 20% Statistician: Verisimilitude, Belief, and Progress in Psychological Science

The tail of the test follows from the theoretical ...

2017-11-15T08:09:22.611+01:00

The tail of the test follows from the theoretical prediction you are testing. It is unrelated to belief (you can test a hypothesis you don't believe).

Hi Daniel, Great and thoughtful post. Just a quick...

2017-09-25T15:08:15.035+02:00

Hi Daniel,
Great and thoughtful post. Just a quick question. (I know I'm three months "late"). You say "since your belief is not relevant for it, scientific realism suggests there is no rationale to include it in a statistical test." But isn't it true that belief enters the statistical test if you choose between one- and two-tailed tests, both of which make perfect sense from the perspective of error control (but not from a p-value-as-evidence perspective)? Or, to give another example, when you determine things like regions of practical equivalence (ROPE) (or whatever you may want to call them), where you somehow have to believe that the end-points are practically equivalent to some null/ROPE value? Just wondering.
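
To make the one- versus two-tailed point concrete, here is a minimal Python sketch. It assumes a z statistic and a prediction of a positive effect; the function name `p_values` and the value z = 1.80 are made up for illustration and do not come from the post.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic.

    Assumption: the one-tailed test predicts a positive effect.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    one_tailed = 1.0 - std_normal.cdf(z)
    two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed


# Hypothetical test statistic, chosen only for illustration.
one, two = p_values(1.80)
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

With this hypothetical z, the one-tailed p-value falls below alpha = .05 and the two-tailed p-value does not, so the choice of tail changes the decision even though the data are identical.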

Being an empirical scientist doesn't necessari...

2017-08-08T11:11:39.532+02:00

Being an empirical scientist doesn't necessarily determine which "-ist" we can be. The aim of science seems to be to "decide which features are present in our world", but this is already a philosophical (realist) statement. I believe we need to temporarily forget some of the "common sense" of science, or avoid philosophical bias just as we avoid selection bias, in order to think about philosophy of science.

But anyway, it's a great post!

Dear Daniel, Thank you very much for your effort h...

2017-06-22T16:35:31.663+02:00

Dear Daniel, Thank you very much for your effort here, a very constructive post. Two quick things.

First, when something is really unknown, one would probably prefer to run a "door-to-door" search to find it using some initial clue (Bayesian inference) rather than take a null position and wait for some null-falsifying evidence to reject that null position (frequentist inference).

Second, inference is important **only after** correct probability modeling. A HUGE share of social and behavioral research uses measurement tools that are either dichotomously scored or on a Likert scale. Such research findings can only be modeled accurately with discrete probability models (e.g., negative binomial, hypergeometric) that take into account the over-dispersion almost always present in this type of research data.

I think that only **after** we accurately model an actual study with an appropriate probability model does the issue of inference **reasonably** start.

I very much look forward to a day when two things happen in the social and behavioral sciences. (A) We don't use t-tests, (M)AN(C)OVAs, and LMs when the measurement tools we see in social & behavioral research really cry out loud for Generalized Linear Models and Discrete Probability Modeling. (B) Efforts to make an inference happen only after (A) is met.
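
As a rough illustration of the over-dispersion point, the sketch below compares the sample variance of some count data with its mean (a Poisson model would imply a ratio near 1) and, when the data are over-dispersed, computes a method-of-moments size parameter for a negative binomial. The counts and the function name `dispersion_summary` are hypothetical, and a real analysis would fit the model by maximum likelihood rather than by moments.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarise how far count data depart from a Poisson model."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    ratio = v / m         # near 1 under Poisson, > 1 if over-dispersed
    # Method-of-moments size k for a negative binomial whose variance
    # is m + m**2 / k; only defined when variance exceeds the mean.
    k = m * m / (v - m) if v > m else None
    return {"mean": m, "variance": v, "ratio": ratio, "nb_size": k}


# Hypothetical counts, e.g. number of items endorsed per respondent.
counts = [0, 0, 1, 0, 7, 2, 0, 12, 1, 0, 3, 0]
summary = dispersion_summary(counts)
print(summary)
```

A variance-to-mean ratio well above 1, as in these made-up counts, is the kind of signal that a Poisson or normal-theory model would understate the variability in the data.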

I hope you are correct and that equivalence testin...

2017-06-21T19:51:21.465+02:00

I hope you are correct and that equivalence testing gains popularity. I fear that most practicing scientists have too strong an incentive to continue with "nil hypothesis" testing - it is easy to do, requires almost no understanding of what is actually being done, and it substantially increases the chances of getting a paper published. I appreciate your work in pushing for a much more philosophically sound alternative.
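
For readers who have not seen one, here is a rough sketch of an equivalence test (two one-sided tests, TOST) next to a nil-hypothesis test. It assumes a normal approximation and works from summary statistics only; the function name `tost_from_summary`, the observed difference, the standard error, and the equivalence bounds are all hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tost_from_summary(diff, se, low, high, alpha=0.05):
    """Two one-sided z-tests for equivalence within (low, high).

    Returns (p_tost, equivalent). p_tost is the larger of the two
    one-sided p-values, so both tests must reject for equivalence.
    """
    p_lower = 1.0 - STD_NORMAL.cdf((diff - low) / se)  # H0: diff <= low
    p_upper = STD_NORMAL.cdf((diff - high) / se)       # H0: diff >= high
    p_tost = max(p_lower, p_upper)
    return p_tost, p_tost < alpha


def nil_test_p(diff, se):
    """Two-sided p-value for the nil hypothesis diff == 0."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(diff / se)))


# Hypothetical summary statistics and bounds, for illustration only.
diff, se = 0.05, 0.10
p_tost, equivalent = tost_from_summary(diff, se, low=-0.3, high=0.3)
print(f"nil test p = {nil_test_p(diff, se):.3f}")
print(f"TOST p = {p_tost:.4f}, equivalent = {equivalent}")
```

In this made-up case the nil-hypothesis test is non-significant, which on its own says little, while the equivalence test rejects effects as large as the chosen bounds. The bounds themselves have to be justified substantively; the code cannot do that part.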

It will become much easier, and we will see more, ...

2017-06-21T06:54:38.189+02:00

It will become much easier, and we will see more, now that people are starting to use equivalence testing: http://journals.sagepub.com/doi/full/10.1177/1948550617697177

There is a difference between accepting model assu...

2017-06-21T06:53:30.762+02:00

There is a difference between accepting model assumptions, and including belief in your model. You can believe there is a truth out there - but since your belief is not relevant for it, scientific realism suggests there is no rationale to include it in a statistical test.

Regarding Meehl, you write: "Meehl believes ...

2017-06-21T00:51:57.369+02:00

Regarding Meehl, you write:

"Meehl believes accepting or rejecting predictions is a sound procedure, as long as you test risky predictions in procedures with low error rates"

I agree, but I also take Meehl's position as meaning that nearly all "significant" results are useless, given sufficient power. The error rates will be low but the results will (perhaps ironically) tell you less and less the more power you have. From the abstract to "Theory Testing in Psychology" (1967):

"Because physical theories typically predict numerical values, an improvement in experimental precision reduces the tolerance range and hence increases corroborability. In most psychological research, improved power of a statistical design leads to a prior probability approaching 1/2 of finding a significant difference in the theoretically predicted direction. Hence the corroboration yielded by "success" is very weak, and becomes weaker with increased precision. "Statistical significance" plays a logical role in psychology precisely the reverse of its role in physics..."

So yes, Meehl would agree with the goal of error control, but I read the quote above as saying that you can't get error control AND the testing of risky predictions using a procedure that attempts to reject a special case of "not the hypothesis" instead of attempting to directly reject the hypothesis. Do you see many cases of NHST being used to test risky predictions, in which "reject H0" means "reject my scientific hypothesis"?
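
The "approaching 1/2" claim in the quote can be sketched numerically. The code below is only an illustration under strong assumptions that are mine, not Meehl's exact setup: the true standardized effect is never exactly zero, it has a fixed small size (0.05), its sign is equally likely to agree or disagree with the theory's directional prediction, and a two-sample z-test with a two-sided alpha of .05 is used. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_predicted_direction(n_per_group, effect=0.05, alpha=0.05):
    """P(significant result in the predicted direction).

    Assumption: the true effect is +effect or -effect with equal
    probability, i.e. the theory's direction is right by coincidence
    half the time.
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect * sqrt(n_per_group / 2.0)  # expected z if effect > 0
    hit_if_positive = 1.0 - STD_NORMAL.cdf(z_crit - shift)
    hit_if_negative = 1.0 - STD_NORMAL.cdf(z_crit + shift)
    return 0.5 * (hit_if_positive + hit_if_negative)


sample_sizes = (20, 200, 2000, 20000, 200000)
probs = [prob_predicted_direction(n) for n in sample_sizes]
for n, p in zip(sample_sizes, probs):
    print(f"n per group = {n:>6}: P(confirming result) = {p:.3f}")
```

Under these assumptions the probability of a "confirming" significant result climbs from roughly alpha/2 towards 1/2 as the sample size grows, even though the theory has no truth in it, which is the sense in which such a success corroborates very little.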

Hi Daniel, I enjoy your blog and I appreciate you ...

2017-06-21T00:51:23.089+02:00

Hi Daniel, I enjoy your blog and I appreciate you emphasizing the importance of philosophy in evaluating statistical inferences. You state that:

"From a scientific realism perspective, Bayes Factors or Bayesian posteriors do not provide an answer to the main question of interest, which is the verisimilitude of scientific theories."

I'm sure you've heard the similar Bayesian critique of frequentist methods, which is that p-values and decisions about statistical significance don't answer the question we are usually interested in. From talking to my non-statistician friends about how they interpret statistical results, I've found that they all want the p-value to be the probability that their results were due to chance, so that they can interpret a small p-value as the probability their research hypothesis is incorrect. This was Cohen's critique in "The Earth Is Round (p < .05)":

"What's wrong with NHST? Well, among many other things it does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!"

I've found that my students in introductory statistics also instinctively want to interpret the p-value as the probability of the null. This could be because they are just being introduced to NHST and the logic is somewhat convoluted and so they initially go with the simpler (and incorrect) interpretation of statistical significance. I suspect that it is also because the incorrect interpretation of statistical significance makes the most intuitive sense, and answers the question that is of most interest to them.

Of course, the clever students eventually learn the model, and understand the logic of rules such as "we treat population parameters as having fixed but unknown values, and so therefore we cannot make probabilistic statements about these values. It is only our data that are random, not the truth." But usually learning this is a struggle.

I know you qualified your statement with "from a scientific realism perspective" - does treating probability as epistemological rather than ontological mean having to rule out or suspend scientific realism? It seems to me you can both treat probability as referring to a state of knowledge *and* believe that there is a truth out there that is ultimately beyond our reach, even as we constantly strive to improve our understanding of it. I don't see the conflict here. For example, I'm allowed to put a "normally distributed random error" term in a model even though I know that what I'm treating as "error" is really governed, at least in part, by other deterministic forces. In this sense, "normal random error" is a substitute for uncertainty; I know that I can't model everything and make perfect predictions and so I'm going to pretend that "normal random error" explains all of the observed variation that my model fails to predict. It's certainly fine to call this a frequency. It's also fine to call it a model of uncertainty, without having to give up on objective reality.

It's a pleasure to read these posts where the ...

2017-06-20T11:30:56.977+02:00

It's a pleasure to read these posts where the contrast of methods and philosophy of science is underscored. Meehl's objection to 'NHST everywhere' in psychology is a weaker version of Gelman's (there is no such thing as a 'null effect' or 'null HP', so why are you testing against it?) and very similar to that of Gigerenzer in one of his recent talks (https://www.youtube.com/watch?v=4VSqfRnxvV8&t=1910s): NHST is perfectly OK and may add a lot to the theory, as long as you are pitting two proper alternative explanations against each other (his examples relate to the use of heuristics in accurate decision-making: instead of pitting heuristic A against H0, you should pit heuristic A against heuristic B and check which is more accurate). This gives incremental theoretical value to statistically significant results.
My position here is this: I agree with Meehl and Gigerenzer (not with Gelman). But, Feyerabend makes an extreme point which we should be mindful of: there is no 'one method' to do science, and thus I'll remain open to NHST against 'pure H0', while maybe asking for a higher burden of proof there than I would in NHST 'explanation 1 vs explanation 2'.

I typically don't reply to anonymous comments....

2017-06-19T22:13:23.512+02:00

I typically don't reply to anonymous comments.

It seems quite a stretch to note that Meehl accept...

2017-06-19T22:00:38.415+02:00

It seems quite a stretch to note that Meehl accepted N-P type testing under certain conditions and then go on to argue that his writings support the idea that, "error control is the most important goal in science."

No, belief and truth-likeness are not the same. No...

2017-06-19T20:59:36.027+02:00

No, belief and truth-likeness are not the same. Note that the problem is not the relative likelihood (likelihoods are fine and can be used); the problem is the prior.

Hi Robert, thanks for your comments (even though I...

2017-06-19T20:58:35.453+02:00

Hi Robert, thanks for your comments (even though I'm pretty sure I didn't understand the second paragraph, but I'll google). I guess you are right that if the outcomes of the Frequentist and Bayesian decision procedures are the same, there is only a philosophical difference, but not one in practice. I think Bayesian updating can be combined with a decision threshold as long as the frequentist error rates are OK (if I understand your main point!).

This is a well-written, dense blog post. It seems ...

2017-06-19T20:02:51.394+02:00

This is a well-written, dense blog post. It seems to be a quite concise summary of your position. Thanks for writing it.

Well, you read van Fraassen and Feyerabend and still believe in scientific realism. So no need to recapitulate their arguments, I guess. If you want more food for thought though, maybe try Adorno's Negative Dialectics for a very dense text on incommensurability.

One of your other points is whether Bayesian posteriors can map the verisimilitude of scientific theories. This is an intriguing question. I'd argue that if reality exists in a verisimilitude fashion, then only as Dirac or Kronecker delta functions. Consider that it is questionable whether any prior (other than the oracle prior) can ever converge to such a function in finite time, or in a finite number of experiments. Even more so if we assume that the delta function is non-stationary, or if the objective scientific experiment generating the evidence is non-reproducible (e.g. prediction of an election result, or similar). Therefore there could be a set of statements about reality that might never be captured by Bayesian updating. In that regard, I fully agree with you that verisimilitude needs a leap of faith, maybe using thresholding, at which point we treat a belief function as a delta function. But there are many ways this could be incorporated.
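
A toy version of the finite-data point, under assumptions chosen purely for convenience: a single proportion, a uniform Beta(1, 1) prior, and data in which exactly 70% of trials are successes. The function name `beta_posterior_sd` is hypothetical. The posterior standard deviation keeps shrinking as the data grow, but it is never zero for any finite sample, so the posterior never literally becomes a delta function.

```python
from math import sqrt


def beta_posterior_sd(successes, failures, a=1.0, b=1.0):
    """Posterior sd of a proportion under a Beta(a, b) prior."""
    a_post = a + successes
    b_post = b + failures
    total = a_post + b_post
    return sqrt(a_post * b_post / (total ** 2 * (total + 1.0)))


# Hypothetical data: 70% successes at every sample size.
sample_sizes = (10, 1_000, 100_000, 10_000_000)
sds = []
for n in sample_sizes:
    successes = 7 * n // 10
    sds.append(beta_posterior_sd(successes, n - successes))
    print(f"n = {n:>10}: posterior sd = {sds[-1]:.6f}")
```

Treating the belief as a point mass therefore takes an extra decision, such as a threshold on the posterior width, on top of the updating itself.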

Consider that even hard Bayesians would accept that Trump won the election as an inevitable fact, i.e. their posterior is 1 on Trump and 0 on Hillary. So I might not really understand your line of reasoning against Bayesian updating here. Hm. Maybe you are more wondering whether a Bayesian may also use thresholding for probabilistic statements for which we could still perform reproducible experiments to gain further evidence?

But I'm still puzzled: even in 1935, is it obv...

2017-06-19T13:27:06.462+02:00

But I'm still puzzled: even in 1935, is it obvious that Wald had had the time to read Neyman and Pearson's papers? And if he had had the time, why doesn't Popper quote them in Logik der Forschung? Perhaps more importantly, I'm unsure whether the modification suggested by Wald is really fundamental; and if it isn't, we might think that Popper had already built up his own ideas independently of Neyman-Pearson ;) (it's just a detail, I'll admit it!)

"From a scientific realism perspective, Bayes...

2017-06-19T12:28:22.677+02:00

"From a scientific realism perspective, Bayes Factors or Bayesian posteriors do not provide an answer to the main question of interest, which is the verisimilitude of scientific theories. Belief can be used to decide which questions to examine, but it can not be used to determine the truth-likeness of a theory."

If Bayes factors tell you the plausibility of one hypothesis over another then doesn't that also imply that they tell you something about the truthlikeness or verisimilitude of the hypothesis, relative to the other (i.e., the one with greater plausibility is closer to the truth based on the observable data)?

Maybe I should have added I draw heavily on the 3r...

2017-06-19T11:41:04.217+02:00

Maybe I should have added that I draw heavily on the 3rd addendum in later editions of Popper's book. I cite the 2002 version intentionally.

But Popper didn't die in 1934, and in later (t...

2017-06-19T11:39:10.618+02:00

But Popper didn't die in 1934, and in later (translated and updated) editions, he added the following footnote indicating he talked to Wald:

"Here the word ‘all’ is, I now believe, mistaken, and should be replaced, to be a little more precise, by ‘all those . . . that might be used as gambling systems’. Abraham Wald showed me the need for this correction in 1935. Cf. footnotes *1 and *5 to section 58 above (and footnote 6, referring to A. Wald, in section *54 of my Postscript)"

If only falsifying hypotheses were so easy all the time ;)

Very nice post! I'll need more time to think a...

2017-06-19T11:35:09.524+02:00

Very nice post! I'll need more time to think about the substantive issues, but here's some nitpicking: I'm not sure that "this methodological falsification (Lakatos, 1978) is clearly inspired by a Neyman-Pearson perspective on statistical inferences." Logik der Forschung was published in 1934, while Neyman and Pearson's papers were published in the 1930s, I think (1933 for the paper you quote). Given the slow communication between Austria and Great Britain at that time, I think it's more likely that they developed their thinking independently of each other (I don't think Wald was already writing statistical papers at that time). But I'd be glad to be proved wrong!