In your online class you just said the opposite of this post.

"A similar comment is made by Tackett and McShane (2018) in their comment on ZELD: “Specifically, large-scale replications are typically only possible when data collection is fast and not particularly costly, and thus they are, practically speaking, constrained to certain domains of psychology (e.g., cognitive and social).”"

I always find this reasoning fascinating. To me, it doesn't make much sense, because these "hard to find" or "costly" participants are apparently only "hard to find" or "costly" for 1 study. 

After all, chances are high that a next study will use the same "hard to find" or "costly" participants, but now for a different study, for which participants and money magically appear all of a sudden.

"Do you think that would be useful information, and do you know if this has been done by someone already?"

I still think this might be very useful, also see my (as of yet unpublished) possibly directly related comment on this blogpost, and on this one here about a new format for performing and publishing psychological science: http://daniellakens.blogspot.nl/2018/

Sorry if i keep bothering you with the idea, but i haven't received any feedback on it, so i don't know if you think it might make (some) sense or not. 

Wouldn't it be funny if someone would give you ideas and thoughts for free, which could possibly be helpful in thinking about how to optimally perform research, just because you give them the opportunity to try and contribute by having this blog? 
You wouldn't even need a fancy grant (worth how much money exactly?!) for any of this stuff!

I reason:

1) it seems to me that it is impossible to determine/gauge the percentage of hypotheses that (will) turn out to be "correct"/"proven".

2) more importantly, it seems to me that a) it doesn't matter what this percentage is (within reasonable boundaries), and b) it is not even desirable to try and determine/gauge this percentage, because i reason both a) and b) are irrelevant for building knowledge.

What matters in my reasoning is amassing things like optimally gathered (and thus maximally informational) data, and arguments/reasoning, which can both be used for things like theory-building and -testing.

I am all for "(...) scientists explicitly thinking about the utility of the research they perform", but i reason it might make more sense to think about this for all research, not just replications. In fact, i reason that it is more important for "original" research, because i predict that (nearly) all else will follow automatically once things are done more optimally from the start. 

The bottom line to me is, many of the things that might be wrong in psychological science seem to me to be connected, and based on a few things that need to be improved. I reason the rest of the problems will solve themselves automatically. 

Here is an idea which tries to solve some of the basic things that might be connected, and which i reason will also help solve other issues. I named it after what i think is a summary of the few basic problems which i reason can all easily be solved: "Science is dependent on scientists (old flawed model?) V scientists are dependent on Science (new improved model?)":

http://andrewgelman.com/2017/12/17/stranger-than-fiction/#comment-628652

I hope you will (also) focus on how to optimally perform research in general, and not just on replications. I reason when the former is done, the rest will follow automatically. Good luck / all the best with your work on this important topic!!

Nothing changes - if you don't assume equal variances, you do a one-sided Welch's t-test.

Thanks for the blog post Daniel :) Just wondering about the unequal variance issue? 
I am guessing that when converting a two-sided test to a one-sided test, a balanced design and equal variances will be assumed, or not?

Perhaps it would be easier for people to get into justifying replications in this way if they were also in the habit of similarly justifying why they run their initial studies. I'm not convinced that this is always the case. "Because we have a grad student who thinks this is interesting and a participant pool who have quotas to meet" does not necessarily meet this criterion, I would suggest.

Yes, you can always calculate the effect size you could detect with a certain level of power. But, there is never information that goes beyond the p-value. So, knowing how sensitive your design was is always good info to have, but it is difficult to use it as a way to draw inferences from data.

I think there are some forms of post-hoc power analyses that are appropriate.

I agree wholeheartedly with everything you have said. Calculating the power of the study post-hoc from the sample size (and SD), the alpha, and the *observed* difference is completely useless.

However, would it not be reasonable for the editor to ask for the following:

In cases where a power/sample size calculation has not been performed in the original paper (perhaps in cases where group sizes are determined by other factors), would it not be suitable for the editor to ask for a calculation of the *detectable* difference? I.e., put the alpha, the sample size, and a pre-agreed beta value into the power calculation and see what size of difference the study would have been able to detect.
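
The "detectable difference" calculation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a two-group comparison with equal group sizes, and a normal approximation instead of the exact noncentral t calculation. The function name `detectable_d` and the default alpha and power values are made up for the example. The result is a standardized difference (Cohen's d).

```python
from statistics import NormalDist


def detectable_d(n_per_group, alpha=0.05, power=0.80, two_sided=True):
    """Smallest standardized difference (Cohen's d) detectable with the
    requested power, using a normal approximation for two equal groups.

    Hypothetical helper for illustration; power = 1 - beta.
    """
    z = NormalDist()  # standard normal distribution
    # Critical value for the chosen alpha (split over two tails if two-sided).
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    # Quantile corresponding to the pre-agreed power.
    z_beta = z.inv_cdf(power)
    # The standard error of a standardized mean difference is sqrt(2 / n).
    return (z_alpha + z_beta) * (2 / n_per_group) ** 0.5


if __name__ == "__main__":
    for n in (20, 64, 200):
        print(n, round(detectable_d(n), 2))
```

With 64 participants per group this gives a d of roughly 0.5. Multiplying d by the SD gives the detectable difference in raw units, and a small study will show a large detectable difference, which is the point about under-powered studies.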

I understand that this method would closely align with confidence intervals. However, I think it will demonstrate under-powered studies with more impact. Particularly in non-inferiority trials that claim non-inferiority when they are massively under-powered.

Possibly related to the previous post, and related to how to "optimally" perform research concerning resources, I thought of the following.

Even though I s#ck at statistics, given my previous post, I wondered if it would be possible to try and gauge what the optimal no. of 1) participants per study and/or researcher, and 2) direct replications could be.

Then I thought about all the "Registered Replication Reports", and the figures of all the separate labs with their associated no. of pp, confidence intervals, effect sizes, and p-values.

I wondered if it would be possible to use this information from all "Registered Replication Reports" performed thus far, to see how diagnostic it would be to randomly draw 3, or 4, or 5 etc. labs, and see how diagnostic/accurate their associated no. of pp, confidence intervals, effect sizes, and p-values are given the total results of all labs combined. 
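
A minimal sketch of what such a check could look like is given below. It is not based on any real "Registered Replication Report": the lab results are synthetic numbers generated in the code, the function names `pooled_effect` and `subsample_error` are made up for the example, and a simple fixed-effect (inverse-variance) average is assumed as the way of combining labs. With real data, the list `labs` would be replaced by the reported effect sizes and standard errors.

```python
import random
from statistics import mean


def pooled_effect(labs):
    """Fixed-effect (inverse-variance weighted) average of lab effect sizes.

    `labs` is a list of (effect_size, standard_error) pairs.
    """
    weights = [1 / se ** 2 for _, se in labs]
    return sum(w * d for w, (d, _) in zip(weights, labs)) / sum(weights)


def subsample_error(labs, k, draws=2000, seed=1):
    """Average absolute gap between the pooled estimate of k randomly
    drawn labs and the pooled estimate of all labs combined."""
    rng = random.Random(seed)
    full = pooled_effect(labs)
    return mean(
        abs(pooled_effect(rng.sample(labs, k)) - full) for _ in range(draws)
    )


# Synthetic stand-in for one multi-lab replication: 20 labs, an assumed
# true effect of d = 0.10, and standard errors between 0.10 and 0.25.
_rng = random.Random(0)
labs = []
for _ in range(20):
    se = _rng.uniform(0.10, 0.25)
    labs.append((_rng.gauss(0.10, se), se))

if __name__ == "__main__":
    for k in (3, 4, 5, 10, 20):
        print(k, round(subsample_error(labs, k), 3))
```

The printed gap shrinks as more labs are drawn and is zero when all labs are used. The same loop could track other quantities, for example how often the confidence interval of the subset covers the all-labs estimate.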

This could possibly be very interesting, and useful, information (it being "real" compared to possible additional simulations) concerning how to "optimally" perform research, and could perhaps also provide information concerning the format I described in the previous post, and what the "optimal" amount of "direct" replications could be for instance.

Do you think that would be useful information, and do you know if this has been done by someone already? I am not smart enough to do that.

Regardless: thank you for all your efforts in trying to help improve psychological science!

Nice article! Thanks!!

Since you (like I) are happy to assume a point null hypothesis, I can't understand why you don't frame the argument in terms of the false positive risk.

The tail of the test follows from the theoretical prediction you are testing. It is unrelated to belief (you can test a hypothesis you don't believe).

Nice post. I tried to make a similar argument by pointing out that many statistical analyses depend on exactly the same information from the data but can produce different answers because they ask different questions. Details at:
https://link.springer.com/article/10.3758%2Fs13428-016-0812-3

We should consider the context.

>> But I think it’s a good idea to teach them about all the possible questions they can ask.

Of course, this is a nice idea and hardly anyone would disagree. But how do we find all those possible questions and how and what should we teach students about them? My guess is that in the end the popular answer would be to provide an introduction into (history of) philosophy of science and perhaps some (history of) epistemology, sprinkled with some insights from Meehl, Cohen and other gurus. I doubt this improves inferences since it does not provide a formal, transparent and replicable (and thus scientific) language for asking and discussing scientific questions and statements.

As you may already have learned from my other comments, I think such a language already exists and should be taught.

Bravo! Occasionally my brighter students manage to "come up for air" after being submerged in probability distributions, hypothesis tests, and modeling techniques. Then they ask "What's it all for?" and "How do I know when to use what?" and other embarrassing questions which most textbooks carefully avoid. Now I need to collect case studies that prompt my students to think about those "possible questions to ask."

On point

This is your best post so far Daniel. I look forward to the one directly following up your last sentence.

As a Stats teacher, I don't tell my students what they want to know. I do, however, tell them what kind of questions they can (and can't) get answered whenever they apply a particular technique or model. As a consultant, I typically hear the person first (to know "what they want") and then I try to advise accordingly. 
It does happen that I then rephrase what one can achieve when using a specific method (kind of what you can conclude once you replace a parametric by a nonparametric ANOVA, or even better, once you ditch frequentism for Bayesianism). I think the best advice is to educate students to phrase their questions in such a way that a translation into a suitable statistical model becomes feasible (instead of just freely phrasing questions).

I am unfamiliar with R and to save me learning it, I thought this might be useful for equivalence testing. Any thoughts? Many thanks in advance.
Sue Radd