Tuesday, November 18, 2014

Evaluating Estimation Accuracy With the Award-Winning V-Statistic


Clintin Davis-Stober recently pointed me to his blog about the v-statistic. The v-statistic (nothing Greek, just the letter v) is a measure of how accurately your data estimate the corresponding parameter values in the population. The paper he wrote with Jason Dana won the Clifford T. Morgan award (given to the best paper in each Psychonomic Society journal; this one appeared in Behavior Research Methods). This makes v an award-winning new statistic. I’m really happy this paper won an award, for a number of reasons.

First of all, the v-statistic is based on neither Frequentist nor Bayesian statistics. It introduces a third perspective on accuracy. This is great, because I greatly dislike any type of polarized discussion, and especially the one between Frequentists and Bayesians. With a new kid on the block, perhaps people will start to acknowledge the value of multiple perspectives on statistical inference.

Second, v is determined by the number of parameters you examine (p), the effect size R-squared (Rsq), and the sample size (n). To increase accuracy, you need to increase the sample size. But where other approaches, such as those based on the width of a confidence interval, lack a clear minimal value researchers should aim for, the v-statistic has a clear lower bound to beat: the 50% accuracy of random guessing. You want a v > .5. It’s great to say people should think for themselves, and not blindly use numbers (significance levels of 0.05, 80% power, medium effect sizes of .5, Bayes Factors > 10), but let’s be honest: that’s not what the majority of researchers want. And whereas under certain circumstances the use of p = .05 is rather silly, you can’t go wrong with using v > .5 as a minimum. Everyone is happy.

Third, Ellen Evers and I wrote about the v-statistic in our 2014 paper on improving the informational value of studies (Lakens & Evers, 2014), way before v won an award. It’s like discovering a really great band before it becomes popular. 

Fourth, mathematically v is the volume of a hypersphere. How cool is that? It’s like it’s from an X-Men comic!

I also have a weakness for v because calculating it required R, which I had never used before; wanting to calculate v was the reason I started using R. When re-reading the paper by Clintin & Jason, I felt the graphs they present (for studies estimating 3 to 18 parameters, and sample sizes from 0 to 600) did not directly correspond to my typical studies. So, it being the 1.5-year anniversary of R and me, I thought I’d plot v as a function of R-squared for numbers of parameters (2, 3, 4, and 6), effect sizes (R-squared of 0.01-0.25), and sample sizes (30-300) more typical of psychology.

[Plots: v as a function of R-squared for 2, 3, 4, and 6 estimated parameters, at sample sizes from 30 to 300]

A quick R-squared to R conversion table for those who need it; remember that Cohen’s guidelines suggest R = .1 is small, R = .3 is medium, and R = .5 is large.

R-squared    0.05    0.10    0.15    0.20    0.25
R            0.22    0.32    0.39    0.45    0.50
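
This conversion is just a square root; a quick line of R reproduces the whole table:

# Convert R-squared values to R (the multiple correlation)
Rsq <- c(0.05, 0.10, 0.15, 0.20, 0.25)
round(sqrt(Rsq), 2)
# 0.22 0.32 0.39 0.45 0.50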

As we see, v depends on the sample size, the number of parameters, and the effect size. For 2, 3, and 4 parameters, the effect size at which v > .5 doesn’t change substantially, but when more parameters are estimated (e.g., 6 or more) accuracy decreases substantially, which means you need substantially larger samples. For example, when estimating 2 parameters, a sample size of 50 requires an effect size larger than R-squared = 0.115 (R = .34) to have a v > .5.

When planning studies, the v-statistic can be one criterion you use to decide which sample size to aim for. You can also use v to evaluate the accuracy of published studies (see Lakens & Evers, 2014, for two examples). The R script to create these curves for different numbers of parameters, sample sizes, and effect sizes is available below.
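
A minimal sketch of the script’s structure might look like this. Note that v_stat() is a hypothetical placeholder of my own naming: to run the sketch, its body needs to be filled in with the closed-form expression for v from Davis-Stober and Dana’s paper.

# Sketch: plot v as a function of R-squared for one value of p.
# v_stat() is a placeholder; replace the stop() call with the
# closed-form expression for v from Davis-Stober & Dana (2014).
v_stat <- function(p, n, Rsq) {
  stop("Fill in the closed-form expression for v from the paper")
}

p   <- 2                        # number of estimated parameters
Rsq <- seq(0.01, 0.25, 0.01)    # effect sizes to evaluate
ns  <- c(30, 50, 100, 300)      # sample sizes typical of psychology

plot(NULL, xlim = range(Rsq), ylim = c(0, 1),
     xlab = "R-squared", ylab = "v",
     main = paste("v for", p, "parameters"))
abline(h = 0.5, lty = 2)        # the guessing benchmark: aim for v > .5
for (i in seq_along(ns)) {
  v <- sapply(Rsq, function(r) v_stat(p = p, n = ns[i], Rsq = r))
  lines(Rsq, v, col = i)
}
legend("bottomright", legend = paste("n =", ns),
       col = seq_along(ns), lty = 1)

Once v_stat() is filled in, changing p reproduces a curve like those plotted above, and a call such as v_stat(p = 2, n = 50, Rsq = 0.115) should return a value right around .5, matching the example in the text.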

