A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Thursday, December 4, 2014

P-curves are better at effect size estimation than trim-and-fill (and Michael Jordan is better at free throws than I am)



I like p-curve analyses. They are a great tool to evaluate whether sets of studies have evidential value or not. In a recent paper, Simonsohn, Nelson, & Simmons (2014) show how p-curve analyses can correct for publication bias using only significant results, and that p-curve outperforms trim-and-fill at estimating effect sizes. That's pretty cool. However, trim-and-fill is notoriously bad at unbiased effect size estimation, so outperforming it is a low bar. I think what we really need is a comparison between state-of-the-art methods to correct effect size estimates for publication bias. I dip a toe in the water at the end of this post, but I can't wait until someone does a formal comparison between p-curve, p-uniform (a technique developed independently by Marcel van Assen and colleagues), and meta-regression techniques like the PET-PEESE analyses by Stanley and Doucouliagos.

If you wanted to know how good a basketball player Michael Jordan is, you wouldn't look at how much better he is at free throws than I am. The fact that p-curve outperforms trim-and-fill in a simulation where publication depends purely on whether the p-value is significant is only interesting for historical reasons (trim-and-fill is well known, even though it is now considered outdated), but not interesting when you want to compare state-of-the-art techniques to correct meta-analytic effect sizes for publication bias.

Trim-and-fill was not created to examine publication bias caused by effects that yield non-significant p-values. Instead, it was created for publication bias caused by effects that are strongly (perhaps even significantly!) in the opposite direction of the remaining studies, or of the effect a researcher wants to find. Think of a meta-analysis on the benefits of some treatment, where the 8 studies that revealed the treatment actually makes people feel much worse are hidden in the file drawer. In the picture below (source) we can see how trim-and-fill assumes two studies were missing from a meta-analysis (one of which was significant in the opposite direction, because the diagonal lines correspond to 95% confidence intervals).


How does trim-and-fill work? The procedure starts by removing (‘trimming’) small studies that bias the meta-analytic effect size, then estimates the true effect size, and ends with ‘filling’ in a funnel plot with studies that are assumed to be missing due to publication bias. 
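To make this concrete, here is a minimal sketch of a trim-and-fill sensitivity analysis in R. It assumes the metafor package (which is not necessarily what Simonsohn et al. used), and the simulated data, with the 'wrong direction' studies hidden in the file drawer, are purely illustrative:

# Minimal trim-and-fill sketch, assuming the metafor package; illustrative data only.
library(metafor)
set.seed(1)
k   <- 50
n   <- round(runif(k, 20, 200))    # per-group sample sizes
vi  <- 2 / n                       # approximate sampling variance of d
yi  <- rnorm(k, mean = 0.2, sd = sqrt(vi))
pub <- yi > 0                      # hide the studies that go in the 'wrong' direction
res <- rma(yi[pub], vi[pub])       # naive random-effects meta-analysis
taf <- trimfill(res)               # trim-and-fill adjusted estimate
res
taf
funnel(taf)                        # imputed studies are shown as open circles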

However, it is already known that trim-and-fill does not perform well under many realistic publication bias scenarios (such as when publication bias is driven by the size of the p-value, which is arguably the most common source of publication bias in psychology). The method is criticized for its reliance on the strong assumption of symmetry in the funnel plot, and when publication bias is induced by a p-value boundary (as in the simulation by Simonsohn et al.), the trim-and-fill method does not perform well enough to yield a corrected meta-analytic effect size estimate that is close to the true effect size (Peters, Sutton, Jones, Abrams, & Rushton, 2007; Terrin, Schmid, Lau, & Olkin, 2003). When its assumptions are met, it can be used as a sensitivity analysis (with little difference between the corrected and uncorrected effect size estimate indicating that publication bias is unlikely to change the conclusions of the meta-analysis), but even then, if there are differences between the corrected and uncorrected estimates, researchers should not report the corrected effect size estimate as an estimate of the true effect size (Peters et al., 2007).

I've adapted the simulation by Simonsohn et al. (2014) to look at only one meta-analysis at a time, and re-run one analysis underlying their Figure 2a. This allows me to visualize a single simulation, and the result below is a nice demonstration of why it is important to visualize your data. We see that the simulated data are quite different from what you typically see in a meta-analysis. Note that the true effect size in this simulation is 0, but power is so low, and publication bias is so extreme, that the meta-analytic effect size estimate from a normal meta-analysis is very high (d = 0.77 instead of the true d = 0). It's excellent that p-curve analysis can accurately estimate effect sizes in such a situation, even though I hope no one will ever perform a meta-analysis on studies that look like the figure below. As I said, I totally believe Michael Jordan would beat me at free throws (especially if he gets to stand half as far from the basket).
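To get a feel for how extreme this scenario is, here is a rough sketch (not the authors' simulation script) of the selection process described above: a true effect of zero, n = 20 per group, and only significant results in the predicted direction making it into the literature:

# Rough sketch of the scenario above (not the original simulation code).
set.seed(42)
d.pub <- c()
while (length(d.pub) < 20) {
  x <- rnorm(20); y <- rnorm(20)      # true effect is zero, n = 20 per group
  d <- (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)
  if (t.test(x, y)$p.value < .05 && d > 0) d.pub <- c(d.pub, d)   # only significant, positive results get 'published'
}
mean(d.pub)   # the naive average lands far from the true d = 0 (around .75 here)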

Let's increase the variability a little by adding more studies of different sizes (ranging from N = 40 to N = 400). The great R script by Simonsohn et al. makes this very easy. We see two things in the picture below. First of all, re-running the simulation while allowing for sample sizes of up to 400 reduces the effect size overestimation. Although the estimate is nowhere near 0, this demonstrates how running only hugely underpowered studies can immensely increase publication bias. It also demonstrates that running larger studies will not in itself fix science - we also need to publicly share all well-designed studies to prevent publication bias.

The second thing we see is that it starts to look a little bit (albeit a very little bit) more like a normal distribution of effect sizes from mainly underpowered studies that suffer from publication bias (yes, that's a very specific definition of 'normal', I know). Where trim-and-fill believed 0 studies were missing in the simulation above, it now thinks a decent number of studies (around 20%) are missing (not nearly enough), and it even adjusts the meta-analytic effect size estimate a little (although again not nearly enough, which is to be expected, given that its assumptions are violated in this simulation).


How would p-curve compare to more state-of-the-art meta-analytic techniques that correct for publication bias? My favorite (with the limited knowledge I have of this topic at the moment) is PET-PEESE meta-regression. For an excellent introduction and application, including R code, see this paper by Carter & McCullough (2014).

I ran a PET (precision-effect test) in both situations above (see the R script below). PET really didn't know what to do in the first example, probably because regression-based approaches need more variability in the standard errors (so the presence of both larger and smaller studies). However, in the second simulation PET performed very well, estimating that the meta-analytic effect size corrected for publication bias did not differ significantly from 0. So, under the right conditions, both p-curve and PET-PEESE meta-regression can point out publication bias and provide an unbiased effect size estimate of 0 when the true effect size is indeed 0.
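For those who want to try this themselves, below is a minimal, self-contained sketch of PET and PEESE meta-regressions in the spirit of Stanley and Doucouliagos (2014). It is not the script used for the analyses above; the simulated data simply mimic a literature with varying sample sizes, a true effect of zero, and selection on significance:

# Minimal PET-PEESE sketch (not the script used above); illustrative data only.
set.seed(1)
n    <- round(runif(5000, 20, 200))           # per-group sample sizes
vi   <- 2 / n                                 # approximate sampling variance of d
yi   <- rnorm(5000, mean = 0, sd = sqrt(vi))  # true effect size is zero
sei  <- sqrt(vi)
pub  <- yi / sei > qnorm(.975)                # keep only significant, positive studies
dat  <- data.frame(yi, vi, sei)[pub, ]

naive <- weighted.mean(dat$yi, 1 / dat$vi)            # badly inflated estimate
pet   <- lm(yi ~ sei, data = dat, weights = 1 / vi)   # PET: intercept = corrected estimate
peese <- lm(yi ~ vi,  data = dat, weights = 1 / vi)   # PEESE: variance as predictor
round(c(naive = naive, PET = coef(pet)[[1]], PEESE = coef(peese)[[1]]), 2)
# Conditional PET-PEESE: report the PET estimate if its intercept does not differ
# from zero; otherwise report the PEESE intercept.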

So how would Michael Jordan do compared to Magic Johnson? And how would p-curve do against p-uniform or PET-PEESE meta-regression? I'd love to see a comparison, especially one focusing on which technique works best under which assumptions. For example, does p-curve work better when there are only small studies, but does PET-PEESE do better when publication bias is less extreme? Publication bias is the biggest problem a quantitative science can have, and it's clear that with so many improved techniques to calculate unbiased effect size estimates, we have important tools to draw better inferences from the scientific literature. Let's find out which of these tools makes the best inferences under which circumstances. If there's a rogue statistician out there who wants to help us researchers select the best tool for the job, a detailed comparison of these recent approaches to correct meta-analytic effect size estimates for publication bias would be an extremely useful project for those cold winter days ahead.


References

Carter, E. C., & McCullough, M. E. (2014). Publication bias and the limited strength model of self-control: has the evidence for ego depletion been overestimated? Frontiers in Psychology, 5. doi: 10.3389/fpsyg.2014.00823
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2007). Performance of the trim and fill method in the presence of publication bias and between‐study heterogeneity. Statistics in Medicine, 26(25), 4544-4562.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534-547. doi: 10.1037/a0033242
Stanley, T. D., & Doucouliagos, H. (2014). Meta-regression approximations to reduce publication selection bias. Research Synthesis Methods, 5, 60-78. doi: 10.1002/jrsm.1095
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting for publication bias in the presence of heterogeneity. Statistics in Medicine, 22(13), 2113-2126.



Tuesday, December 2, 2014

98% Capture Percentages and 99.9% Confidence Intervals


I think it is important that people report confidence intervals to provide an indication of the uncertainty in the point estimates they report. However, I am not too enthusiastic about the current practice of reporting 95% confidence intervals. I think there are good reasons to consider alternatives, such as reporting 99.9% confidence intervals instead.

I’m not alone in my dislike of a 95% CI. Sterne and Smith (2001, p. 230) have provided the following recommendation for the use of confidence intervals:

Confidence intervals for the main results should always be included, but 90% rather than 95% levels should be used. Confidence intervals should not be used as a surrogate means of examining significance at the conventional 5% level. Interpretation of confidence intervals should focus on the implications (clinical importance) of the range of values in the interval.

This last sentence is, along with world peace, an excellent recommendation of what we should focus on. Neither seems very likely in the near future. I think people (especially when they do purely theoretical research) will continue to interpret confidence intervals as an indication of whether an effect is statistically different from 0, and make even more dichotomous statements than they do with p-values. After all, a confidence interval either includes 0 or it doesn't, but p-values come in three* different** magnitudes***.

We know that relatively high p-values (e.g., p-values > 0.01) provide relatively weak support for H1 (or sometimes, in large samples with high power, they actually provide support for H0). So instead of using a 90% CI, I think it's a better idea to use a 99.9% CI. This has several benefits.

First of all, instead of arguing that we should stop reporting p-values (which I don't think is necessary) because confidence intervals give us exactly the same information, we can report p-values as we are used to (using p < 0.05) alongside 99.9% CIs that tell us whether an effect differs from 0 at p < .001. We can then immediately see whether an effect is statistically different from 0 using the typical alpha level, and whether it would still be statistically different from 0 had we used a much stricter alpha level of 0.001. Note that I have argued against using p < .001 as a hard criterion to judge whether a scientific finding has evidential value, and prefer a more continuous evaluation of research findings. However, when using 99.9% CIs, you can use the traditional significance criterion we are used to, while at the same time seeing what would have happened had you followed the stricter recommendation of p < .001. Since evidential value is negatively correlated with p-values (the lower the p-value, the higher the evidential value, all else being equal; see Good, 1992), any effect that would have been significant at p < .001 has more evidential value than an effect only significant at p < .05.

Second, confidence intervals are often incorrectly interpreted as the range that will contain the true parameter of interest (such as an effect size). Confidence intervals are statements about the percentage of future confidence intervals, constructed in the same way, that will contain the true parameter; they are not statements about the probability that the true parameter falls within the single interval you computed.

The more intuitive interpretation people want to use when they see a confidence interval is to read it as a Capture Percentage (CP): the percentage of future parameter estimates that will fall within the interval. My back-of-an-envelope explanation in the picture below shows how a 95% CI is only a 95% CP when the estimate (such as an effect size) you observe in a single sample happens to be exactly equal to the true parameter (left). When this is not the case (and it is almost never exactly the case), fewer than 95% of future effect sizes will fall within the CI from your current sample (see the right side of the figure). In the long run, so on average, a 95% CI has an 83.4% capture percentage.


This is explained in Cumming & Maillardet (2006), who present two formulas to convert a confidence interval into a capture percentage. I've made a spreadsheet in case you want to try out different values:

https://onedrive.live.com/edit.aspx?cid=DF3F7227F3844BE2&resid=DF3F7227F3844BE2!93409&app=Excel
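If you prefer R over a spreadsheet, the large-sample (z-based) version of this conversion is a one-liner. This is a minimal sketch that assumes a known standard error; the t-distribution formulas in Cumming & Maillardet (2006) give slightly different numbers in small samples:

# Average capture percentage of a C% confidence interval (z-based approximation):
# the chance that a replication estimate falls inside the original interval.
acp <- function(conf) {
  z <- qnorm(1 - (1 - conf) / 2)    # critical value of the CI
  2 * pnorm(z / sqrt(2)) - 1        # a replication estimate differs by sqrt(2) * SE
}
round(acp(c(0.95, 0.999)), 3)       # roughly 0.834 and 0.980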


Capture percentages are interesting. You typically have only a single sample, and thus a single confidence interval, so any statement about what an infinity of future confidence intervals will do is relatively uninteresting. However, a capture percentage is a statement you can make based on a single confidence interval. Based on a single interval, it says something about where future statistics (such as means or effect sizes) are likely to fall. A value of 83.4% is a little low (it means that, on average, you will be wrong about 16.6% of the time in the future). For a 99.9% confidence interval, the capture percentage is 98%. Those are two easy-to-remember numbers, and being 98% certain of where you can expect something is pretty good.

So, changing reporting practices away from 95% confidence intervals to 99.9% confidence intervals (which are, on average, 98% capture percentages) has at least two benefits. The only downside is that the confidence intervals are a little wider (see below for an independent t-test with n = 20 and a true d of 1), but if you really care about the width of a confidence interval, you can always collect a larger sample. Does this make sense? I'd love to hear your ideas about using 99.9% confidence intervals in the comments.
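As a quick check of how much wider the intervals get, here is a small sketch for an independent t-test (assuming n = 20 per group and a true d of 1; simulated data, so the exact numbers will vary):

# Width of a 95% versus a 99.9% CI for an independent t-test (simulated example).
set.seed(1)
x <- rnorm(20, mean = 1, sd = 1)    # group 1, true d = 1
y <- rnorm(20, mean = 0, sd = 1)    # group 2
ci95  <- t.test(x, y, conf.level = 0.95)$conf.int
ci999 <- t.test(x, y, conf.level = 0.999)$conf.int
round(c(width.95 = diff(ci95), width.999 = diff(ci999), ratio = diff(ci999) / diff(ci95)), 2)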

Tuesday, November 18, 2014

Evaluating Estimation Accuracy With the Award Winning V-Statistic


Clintin Davis-Stober recently pointed me to his blog about the v-statistic. The v-statistic (nothing Greek, just the letter v) is a measure of how accurately your data estimate the corresponding parameter values in the population. The paper he wrote with Jason Dana won the Clifford T. Morgan award (awarded to the best paper in each Psychonomic Society journal, in this case Behavior Research Methods). This makes v an award-winning new statistic. I'm really happy this paper won an award, for a number of reasons.

First of all, the v-statistic is not based on Frequentist or Bayesian statistics. It introduces a third perspective on accuracy. This is great, because I greatly dislike any type of polarized discussion, and especially the one between Frequentists and Bayesians. With a new kid on the block, perhaps people will start to acknowledge the value of multiple perspectives on statistical inference.

Second, v is determined by the number of parameters you examine (p), the effect size R-squared (Rsq), and the sample size. To increase accuracy, you need to increase the sample size. But where other approaches, such as those based on the width of a confidence interval, lack a clear minimal value researchers should aim for, the v-statistic has a clear lower boundary to beat: the 50% you would get by guessing. You want v > .5. It's great to say people should think for themselves, and not blindly use numbers (significance levels of 0.05, 80% power, medium effect sizes of .5, Bayes Factors > 10), but let's be honest: that's not what the majority of researchers want. And whereas under certain circumstances the use of p = .05 is rather silly, you can't go wrong with using v > .5 as a minimum. Everyone is happy.

Third, Ellen Evers and I wrote about the v-statistic in our 2014 paper on improving the informational value of studies (Lakens & Evers, 2014), way before v won an award. It’s like discovering a really great band before it becomes popular. 

Fourth, mathematically v is the volume of a hypersphere. How cool is that? It’s like it’s from an X-men comic!

I also have a weakness for v because calculating it required R, which I had never used before I wanted to be able to calculate v; v was the reason I started using R. When re-reading the paper by Clintin and Jason, I felt the graphs they present (for studies estimating 3 to 18 parameters, and sample sizes from 0 to 600) did not directly correspond to my typical studies. So, it being the 1.5-year anniversary of R and me, I thought I'd plot v as a function of R-squared for some more typical numbers of parameters (2, 3, 4, and 6), effect sizes (R-squared of 0.01 - 0.25), and sample sizes in psychology (30 - 300).

A quick R-squared to R conversion table for those who need it; remember that Cohen's guidelines suggest R = .1 is small, R = .3 is medium, and R = .5 is large.

R-squared    0.05    0.10    0.15    0.20    0.25
R            0.22    0.32    0.39    0.45    0.50
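Or, in R, the conversion is just a square root (rounded to two decimals):

round(sqrt(c(0.05, 0.10, 0.15, 0.20, 0.25)), 2)   # 0.22 0.32 0.39 0.45 0.50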

As we see, v depends on the sample size, the number of parameters, and the effect size. For 2, 3, and 4 parameters, the effect size at which v > .5 doesn't change substantially, but with more parameters being estimated (e.g., > 6), accuracy decreases substantially, which means you need substantially larger samples. For example, when estimating 2 parameters with a sample size of 50, you need an effect size larger than R-squared = 0.115 (R = .34) to have a v > .5.

When planning sample sizes, the v-statistic can be one criterion you use to decide which sample size to plan for. You can also use v to evaluate the accuracy of published studies (see Lakens & Evers, 2014, for two examples). The R script to create these curves for different numbers of parameters, sample sizes, and effect sizes is available below.


Tuesday, November 4, 2014

Negotiations between Elsevier and Dutch Universities break down: Time for change

Excellent news this morning: The negotiations between Dutch universities and Elsevier about new contracts for access to the scientific literature have broken down (report in Dutch by VSNU and De Volkskrant). This means universities are finally taking a stand: We want scientific knowledge to be freely available as open access to anyone who wants to read it, and we believe publishers are making more than enough money as it is. Moving towards fully open access should not cost Dutch tax payers anything extra.

There might be some small inconveniences along the way for us scientists. Just as we needed to write to the authors of articles in the 70's and 80's, we might need to start e-mailing authors for their papers in 2015. Researchers who don't know what the #icanhazpdf hashtag is might need some Twitter education. Sometimes, you might not get an article as quickly as you were used to. We might experience the consequences of not having access to the literature ourselves.

I think that is good. One of my favorite quotes from Zen and the Art of Motorcycle Maintenance reads as follows: "Stuckness shouldn't be avoided. It's the psychic predecessor of all real understanding." It's excellent that the Dutch universities and Elsevier are stuck. If this stuckness lasts, it's excellent that a researcher gets stuck when searching the literature. We need this to understand how untenable the status quo is.

There are very cheap and easy solutions. I've talked to some people very close to these negotiations, and there is a plan B. There is even a more interesting plan C, where the Dutch universities examine the possibility of making large deals with Plos One and PeerJ, paying a lump sum, funded by the government, that gives all researchers the opportunity to publish in these journals. If we compare that to what we currently pay Elsevier alone, that would probably be a sweet deal.

The major thing standing in our way is our own ego. Elsevier has journals with a 'reputation', and many researchers are willing to screw over Dutch tax payers just to be able to publish in a journal like Nature or Science. Their work would be just as good if it were published in PeerJ. However, these researchers are not making a rational choice, but an emotional one, driven by self-interest. Short-term self-interest will always trump long-term public benefit. You can imagine the wide grins on Elsevier's faces whenever this topic comes up. They know scientists are often little ego-factories, and they are more than happy to cash in.

That's why I think it's good that we are stuck. Someone needs to put their foot down. Open access publishing is the energy-saving light bulb of the scientific world. Regulations were needed to make sure the public would switch from the cheap but energy-inefficient incandescent bulb to energy-saving bulbs that were slightly more expensive to buy, but better for everyone in the long term. Researchers similarly need a push to move away from publishing options that thrive on short-term benefit, but screw Dutch tax payers in the long term. I hope we stay stuck until we all reach some real understanding.

Thursday, October 30, 2014

Sample Size Planning: P-values or Precision?

Yesterday Mike McCullough posted an interesting question on Twitter. He had collected some data and observed p = 0.026 for his hypothesis, but he wasn't happy. Being aware that higher p-values do not always provide strong support for H1, he wanted to set a new goal and collect more data. With sequential analyses it's no problem to look at the data (at least when you plan the looks ahead of time) and collect additional observations (because you control the false positive rate), so the question was: which goal should you have?

Mike suggested a p-value of 0.01, which (especially with increasing sample sizes) is a good target. But others quickly suggested forgetting about those damned p-values altogether and planning for accuracy instead. Planning for accuracy is simple: you decide upon the width of the confidence interval you'd like, and determine the sample size you need.

I don't really understand why people pretend these two choices are any different. They always boil down to the same thing: your sample size. A larger sample size will give you more power, and thus a better chance of observing a p-value below 0.01, or 0.001. A larger sample will also reduce the width of your confidence interval.

So the only difference is which calculation you use to base your sample size on. You either decide upon the effect size you expect, or the width of the confidence interval you desire, and calculate the required sample size. One criticism of power analysis is that you often don't know the effect size (e.g., Maxwell, Kelley, & Rausch, 2008). But with sequential analyses (e.g., Lakens, 2014) you can simply collect some data, and calculate conditional power for the remaining sample based on the observed effect size.
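To illustrate that both routes are just sample size calculations, here is a sketch in base R; the target effect size, alpha level, and desired CI width are arbitrary example values, not recommendations:

# Route 1: sample size from the effect size you expect (example values only).
power.t.test(delta = 0.5, sd = 1, sig.level = 0.01, power = 0.90)

# Route 2: sample size from the CI width you desire (example: expected 95% CI on
# the mean difference no wider than 0.5, i.e., a margin of error of 0.25).
moe <- function(n, sd = 1, conf = .95)
  qt(1 - (1 - conf) / 2, df = 2 * n - 2) * sd * sqrt(2 / n)
n.range <- 2:1000
min(n.range[moe(n.range) <= 0.25])   # smallest n per group that meets the target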

I think a bigger problem is that people have no clue whatsoever when determining an appropriate width for a confidence interval. I've argued before that people have a much better feel for p-values than for confidence intervals.

In the graph below, you see 60 one-sample t-tests, all examining a true effect with a mean difference of 0.3 (dark vertical line) and an SD of 1. The bottom 20 are based on a sample size of 118, the middle 20 on a sample size of 167, and the top 20 on a sample size of 238. This gives you 90% power at alpha levels of 0.05, 0.01, and 0.001, respectively. Not surprisingly, as power increases, fewer confidence intervals include 0 (i.e., more of the tests are significant). The higher the sample size, the further the confidence intervals stay away from 0.

Take a look at the width of the confidence intervals. Can you see the differences? Do you have a feel for the difference between aiming for a confidence interval width of 0.40, 0.30, or 0.25 (more or less the widths in the three groups, from bottom to top)? If not, but you do have a feel for the difference between aiming for p = 0.01 or p = 0.001, then go for the conditional power analysis. I would.


R script that produced the figure above:
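(The original script is not reproduced here; the following is a minimal sketch of a simulation along the lines described above, not the exact code that produced the figure.)

# Sketch: 95% CIs from one-sample t-tests on a true mean difference of 0.3 (SD = 1),
# 20 studies each at n = 118, 167, and 238.
set.seed(123)
ns  <- rep(c(118, 167, 238), each = 20)
cis <- t(sapply(ns, function(n) t.test(rnorm(n, mean = 0.3, sd = 1))$conf.int))
plot(NULL, xlim = range(cis), ylim = c(0, length(ns) + 1),
     xlab = "Mean difference", ylab = "Study", main = "95% confidence intervals")
segments(cis[, 1], seq_along(ns), cis[, 2], seq_along(ns))
abline(v = c(0, 0.3), lty = c(2, 1))                        # zero and the true effect
round(colMeans(matrix(cis[, 2] - cis[, 1], ncol = 3)), 2)   # average CI width per n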


Friday, October 10, 2014

Why Do We Cite Small N Studies?



Chris Fraley and Simine Vazire (F&V) published a very interesting paper in PlosOne in which they propose to evaluate journals based on the sample size and statistical power of the studies they publish. As the authors reason: "All else being equal, we believe that journals that publish empirical studies based on highly powered designs should be regarded as more prestigious and credible scientific outlets than those that do not." What they find is that "the journals that have the highest impact also tend to publish studies that have smaller samples". How can this be? Do we simply not care about the informational value of studies, or do we even prefer to cite smaller studies?

What is ‘impact’?
'Impact Factor' should join 'significance' in the graveyard of misleading concepts in science. For an excellent blog post about some of the problems with the impact factor, go here. We intuitively feel that high impact factor journals (see how I am not using 'high impact journals', just as I prefer 'a statistical difference' over 'statistical significance'?) should publish high-quality research, but citation rates are extremely skewed. For example, the paper by Simmons, Nelson, & Simonsohn (2011) illustrating problems with small samples and false positives was cited more than 200 times within its first 2 years, and has greatly contributed to the impact factor of Psychological Science (it's ok if you find that ironic).

The relation between the median sample size (used by F&V) and the impact factor is one way to examine whether number of citations and sample size are related, but we should probably be especially interested in the small number of studies in high impact factor journals that contribute most to the impact factor. At least some of these are probably not even empirical papers (and please don't cite Cumming, 2014, in Psychological Science, whenever you want to refer to "The New Statistics" - it just shows you were too lazy to read the book; you should cite Cumming, 2012). Even so, F&V would probably note that there are simply too many articles with tiny sample sizes getting too many citations, and I'd agree.

There are several reasons for this, but all of them are caused by you and me, because we are the ones doing the citing. We don't always (or should that be: we often don't?) cite articles because of their quality (again, see this blog). Let me add one more reason. As we discuss in Koole & Lakens (2012), psychological science has a strong narrative tradition. We like to present our research as a story, instead of as a bunch of dry facts. This culture has many consequences (such as an underappreciation of telling the same story twice by publishing replications, and a tendency to only tell the post-hoc, final, edited version of the story, and not the one you initially had in mind [see Bem, 1987]), but it also means we highly reward the first person to come up with a story - even when their data weren't particularly strong.

F&V's main point, I think, is not that we should have expected sample size and impact factors to be correlated, but more normative: we should want impact factors and sample size to be related. Their argument for a cultural shift towards a greater appreciation of sample size as an indicator of the quality of a study is important, and makes sense, rationally. Although I don't think people will easily give up their narrative tradition, the new generation of reviewers, with highly improved statistical knowledge, is no longer convinced by an excellent story arc; they want to see empirical support for your theoretical rationale. When you write "We know that X leads to more Y (Someone, Sometime), and therefore predict…" you can still reference someone who happened to publish about the topic slightly earlier than someone else. I'm not asking you to give up your culture. But if that first study had a sample size of 20 per condition and did not examine an effect that should clearly be huge (d > 1), know that you are expected to add a reference to a study that provides convincing empirical support for the narrative (showing the same basic idea in a larger sample), or reviewers will not be convinced.

Is N all important?

Fraley and Vazire (2014) only code sample size, not the type of design, or the number of conditions. At the same time, we know Psychological Science likes elegant designs (which might very well mean simple comparisons between two conditions, and not a 2x2x3 design examining the impact of some moderator). This might explain why sample sizes are smaller in Psychological Science. It also has a consequence for the power calculations by F&V, which are similarly based on the assumption that journals do not differ in the type of designs they publish. But if the Journal of Research in Personality (a journal which F&V show has larger samples, but a lower impact factor) publishes a lot more correlational or between-subjects studies than Psychological Science, that could matter quite a bit.

This doesn't mean Psychological Science is off the hook. Table 4 in F&V illustrates that the median sample size only gives sufficient power to observe large effects, and it is unrealistic to assume that all studies published in Psychological Science examine large effects. This is not very surprising (we immediately realize why the paper by F&V was not published in Psychological Science, wink). However, low sample sizes are especially problematic for journals like Psychological Science, whose editors say "We hope to publish manuscripts that are innovative and ground-breaking and that address issues likely to interest a wide range of scientists in the field." There are different types of innovative, but one is where everyone (the researchers themselves and the readers) considers a finding 'surprising' or 'counterintuitive'. If a journal publishes findings that are a priori unlikely (so less than 50% probable, however subjective this might be), collecting a large sample becomes even more important if you'd like H1 to have a high posterior probability in a Bayesian sense. F&V present good arguments for large samples using Frequentist assumptions - and these similarly become more important when examining a priori unlikely hypotheses.
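As a back-of-the-envelope illustration of this point (my example numbers, not F&V's): a simple application of Bayes' rule shows that the probability that a significant finding reflects a true effect depends on the prior probability of H1, the alpha level, and the power, so surprising (low-prior) findings need higher power to end up equally credible:

# Probability that a significant result is a true positive, for an a-priori
# unlikely hypothesis (prior = .25) at low versus high power (alpha = .05).
# Illustrative numbers only.
ppv <- function(prior, power, alpha = .05)
  (power * prior) / (power * prior + alpha * (1 - prior))
round(ppv(prior = .25, power = c(.35, .90)), 2)   # higher power, more credible findings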

The solution is to run larger samples (not necessarily by running experiments with 200 people, as Simine Vazire suggests on her blog, but for example by using sequential analyses) to increase power, and to perform close replications (which reduce Type 1 errors).

A good start
The N-pact factor might be a good starting point for people to use when deciding what to cite. Remember that sample size is just a proxy for power (small studies can have high power, if there is good reason to believe the effect size is very large) and power is only one dimension you can use to evaluate studies (you can also look at the a priori likelihood, the effect size, etc.). Nevertheless, research tells us that reviewers only moderately agree on the quality of a scientific article (and people are often biased in their quality judgments by the impact factor of the journal a paper was published in), so it seems that, at least for now, asking people to use sample size as a proxy for the informational value of studies is a good start. In a few years, we should hope the impact factor and the N-pact factor have become at least somewhat positively correlated - preferably because high impact factor journals start to publish more studies with large sample sizes, and because people start to cite the work of researchers who took the effort to contribute studies with higher informational value to the scientific literature by collecting larger samples.

Postscript
In my hometown, there are two art fairs. The traditional one sells hugely overpriced pieces of art by established artists who are ‘hot’ as determined by the majority of art collectors. The other one, the Raw Art Fair, showcases the work of artists that don’t yet have a lot of impact. Many never will, but for me, the raw art fair is always more memorable, because it makes you think about what you are seeing, and forces you to judge the quality based on your own criteria. For exactly the same reason I prefer to read papers on SSRN, PlosOne, or Frontiers.