Small samples and validity of scientific research

#1
I often hear semi-informed people dismissing scientific studies that they don't like on the basis that the study used a "small" sample size. Rarely do they define a cutoff for "small", or articulate a rationale for dismissing studies with "small" samples.

On the rare occasions when I have heard such people try to articulate the reasoning behind their concern over small-n studies, it has roughly amounted to the misconception that small n's are more likely to produce "flukes", i.e., Type I errors.

In reality, the Type I error rate is set by the researcher as alpha, not determined by n, so I don't give this concern much credibility.
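A quick way to see this for yourself (my own sketch, not from any study): simulate a two-sample t-test with H0 true and the normality assumption satisfied. The empirical Type I error rate sits near alpha at ANY n:

[CODE]
# Monte Carlo check: when the test's assumptions hold, the Type I
# error rate tracks alpha regardless of sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_sims = 20_000

for n in (5, 10, 30, 100):
    rejections = 0
    for _ in range(n_sims):
        # H0 is true: both groups come from the same normal population
        a = rng.normal(0, 1, n)
        b = rng.normal(0, 1, n)
        _, p = stats.ttest_ind(a, b)
        if p < alpha:
            rejections += 1
    print(f"n = {n:>3}: empirical Type I error rate = {rejections / n_sims:.3f}")
# Every line prints ~0.05, independent of n.
[/CODE]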

There IS a legitimate concern I'm aware of with SOME statistical tests which have a formal assumption about the normality of the parent population. Often those tests are said to be robust against violations of that assumption but ONLY for moderately large n (usually some rule of thumb like 30 or more).
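To make that caveat concrete, here is the same kind of simulation with the normality assumption deliberately violated (a sketch, using a skewed exponential parent with H0 still true). At small n the empirical Type I error rate can drift from the nominal alpha; it usually settles back near alpha as n grows:

[CODE]
# One-sample t-test on draws from a heavily skewed parent population.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sims = 20_000
true_mean = 1.0  # the mean of Exponential(scale=1), so H0 is true

for n in (5, 10, 30, 100):
    rejections = 0
    for _ in range(n_sims):
        x = rng.exponential(scale=1.0, size=n)
        _, p = stats.ttest_1samp(x, popmean=true_mean)
        if p < alpha:
            rejections += 1
    print(f"n = {n:>3}: empirical Type I error rate = {rejections / n_sims:.3f}")
[/CODE]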

But is there anything I'm missing here? Any other angle from which to criticize small-n studies?

Of course there are all kinds of criticisms you can make of NHST and of relying on p-values, but if we're taking NHST and the "p < alpha" criterion as givens for the moment, what criticisms are there of small-n studies ONCE they have already rejected H0 and been published? (At that point, criticizing them as "under-powered" seems beside the point.)
 
#2
Found a pretty good discussion of this topic: Significant p-values in small samples



By the way, this video was the impetus for this thread:

[YOUTUBE]0Rnq1NpHdmw[/YOUTUBE]

(Quite a good video overall, despite my doubts about his occasional "but that study had a small n!" comments. He's certainly right about the lack of incentive to replicate and the problems with prominent public reporting of exploratory results.)
 

noetsi

Fortran must die
#3
[QUOTE]
I often hear semi-informed people dismissing scientific studies that they don't like on the basis that the study used a "small" sample size. Rarely do they define a cutoff for "small", or articulate a rationale for dismissing studies with "small" samples.
[/QUOTE]
I am likely semi-informed myself, but I think you are missing the concern here. It is not Type I error (internal validity) but the ability to generalize from your sample to a larger population (external validity) that is the issue. There is also the issue of asymptotic accuracy when assumptions such as normality are violated, as they often are.

Many doubt that sampling, say, 1,000 people can reasonably tell you what a country of 300,000,000 believes :p (see the sketch below). And there are significant issues with generalizing from samples, at least where people are concerned: psychometric error, people changing their minds, and, in the era of cell phones, actually getting an unbiased sample. Change over time is a major problem, as are uncertainty on the part of respondents and non-response rates. Missing data has become a major concern for data analysts.
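For what it's worth, here is the idealized arithmetic behind the "1,000 out of 300,000,000" intuition (a sketch assuming a simple random sample and a binary question). Under those assumptions the margin of error depends on n and essentially not on the population size; the caveats above are exactly what this formula ignores:

[CODE]
import math

n = 1_000
p = 0.5                 # worst case for a proportion
se = math.sqrt(p * (1 - p) / n)
moe = 1.96 * se         # ~95% margin of error
print(f"95% margin of error: +/- {moe:.3f}")  # about +/- 0.031
[/CODE]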

When only 30 percent of those you sample respond, that is a major problem.

I have spent years studying the question of what counts as a small or large sample, because many statistical results are only accurate asymptotically. It's not just poorly informed people who have trouble defining this number or who give it no concrete level; the same vagueness is common in the statistical literature, because the point at which a sample becomes "large" for this purpose is not simple to determine. It depends on a variety of complex issues. I have lost track of the number of times I have read "this is not a major issue with a large sample" without a number or a range being put to it.
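As a sketch of why no single cutoff works, one can simulate the coverage of nominal-95% t-intervals for the mean under parent distributions with different amounts of skew; the n needed to get close to 95% coverage differs by parent (the distributions below are illustrative choices, not from any particular study):

[CODE]
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims = 10_000

parents = {
    "normal":      lambda n: rng.normal(0, 1, n),
    "exponential": lambda n: rng.exponential(1.0, n),
    "lognormal":   lambda n: rng.lognormal(0, 1.5, n),
}
true_means = {
    "normal": 0.0,
    "exponential": 1.0,
    "lognormal": float(np.exp(1.5**2 / 2)),  # mean of LogNormal(0, 1.5)
}

for name, draw in parents.items():
    for n in (10, 30, 100):
        covered = 0
        for _ in range(n_sims):
            x = draw(n)
            lo, hi = stats.t.interval(0.95, df=n - 1,
                                      loc=x.mean(), scale=stats.sem(x))
            if lo <= true_means[name] <= hi:
                covered += 1
        print(f"{name:>11}, n = {n:>3}: coverage = {covered / n_sims:.3f}")
# The skewed parents need much larger n to approach the nominal 95%.
[/CODE]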
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
Two comments:

First, this can also be linked to the sampling technique. Was an actual random sample taken? And if so, was it large enough, and can the technique used to analyze the data handle it, including the chance that a leverage point or outlier was selected into the sample (a single extreme point can dominate a small sample)?


Second, remember that with frequentist approaches the standard errors account for sample size. So small samples have larger SEs, making significance more difficult to find if the effect size is not very large.
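A rough illustration of this point (my sketch, assuming equal group sizes and unit variance): the SE of a difference in means shrinks like 1/sqrt(n), so the smallest "just significant" effect grows as n falls:

[CODE]
import math
from scipy import stats

sigma = 1.0
alpha = 0.05

for n in (5, 10, 30, 100, 1000):
    se = sigma * math.sqrt(2 / n)     # SE of the difference in means
    t_crit = stats.t.ppf(1 - alpha / 2, df=2 * n - 2)
    print(f"n = {n:>4} per group: effect must exceed ~{t_crit * se:.2f} SDs")
[/CODE]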
 

noetsi

Fortran must die
#5
A small sample will always make finding significance more difficult, but with a large effect size this won't matter much.
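A quick power calculation backs this up (a sketch using statsmodels; d = 1.5 is an arbitrary stand-in for "large"):

[CODE]
from statsmodels.stats.power import TTestIndPower

# Two-sample t-test, 10 subjects per group, large standardized effect.
power = TTestIndPower().power(effect_size=1.5, nobs1=10, alpha=0.05)
print(f"Power with d = 1.5, n = 10 per group: {power:.2f}")  # roughly 0.9
[/CODE]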

Sampling and generalization are essentially the same issue (external validity).