Statistical comparisons for large sample sizes (n>1000)

I am comparing the drug exposures across two different groups, consisting of 1000 simulated drug exposures per group. Drug exposures are continuous variables following a normal distribution.
I want to know if different doses yield a statistically significant difference in mean drug exposure across the two groups. I observe that even if I "artificially" calibrate the doses to generate very similar mean exposures in both groups, all the statistical tests still return very low p-values despite the very small difference between the groups' means. I guess this is due to the very large sample size (n = 1000 per group).
However, if I reduce the sample size (to, say, 50 virtual drug exposures), the mean exposure becomes very sensitive to the sampling procedure, because the samples are drawn from a distribution with a high standard deviation relative to the mean; repeating the same analysis on different datasets can give very different mean exposures.
Is this a case where I should focus more on the "biological relevance" of the difference rather than the significance of such difference? Can you suggest a different approach to judging the relevance of the difference based on robust criteria?


Hi Javier,

You are heading in the right direction.
We should never look only at the p-value.

"I want to know if different doses yield a statistically significant difference in mean drug exposure across the two groups"
This is not really what you want to know :)
The significance result is only part of the equation.

I don't know your area of research, but different doses probably produce different drug exposures that approach a plateau asymptotically.
So increasing the dose will increase the drug exposure, but the increments will get smaller and smaller, tending to zero.
So, at least up to a certain dose, the question is not whether the means differ, because they do differ, and with a large enough sample size you will detect it ...
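
To make this concrete, here is a rough sketch of how the p-value of a two-sided z-test reacts to the sample size when the observed difference stays the same. The numbers (a difference of 2 units and a standard deviation of 20 units) are made up for illustration, and the helper name two_sided_p is my own, not something from your analysis.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sd, n):
    """Two-sided z-test p-value for an observed mean difference `diff`
    between two groups of size `n` that share the standard deviation `sd`."""
    z = diff / (sd * sqrt(2.0 / n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: difference of 2 units, SD of 20 units (d = 0.1).
p_values = {n: two_sided_p(2.0, 20.0, n) for n in (50, 1000, 10000)}
for n, p in p_values.items():
    print(f"n = {n:>5} per group -> p = {p:.3g}")
```

The same small difference is not significant with 50 per group, but becomes significant once the groups are large enough, although its practical size has not changed.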

The solution is not to take a smaller sample size ...
Usually, if a larger sample costs more, you choose the smallest sample size that is able to detect the required effect size.
But if you already have a large sample, you should use it.
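
As a rough sketch of that planning step, the usual normal-approximation formula gives the sample size per group needed to detect a standardized effect size d with a two-sided test. The function name n_per_group and the default alpha = 0.05 and power = 0.80 are my assumptions; an exact t-test calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a standardized
    effect size `d` with a two-sided, two-sample test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_power) ** 2 / d ** 2)


# Required group sizes for a few hypothetical target effect sizes.
for d in (0.1, 0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

The required n grows with 1/d^2, so the smaller the effect you consider worth detecting, the larger the sample you need.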

The questions are what the effect size is and whether the effect is significant.
Please look up the standardized effect size, Cohen's d = (avg(x1) - avg(x2)) / S, where S is the pooled standard deviation of the two groups.
Cohen also proposed verbal labels for the values of d ("small", "medium", "large"; later authors extended the scale from "very small" to "huge"), but I assume the right thresholds also depend on the area of research.
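
A minimal sketch of the calculation, assuming S is the pooled standard deviation of the two groups. The simulated groups (means 100 and 104, standard deviation 40, n = 1000 each) are invented numbers, not your data, and cohens_d is just a helper name I picked.

```python
import random
from math import sqrt
from statistics import mean, variance


def cohens_d(x1, x2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / sqrt(pooled_var)


# Hypothetical simulated exposures: means 100 vs 104, SD 40, 1000 per group.
random.seed(1)
group_a = [random.gauss(100.0, 40.0) for _ in range(1000)]
group_b = [random.gauss(104.0, 40.0) for _ in range(1000)]

d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.3f}")
```

Unlike the p-value, d does not shrink or grow systematically with the sample size; a larger n only makes the estimate more precise.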

You may also decide for yourself which non-standardized effect size (avg(x1) - avg(x2)) is too small to matter, even if the result is statistically significant.
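
One possible way to put this into practice, as a sketch only: compute a confidence interval for the raw difference in means and check whether it lies entirely inside a margin that you consider irrelevant. The margin is something you have to choose from your field; the helper names diff_ci and is_negligible and the normal approximation (reasonable for n around 1000, not for tiny samples) are my assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_ci(x1, x2, level=0.95):
    """Raw difference in means with a normal-approximation confidence interval."""
    diff = mean(x1) - mean(x2)
    se = sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff, diff - z * se, diff + z * se


def is_negligible(x1, x2, margin, level=0.95):
    """True if the whole confidence interval lies inside (-margin, +margin)."""
    _, lo, hi = diff_ci(x1, x2, level)
    return -margin < lo and hi < margin
```

For example, is_negligible(group_a, group_b, margin=10.0) would ask whether the data rule out a mean difference of 10 exposure units or more, where group_a, group_b and the value 10 are placeholders for your own data and threshold.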

You may look at the following example for the balance between p-value and effect size