# Multiple random sampling: what to report in the paper?

#### og123

##### New Member
Hi,
I used a t-test to compare the means of two groups: a test group (n=400) and a randomly selected group (n=400) used as a control. The measured variable was numeric, but its distribution was unknown.
Hence, I was told to do multiple sampling instead. That is, randomly select a group of the same size as the test group (n=400) and calculate its mean, then repeat this step 1,000 times to form a baseline (of 1,000 means).
Next, I was told to use this baseline to challenge the earlier test group. If the test group's mean ranks, let's say, third within the baseline, then the p-value is 3/1,000 = 0.003.
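
The procedure described above can be sketched roughly as follows. This is only a minimal illustration under assumptions not stated in the post: the background data are available as a list called `population`, "more extreme" means "larger mean" (a one-sided comparison), and the function name `resampling_p_value` is made up for this example.

```python
import random
import statistics

def resampling_p_value(population, case_values, n_resamples=1000, seed=0):
    """Empirical one-sided p-value for the case-group mean, compared with
    the means of random groups of the same size drawn from `population`."""
    rng = random.Random(seed)
    case_mean = statistics.fmean(case_values)
    n = len(case_values)
    # Baseline: means of random groups, each as large as the case group.
    baseline = [
        statistics.fmean(rng.sample(population, n))
        for _ in range(n_resamples)
    ]
    # Count baseline means that are at least as large as the case mean.
    n_extreme = sum(m >= case_mean for m in baseline)
    # The +1 counts the observed group itself, so p is never exactly 0.
    return (n_extreme + 1) / (n_resamples + 1)
```

With this convention, a case mean exceeded by only two of the 1,000 baseline means gives 3/1,001, which is about 0.003.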

My question is: how do I report this process in my paper? What statistical test is being done here?
Other relevant comments are welcome as well.
Thanks

#### hlsmith

##### Not a robit
This feels a little off. Why do this with just the control group? Maybe look at the permutation test, or elucidate why the t-test wouldn't work.
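
For reference, a two-sample permutation test on the difference in means can be sketched like this. It is a generic illustration, not necessarily what the referees had in mind; the function name `permutation_test` and its parameters are assumptions made for the example.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in means.
    Returns the observed difference and an empirical p-value."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # The +1 counts the observed labelling itself.
    return observed, (n_extreme + 1) / (n_permutations + 1)
```

Unlike resampling only the control, this shuffles the labels of both groups, so all the data are used.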

#### og123

##### New Member
The t-test worked, but the p-values were "too" significant for the referees' taste... :|
I was told to use randomization in almost all my figures. In some figures the groups were large (e.g. n=25,000), and though the t-test was very significant (sometimes with p < 1 × 10⁻¹⁰), the difference in means was often too small. So the referees argued that it is not convincing and that the p-value is a result of the large sample size. Then they made the same remark on other figures with smaller sample sizes, like the above example (n=400) and even n=210. I guess in those cases they mostly objected to the fact that the t-test assumes a normal distribution.
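
The referees' point about sample size can be illustrated with a small calculation. This is a sketch under simplifying assumptions (two groups of equal size and equal standard deviation, normal approximation to the test statistic), and the effect size of 0.05 standard deviations is made up for illustration, not taken from the data discussed here.

```python
import math

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value (normal approximation) for a difference in means
    between two groups of equal size and equal standard deviation."""
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Same tiny difference (0.05 SD), different group sizes.
for n in (210, 400, 25000):
    z, p = two_sample_z_p_value(mean_diff=0.05, sd=1.0, n_per_group=n)
    print(f"n={n:>6}  z={z:.2f}  p={p:.3g}")
```

The difference in means is identical in all three cases; only the group size changes the p-value.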

I don't know why the resampling is done only for the control. The way I understand it, once you know how the means of random groups are distributed, you can work out the probability of getting the mean you got in your test group. Still, I don't understand why not use the t-test, given that sample means end up approximately normally distributed according to the central limit theorem.

I think I will just call it a "randomization test" in my paper and leave it at that :| No?

#### hlsmith

##### Not a robit
Well, I will agree with the referees that p-values are a faux pas in this day and age and can be misleading. I understand what they were trying to say, but I was just unsure whether you can call that a p-value (I am more applied than theoretical). I would check out the permutation test, which gets at the same thing. Also, why randomly sample 400 when you can use all the data? Does your variable have a tendency to be skewed, or do the results violate the t-test assumptions? (Those things can be investigated.)

Also, you may want to report, say, Cohen's d for your results instead of p-values; that may appease them.
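
Cohen's d is the difference in means divided by the pooled standard deviation. A minimal sketch of the usual pooled-SD version follows; the function name `cohens_d` is just a label chosen for this example.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd
```

Unlike a p-value, d does not shrink or grow systematically with sample size, which is why it speaks to the "small difference, huge n" concern.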

#### og123

##### New Member
Thanks hlsmith
My variable changes between figures. It is always numeric, though. I don't think it's skewed, and I think it is actually normally distributed.
Sometimes it's a size (in real numbers: 1, 2, 3, ...), sometimes it's a score (like 0.2 or 88.47)...
For the small samples (n = 400 or less) I actually have the entire population. For the larger cases, I sampled bases (in the human genome).
I don't see papers report z-scores often; I think in my field (bioinformatics/genetics) it is common to report case and control.
So (this is another point, actually), I plotted the case sample and one of the random samples in my figures using boxplots, and I wonder if that's a good way to present the results.
In one figure, though, I plotted the distribution of the resampled means and then pointed out where the case-group mean "landed" on the bell. I actually like this figure a lot.
I read the Wikipedia article on the permutation test and saw that it can also be called a randomization test. I am not familiar with Cohen's d; I will read up on it.
