# Simple question about individual values being significantly different from the population mean.

#### sammyj

##### New Member
Hi

I was wondering if someone could help.

Say I have a sample of numbers 5,6,7,5,6,7,5,6,5,4 and I want to test which of these individual numbers is significantly different from my population mean of 5, how do I go about doing it?

I've carried out a t-test and found that the sample mean of 5.6 is not significantly different from my population mean of 5, at the alpha=5% level. My 95% confidence interval is [4.91, 6.29].
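For reference, the t-test and confidence interval described above can be reproduced with a short script (a sketch using only the standard library; the critical value 2.262 is the two-sided 5% point of the t distribution with n - 1 = 9 degrees of freedom):

```python
import math
import statistics

data = [5, 6, 7, 5, 6, 7, 5, 6, 5, 4]
mu0 = 5.0                    # hypothesized population mean
n = len(data)

xbar = statistics.mean(data)  # sample mean, 5.6
s = statistics.stdev(data)    # sample standard deviation
se = s / math.sqrt(n)         # standard error of the mean

t_stat = (xbar - mu0) / se    # ~1.96, below the critical value

# Two-sided 5% critical value for t with 9 degrees of freedom
t_crit = 2.262

ci_low = xbar - t_crit * se
ci_high = xbar + t_crit * se
print(f"t = {t_stat:.3f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
# t = 1.964, 95% CI = [4.91, 6.29]
```

Since |t| < 2.262, the test does not reject the null at the 5% level, which matches the conclusion above.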

Can I then assume that, at the 5% significance level, the values 4 and 7 in my original sample are significantly different from my population mean of 5, since they fall outside my confidence interval, even though the overall sample mean is not significantly different from the population mean?

If I am wrong, can anyone please advise which method I should be using to identify which individual values are significantly different from my population mean?

Thanks
SJ

#### noetsi

##### No cake for spunky
Your sample size is too small to get meaningful responses. A one sample t-test (tested against 5 as the assumed population mean) will work if your sample is larger.

Your t-test significance test is doubtful given the small sample size. The difference might well be significant, but you cannot detect this because your power is so low.

You need to gather more data. There is no valid statistical test for 10 cases.

#### Dason

> There is no valid statistical test for 10 cases.

This isn't true.

But the main issue is that you don't really ever say that a certain value in a data set is statistically different from the population mean. What is your goal here (without resorting to statistics, what are you trying to do)? There are what are known as tolerance limits, which may be what you're interested in. They're used mostly in manufacturing processes as far as I'm aware, but they can be applied elsewhere, which is why I asked about your actual goal.

#### sammyj

##### New Member
noetsi and Dason,

Noetsi - The small sample size of 10 above wasn't supposed to be taken literally - it was just a very simple example put together to illustrate the problem/concept I was trying to explain. My actual problem may involve hundreds or thousands of values.

Dason - My main objective is to calculate the false-positive rate. I am actually generating data from a normal distribution (having specified the mean and standard deviation) and then I want to count the number of generated values that differ significantly from the specified mean, so that I can calculate the false-positive rate.
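If I've understood the goal, a quick simulation illustrates it (a sketch; the mean, standard deviation, and sample size here are arbitrary choices, and 1.96 is the two-sided 5% cutoff for the standard normal):

```python
import random

random.seed(42)
mu, sigma = 5.0, 1.0       # specified population parameters
n = 100_000                # number of generated values
z_crit = 1.96              # two-sided 5% critical value, standard normal

values = [random.gauss(mu, sigma) for _ in range(n)]

# Flag a value as "significant" if its z-score falls in the critical region
flagged = sum(1 for x in values if abs((x - mu) / sigma) > z_crit)

false_positive_rate = flagged / n
print(f"false positive rate: {false_positive_rate:.3f}")   # ~0.05 by construction
```

Since every value really does come from the specified distribution, about 5% of them land in the critical region, so the estimated false-positive rate converges to the nominal alpha.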

Thanks.

#### noetsi

##### No cake for spunky
Danson, which statistical test has enough power with ten cases to work reliably? None that I have ever heard of. Please don't say bootstrapping - you don't end up with a true distribution that way, you create an artificial one. And many analyses of bootstrapping raise significant doubts over its validity.

#### Dason

I was just pointing out that you were making a sweeping generalization that isn't necessarily true. If the assumptions of your test are true, then sample size doesn't necessarily play a role in the validity of the test. Heck, there are valid statistical tests when you only have a single observation, depending on the assumptions made. In practice it can be hard to justify some assumptions with very few data points, but that doesn't mean that the test itself is invalid.

Edit: Also - it's Dason. Not Danson.


#### noetsi

##### No cake for spunky
I would think, regardless of the theoretical validity of the test, that with power as low as you would have with 10 cases you would never be able to run a meaningful statistical test. It is not about the assumptions being right; it's about the limitation of capturing statistically significant results with so few cases in real-world software.

Even tests with very simple assumptions, like Fisher's exact test, won't function (at least in my experience) when you have so few cases (I recently tried running a whole battery of tests with a small number of cases - nothing worked). With one case you would have no variation; I don't understand how a statistical test can function under that limitation.

But I am probably wrong

#### Jake

Sammyj, I am not sure what you mean by calculating the false positive rate for data generated from a normal distribution. It sounds to me like you are just talking about the area under the curve in the tails of a normal distribution, which doesn't require simulating data at all. Am I missing something?

#### Jake

Regarding small-sample inference: N is only one of several terms that define the width of a confidence interval and therefore determine the power of a model comparison. It seems odd to me to say that N is somehow more important than the other terms; that is plainly not the case, as you can see from the confidence interval formula. All else being equal, lower values of N lead to less power, yes. But this effect can easily be offset by the other terms, such as the variance of X, effect size, number of parameters in the two models, etc. Neuroimaging studies routinely use sample sizes of <10 and see robust results. If there is a problematic issue here, it is with the claims concerning generalizability, not necessarily power.

#### noetsi

##### No cake for spunky
Perhaps this is simply the result of the type of analysis I do, but I don't understand the value of any analysis you cannot generalize with. It is true that, for example, you can do qualitative analysis with one person and on the basis of such a study make definitive comments on that one individual and no one else. I don't see that as the purpose of analysis generally. The point is to be able to be sure your answer is correct for a larger population, not just your sample.

The question is not whether n is more or less important than other factors. It's whether you can reliably use the results of an analysis that has ten or fewer cases. With most methods I don't believe you can; in my experience the results are commonly nonsensical, or there are huge questions about their reasonableness.

But again, I run much narrower types of analysis than others here. If I ran an analysis with ten people I would get laughed at by my managers.

#### Jake

Okay. So to the extent that there is an issue, it is about generalization and not power. I agree.

#### Dason

> But again I run much more narrow types of analysis then others here. If I ran an analysis with ten people I would get laughed at by my managers

That's fine. Just remember that there are other types of data out there. I just finished presenting some results this morning for an analysis where we had 12 experimental units - it was a 2x2 design, so there were 3 EUs in each group.

Is this optimal? Nope - and we did have low power, but it's the data we had.

#### sammyj

##### New Member
Thanks for all the replies. Seems like it's created a bit of additional conversation between a few of you.

Anyhow, Jake - let's maybe take a step back. I suppose the main thing I am trying to find out is:
If I were given 100 values, told that they came from a normal distribution, and told what the population parameters were, how would I determine which of those 100 individual values were significantly different from the population mean?

Another possible way I thought of doing it was - I'd standardise each of the 100 values and then determine which ones fall in the critical region under the normal curve.
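That approach might look something like this (a sketch; `mu`, `sigma`, and the example values are placeholders for the known parameters and the 100 given values):

```python
mu, sigma = 5.0, 1.0                  # known population parameters
values = [5.2, 7.3, 4.9, 2.8, 5.5]   # stand-ins for the 100 given values
z_crit = 1.96                         # two-sided 5% cutoff, standard normal

# Standardise each value and keep those falling in the critical region
extreme = [x for x in values if abs((x - mu) / sigma) > z_crit]
print(extreme)   # [7.3, 2.8]
```

Here 7.3 standardises to z = 2.3 and 2.8 to z = -2.2, both beyond 1.96 in absolute value, so those two are flagged.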

Any thoughts?

#### Jake

> Another possible way I thought of doing it was - I'd standardise each of the 100 values and then determine which ones fall in the critical region under the normal curve.

Exactly. This is all you need to do. But I don't think it's appropriate to call it a "critical region" in this context. You're not really doing a hypothesis test on each data point. I mean, you know in advance that the expected number of observations falling in the most extreme x% tails of the distribution is, obviously, x% of the data. There's not really any inference that needs to be made.

#### noetsi

##### No cake for spunky
> That's fine. Just remember that there is other type of data out there. I just finished presenting some results this morning for an analysis where we had 12 experimental units - it was a 2x2 design so there were 3 EUs in each group.
>
> Is this optimal? Nope - and we did have low power but it's the data we had.

I have done that before. But I do my best to avoid it, because I consider the results doubtful, and wrong (or even unreliable) results are, to me, worse than none. If someone is making a policy change or spending hundreds of millions of dollars based on your analysis, and your analysis is wrong because your data is wrong, that is a very, very bad thing.