Any advice? Thanks!

- Thread starter shoes

I'm trying to determine the standard deviation of multiple sample sets .... Thanks!

Let me just ask the following for clarification of your problem. Are you suggesting that your scenario is this:

Let X={x1,x2,…,x_n1}, Y={y1,y2,…,y_n2}, Z={z1,z2,…,z_n3} denote 3 data sets with known means and standard deviations (the sample sizes n1, n2, n3 are not necessarily equal).

Let A be the union of these data sets, i.e.

A ={x1,x2,…,x_n1,y1,y2,…,y_n2,z1,z2,…,z_n3}.

Now, are you asking what the mean and standard deviation of A are when you don't have the data but know only the means and standard deviations of X, Y, and Z? Is the scenario I describe correct?

Thank you!

Yes, you have it right.

Thank you!

mean=[n1 /(n1+n2)]*Xbar1 + [n2 /(n1+n2)]*Xbar2

variance=[ n1^2*Var1 + n2^2*Var2 - n1*Var1 - n1*Var2 - n2*Var1 - n2*Var2 + n1*n2*Var1 + n1*n2*Var2 + n1*n2*(Xbar1 - Xbar2)^2 ] / [ (n1+n2-1)*(n1+n2) ]

I’ll show an example for the means so you can get the idea on how to do this. This idea is the same for the variance (standard deviation).

Example: Suppose I have 3 data sets with:

Xbar1=5; Std.dev1.=2; Var1=4; n1=10

Xbar2=15; Std.dev2=3; Var2=9;n2=15

Xbar3=8; Std.dev3=5; Var3=25; n3=20

Now, to get the mean of the 3 data sets, first apply the formula to the first two sets of statistics:

mean(1,2) = [10 /(10+15)]*5 + [15 /(10+15)]*15 =11

Now, use this result as follows:

mean(1,2,3) = [25 /(25+20)]*11 + [20 /(25+20)]*8 = 9.6667 (rounded).

Now just apply this idea using the formula for variance above.

Obviously, in the end just take the sqrt of the variance to get the standard deviation for the merged (3) sets of data.

BTW, this idea is completely general for k sets of data.
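For anyone who wants to check the numbers, here is a minimal Python sketch of the merge-one-by-one idea applied to the three example data sets. The function names `merge_two` and `merge_many` are made up for illustration, and the variance line is just an algebraically rearranged form of the long formula above (expanding it gives the same terms), assuming Var1 and Var2 are sample variances with n-1 in the denominator.

```python
import math

def merge_two(n1, mean1, var1, n2, mean2, var2):
    """Merge the summary statistics of two samples (variances use n-1)."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Rearranged form of the long formula: within-set spread of each
    # sample plus a term for the gap between the two sample means.
    var = ((n1 - 1) * var1 + (n2 - 1) * var2
           + n1 * n2 / n * (mean1 - mean2) ** 2) / (n - 1)
    return n, mean, var

def merge_many(groups):
    """Fold merge_two over a list of (n, mean, variance) tuples."""
    n, mean, var = groups[0]
    for n_i, mean_i, var_i in groups[1:]:
        n, mean, var = merge_two(n, mean, var, n_i, mean_i, var_i)
    return n, mean, var

# The three data sets from the example: (n, Xbar, Var)
groups = [(10, 5.0, 4.0), (15, 15.0, 9.0), (20, 8.0, 25.0)]
n, mean, var = merge_many(groups)
std = math.sqrt(var)
print(n, round(mean, 4), round(var, 4))
```

With these inputs the merged mean comes out at about 9.6667, matching the hand calculation, and the merged variance at about 30.386 (exactly 1337/44); the standard deviation is its square root.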

Thanks for the formula. This weights each mean (and standard deviation) by the number of measurements in each set, which is not exactly intuitive to me. For example, if I have 2 people, and A is 4' tall with 3 measurements, while B is 7' tall with 27 measurements, the mean height is:

mean(A,B) = [3/(27+3)]*4 + [27/(30)]*7 = 6.7 feet.

Very odd indeed, since I'd expect the mean to be at least near 5.5', but I'll take your word for it - perhaps an indication that one really should have equal numbers of measurements.
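A small sketch may help show what the weighted formula is actually measuring: it reproduces the mean of all 30 measurements thrown together, which is a different quantity from the average of the two people's heights. The measurement lists below are invented for illustration (every reading is taken to be exactly 4 or 7 feet).

```python
import math
from statistics import mean

a = [4.0] * 3    # hypothetical: 3 readings of person A, all 4 ft
b = [7.0] * 27   # hypothetical: 27 readings of person B, all 7 ft

# Weighted formula from the thread, using only the summaries
weighted = (len(a) * mean(a) + len(b) * mean(b)) / (len(a) + len(b))

# Mean of the merged raw data (what the formula is designed to match)
pooled = mean(a + b)

# Plain average of the two people, ignoring how often each was measured
unweighted = (mean(a) + mean(b)) / 2

print(round(weighted, 2), round(pooled, 2), unweighted)
```

So 6.7 is the right answer to "what is the mean of the merged measurements", while 5.5 answers "what is the average height of the two people". Which one you want depends on the question being asked.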

One more thing. Is it possible to have a more general equation that could be used for more data sets? I have something like 10 such populations, so applying your equation 9 times is a little bit time consuming.

Best Regards

Alex.

Okay, here are the formulae you need. These will give you the (exact) mean and variance as if you actually had the data. What you need to do is merge the data sets one at a time, using each merged result together with the next data set.

mean=[n1 /(n1+n2)]*Xbar1 + [n2 /(n1+n2)]*Xbar2

variance=[ n1^2*Var1 + n2^2*Var2 - n1*Var1 - n1*Var2 - n2*Var1 - n2*Var2 + n1*n2*Var1 + n1*n2*Var2 + n1*n2*(Xbar1 - Xbar2)^2 ] / [ (n1+n2-1)*(n1+n2) ]

Suppose there are \( r \) groups, with sample size \( n_i \) for each group.

Furthermore, suppose you already have the sample mean estimate

\( \bar{X}_i = \frac{\sum_{j=1}^{n_i} X_{ij}}{n_i} \)

and the sample variance estimate

\( \hat{\sigma}_i^2 = \frac{\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{n_i - 1} \)

for each group, i.e. \( i = 1, 2, \ldots, r \).

Then the pooled sample mean \( = \frac{\sum_{i=1}^r \sum_{j=1}^{n_i} X_{ij}}{\sum_{i=1}^r n_i} = \frac{\sum_{i=1}^r n_i \bar{X}_i}{\sum_{i=1}^r n_i} \)

and the pooled sample variance \( = \frac{\sum_{i=1}^r \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{\sum_{i=1}^r (n_i - 1)} = \frac{\sum_{i=1}^r (n_i - 1) \hat{\sigma}_i^2}{\sum_{i=1}^r (n_i - 1)} \)

It would be the same if you had the data in the form of the sufficient statistics

\( \sum_{j=1}^{n_i} X_{ij}, \sum_{j=1}^{n_i} X_{ij}^2 \) in each group \( i \).
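Note that the pooled variance above measures spread around each group's own mean, so it is not the same number as the variance of the merged data set given by the pairwise formula earlier in the thread. A rough Python sketch of the difference, for any number of groups in one pass, is below. The function names are made up for illustration; `combined_variance` uses the standard split of the total sum of squares into within-group and between-group parts (as in one-way ANOVA) and assumes each group variance uses n-1.

```python
def pooled_variance(groups):
    """Pooled (within-group) variance from (n, mean, variance) tuples."""
    num = sum((n - 1) * var for n, _, var in groups)
    den = sum(n - 1 for n, _, _ in groups)
    return num / den

def combined_variance(groups):
    """Sample variance of the merged data, from summaries only."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Spread inside each group
    within = sum((n - 1) * var for n, _, var in groups)
    # Spread of the group means around the grand mean
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    return (within + between) / (total_n - 1)

# The three example data sets from earlier in the thread: (n, Xbar, Var)
groups = [(10, 5.0, 4.0), (15, 15.0, 9.0), (20, 8.0, 25.0)]
pooled = pooled_variance(groups)
combined = combined_variance(groups)
print(round(pooled, 4), round(combined, 4))
```

For the example data the pooled variance is about 15.167 (637/42), while the variance of the merged set is about 30.386 (1337/44). The gap is entirely the between-group term, which the pooled formula leaves out by design.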

In my case the mean value is the same and only the variances change.

Here are some typical examples from my study:

1) N(119,3)

2) N(119,12)

3) N(119,8)

4) N(119,30)

I would like to thank you in advance for your help

Best Regards

Alex.
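A short note on this special case, under stated assumptions: the sample sizes \( n_i \) are not given in the post, and it is unclear whether the second number in N(119, ·) is a variance or a standard deviation, so the following is only a sketch. If all the sample means \( \bar{X}_i \) are exactly equal, the term involving differences between means drops out, and the variance of the merged data reduces to

\( \frac{\sum_{i=1}^r (n_i - 1) \hat{\sigma}_i^2}{\left( \sum_{i=1}^r n_i \right) - 1} \)

where \( \hat{\sigma}_i^2 \) is the sample variance of group \( i \). This differs from the pooled sample variance only in the denominator, which is \( \sum_{i=1}^r n_i - 1 \) here rather than \( \sum_{i=1}^r (n_i - 1) \).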

For example, if I have 2 people, and A is 4' tall with 3 measurements, while B is 7' tall with 27 measurements, the mean height is:

mean(A,B) = [3/(27+3)]*4 + [27/(30)]*7 = 6.7 feet.

Very odd indeed

Download or read my document at md.rmutk.ac.th/file.php/471/to-my-students.pdf or here if the site does not allow you.

I would also be very interested in a general equation (and the name of this method) for calculating the variance, as shown by Dragan. As I understand it, the equation presented by BGM isn't the same, since the variance between the mean values is not considered?

Cheers,

Jo

I wonder if you guys could help. I'm a scientist looking to present my research.

I'm measuring two linked values, substance production and cell number, and I present my results as substance produced per cell. I have experimental data with 25 observations for each of several different conditions, measuring the amount of substance produced and the number of cells per reaction (the latter varies with the condition, so it is not constant). Each set of observations gives a mean and a standard deviation, and I then take the mean values from each set to get the mean substance production per cell. However, I would also like to present the standard deviation of the substance-per-cell value. I'm sure there must be an equation that lets me combine the standard deviations of substance production and cell number into an overall standard deviation, but I don't know what it is! Can anyone help?

Many thanks.

Just for clarification: you grew n batches of cells, each under different conditions, with 25 observations each for cell number and substance produced. You then took the mean and standard deviation of cell number and of substance produced for each batch, and calculated the quotient to get the mean substance produced per cell for each batch?

If you want to calculate the standard deviation for this quotient you have to apply the rules of error propagation. For multiplication and division the rule is as follows:

If \( c = a \cdot b \) or \( c = \frac{a}{b} \),

then \( \frac{\sigma_c}{|c|} = \sqrt{\left( \frac{\sigma_a}{a} \right)^2 + \left( \frac{\sigma_b}{b} \right)^2} \)

Also have a look here: http://en.wikipedia.org/wiki/Propagation_of_uncertainty

Hope that helps!
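As a quick illustration of the rule above, here is a minimal Python sketch for the quotient case. The function name `ratio_with_sd` and the input numbers are invented for the example. One caveat worth stating: this form of the rule is a first-order approximation that assumes a and b are uncorrelated, which may not hold if substance production and cell number are linked; in that case a covariance term would be needed (see the Wikipedia page linked above).

```python
import math

def ratio_with_sd(a, sd_a, b, sd_b):
    """Return c = a / b and its propagated standard deviation.

    Assumes a and b are uncorrelated and the relative errors are small.
    """
    c = a / b
    # Relative errors add in quadrature for products and quotients
    rel = math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)
    return c, abs(c) * rel

# Hypothetical numbers: 100 +/- 10 units of substance, 50 +/- 5 cells
c, sd_c = ratio_with_sd(100.0, 10.0, 50.0, 5.0)
print(c, round(sd_c, 4))
```

Here both inputs have a 10% relative error, so the quotient of 2.0 ends up with a relative error of about 14% (10% times the square root of 2), i.e. a standard deviation of roughly 0.28.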

Can I cite something formally, and what should I call the result so that it is understood by the statistical and scientific community? I have been referring to the value as the overall standard deviation, but is there a formal name? Is "joint standard deviation" used? I could not find a convention for naming this value and could not find this equation in any textbook. I know these posts go back a while, but I appreciate the help! I really need to provide a reference.