2 sample z test - Standardized Effect Size

#1
Hi,

When comparing the averages of two groups whose standard deviations σ1, σ2 are known:

Cohen's effect size = |avg(x1) - avg(x2)| / σ_pooled

How do you calculate σ_pooled for a two-sample z-test?

σ_pooled^2 = (n1σ1^2 + n2σ2^2)/(n1 + n2) {similar to the t-test}

or just a simple average, σ_pooled = (σ1 + σ2)/2?
 
Last edited:
#5
Hi hlsmith,

Per my understanding, the book describes the two-sample t-test standardized effect size, which uses the sample standard deviations:

S_pooled^2 = ((n1 - 1)S1^2 + (n2 - 1)S2^2)/(n1 + n2 - 2)

The question is how to calculate Cohen's effect size for the two-sample z-test, when the standard deviations of the two groups, σ1 and σ2, are known.

The book treats only the case where σ1 = σ2: effect size = (μ1 - μ2)/σ

I know the basic assumption behind the "pooling" idea is that the population standard deviations are the same and only the sample standard deviations differ. But in this case we know the population standard deviations are not the same, so should I use the average of the standard deviations, or is the average of the variances more appropriate?
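As a sketch of the t-test formula above (Python used only for illustration; the sample sizes, sample SDs, and means are made-up numbers, not from the thread):

```python
import math

# Hypothetical example values: two samples with known sizes and
# *sample* standard deviations (these numbers are invented).
n1, n2 = 30, 40
s1, s2 = 5.0, 7.0
mean1, mean2 = 12.0, 9.0

# Pooled sample SD as in the two-sample t-test effect size:
# S_pooled^2 = ((n1-1)*S1^2 + (n2-1)*S2^2) / (n1 + n2 - 2)
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Cohen's d with the pooled sample SD in the denominator
cohens_d = abs(mean1 - mean2) / s_pooled
print(round(s_pooled, 4), round(cohens_d, 4))  # 6.2261 0.4818
```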
 
Last edited:
#7
No, I agree - if you need to address that, it needs some type of weighted pooling. I would imagine there should be a formula out there. You didn't see anything in that book? I don't recall ever doing that process myself. If I were you, I would keep combing the web to confirm your approach or find a formula. It seems like it would be a common process in meta-analyses.

:)
 
#8
I think the Satterthwaite formula for calculating a t-test with unequal variances might be useful, because the gist is to pool and use the appropriate (approximate) degrees of freedom. This is just off the top of my head.
 
Last edited by a moderator:
#9
No, I agree - if you need to address that, it needs some type of weighted pooling. I would imagine there should be a formula out there. You didn't see anything in that book? I don't recall ever doing that process myself. If I were you, I would keep combing the web to confirm your approach or find a formula. It seems like it would be a common process in meta-analyses.

:)
Thanks hlsmith,

I didn't find anything in that book covering the case where the two population standard deviations are known.
Other sources always use sample standard deviations; probably the z-test is much less practical...
I guess the z-test was used more when people calculated from tables; the z table is more detailed, with no df...

Of course I searched the web before asking the question..., but the magic word "meta-analyses" gained more results. :)

I found one video that suggested: (σ1 + σ2)/2
and another place: SQRT((Treatment group variance + Control group variance)/2) https://www.creative-wisdom.com/teaching/WBI/es.shtml

So probably the average of the variances is the correct one...
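The two candidates give different numbers whenever σ1 ≠ σ2. A quick sketch with made-up SDs (Python for illustration):

```python
import math

# Hypothetical known population SDs (illustration only)
sigma1, sigma2 = 5.0, 10.0

# Candidate 1: simple average of the SDs
avg_of_sds = (sigma1 + sigma2) / 2                # 7.5

# Candidate 2: root of the average of the variances (quadratic mean)
rms = math.sqrt((sigma1**2 + sigma2**2) / 2)      # sqrt(62.5) ~ 7.9057

print(avg_of_sds, round(rms, 4))
```

By Jensen's inequality the root-mean-square is always at least as large as the average of the SDs, with equality only when σ1 = σ2, so the choice matters exactly in the unequal-variance case being discussed.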
 
#10
I think the Satterthwaite formula for calculating a t-test with unequal variances might be useful, because the gist is to pool and use the appropriate (approximate) degrees of freedom. This is just off the top of my head.
Hi Ondansetron :)

The question is about z-test.
Do you mean using the same formula as in the t-test for the z-test?

I think, but am not sure, that the average of the variances is the correct answer.
 
#11
Sorry, I saw the t-test part.

1) how do you know the variances?
2) I might be wrong, but I think the variance of a sum (or difference) is just the sum of the variances, assuming the two variables in the sum or difference are independent. For non-independent variables you need to adjust for the covariance between them. @Buckeye @Dason will be able to clarify better, but in my mind it's irrelevant that it's a "Cohen's" standardized difference; it's really just a problem of solving for the variance of a difference (which works like the variance of a sum).
 
#12
I guessed you saw t-test :)

Cohen's effect size uses the population standard deviation, not the statistic's standard deviation.

The population standard deviation when you mix two groups depends on the number of items from each group in the entire population (not in the samples).

But I believe the entire population is not relevant when you compare two groups; it should be a 50/50 mixture.
So the answer is: sqrt((variance1 + variance2)/2)
 
#13
The variance of a sum (or difference) of two independent random variables is the sum of the variances: Var(X+Y) = Var(X-Y) = Var(X) + Var(Y). You would then take the square root of Var(X-Y) to get SD(X-Y).

Maybe I'm missing something here...
 
#14
The calculation is for the population standard deviation, not for a combination of two random variables (X+Y) or (X-Y).

For example, taking the specific case variance1 = variance2 = 8, the population variance should be 8, and the standard deviation sqrt(8).
 
#15
I'm not an R aficionado, but here is a quick simulation to show that it is incorrect to simply average the variances to get the pooled SD for two independent random variables.

Code:
> set.seed(123)
> x <- rnorm(100, 2, 5)
> y <- rnorm(100, 6, 10)
> diffxy<-x-y
> SD(x)
Error in SD(x) : could not find function "SD"
> sd(x)
[1] 4.564079
> sd(y)
[1] 9.669866
> sd(diffxy)
[1] 10.89538
> diffxy<-(x-y)
> sumxy<-x+y
> sd(sumxy)
[1] 10.48642
> var(x)
[1] 20.83082
> var(y)
[1] 93.50631
> var(diffxy)
[1] 118.7092
> set.seed(1234)
> a<- rnorm(100, 2, 2)
> b<- rnorm(100, 5, 2)
> diffab<-a-b
> sumab<-a+b
> var(a,b, diffab, sumab)
Error in var(a, b, diffab, sumab) : invalid 'use' argument
In addition: Warning message:
In if (is.na(na.method)) stop("invalid 'use' argument") :
  the condition has length > 1 and only the first element will be used
> var(a)
[1] 4.03532
> var(b)
[1] 4.261643
> var(sumab)
[1] 8.086441
> var(diffab)
[1] 8.507485

I even left in my error messages to show how bad I am at R, but the point remains clear. The first case, with X and Y having different variances (assigned SDs of 5 and 10, so variances of 25 and 100 for x and y, respectively), shows the variance is additive between the two. The second case, A and B, has the same assigned variance (SD of 2, variance of 4) and again shows the pooled variance is additive (it equals 8, rather than 4 as you claim).

I'm not sure if there is a miscommunication, but I don't think I can make any other comments. The numerator is a difference in means of random variables, and the appropriate variance/SD would be the sum of the variances (again assuming two independent RVs). You just account for the covariance if they're not independent.
 

#16
But I believe the entire population is not relevant when you compare two groups; it should be a 50/50 mixture.
So the answer is: sqrt((variance1 + variance2)/2)
If you're assuming a Gaussian mixture then your answer is wrong UNLESS you further assume that the population means are zero. The general expression for the variance of a mixture is:

\(\sigma^{2}=\sum_{i=1}^{k}p_i(\mu^{2}_i+\sigma^{2}_i)-\mu^{2}\) (where \(\mu=\sum_{i=1}^{k}p_i\mu_i\) is the grand mean of the mixture)

So if you have a two-component mixture, the variance is:

\(p_1\sigma^{2}_{1}+p_2\sigma^{2}_{2}+[p_1\mu^{2}_{1}+p_2\mu^{2}_{2}-(p_1\mu_1+p_2\mu_2)^{2}]\)

To get the expression that you posted, everything in the square brackets must be 0. Since the bracketed term simplifies to \(p_1p_2(\mu_1-\mu_2)^{2}\), that can only happen when the population means are equal (for example, both zero).
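A quick numeric check of the two-component mixture formula with made-up values (p1 = p2 = 0.5, unequal means, both variances equal to 8) shows the bracketed term does not vanish (Python for illustration):

```python
# Two-component mixture variance (hypothetical values):
# sigma^2 = p1*(mu1^2 + s1^2) + p2*(mu2^2 + s2^2) - (p1*mu1 + p2*mu2)^2
p1, p2 = 0.5, 0.5
mu1, mu2 = 2.0, 6.0
s1_sq, s2_sq = 8.0, 8.0

grand_mean = p1 * mu1 + p2 * mu2
mix_var = p1 * (mu1**2 + s1_sq) + p2 * (mu2**2 + s2_sq) - grand_mean**2

# The (variance1 + variance2)/2 proposal from earlier in the thread
naive = (s1_sq + s2_sq) / 2
print(mix_var, naive)  # 12.0 8.0
```

The gap of 4 is exactly the bracketed term p1·p2·(mu1 - mu2)² = 0.25·16, so the simple average of the variances recovers the mixture variance only when the component means coincide.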
 
Last edited: