Comparing data from multiple samples - which model?


My job is to measure electrical characteristics of transistors, e.g. onset voltage or threshold voltage. Every wafer I test has roughly 4000 transistors, and I measure many characteristics for each.

I perform weekly experiments where I change a variable, and I want to know if the variable has had a significant effect. Normally I would have numerous wafers (with 4000 transistors on each) with and without the variable change. My current statistical analysis is poor: I calculate the mean and SD of each characteristic, then average the means for like-for-like wafers and compare "with variable A" against "without variable A" to see if the means are further apart than 2 or 3 SDs. So, for example, in its simplest form I would have:

With A = sample 1, sample 2
Without A = sample 3, sample 4
(each sample has the 4000 transistors which I measure)

If I do t-tests to measure the significance of the change (for any characteristic), I invariably get a tiny p value (<10e-16), maybe because I have so many measurements, i.e. the means are really different. But to do a t-test I have to combine all the data from "with variable A" into one set (instead of, say, two), and the same for "without variable A". It therefore misses the variation between samples, which I can see would make the significance of the change low. ANOVA seems to be used if I want to compare multiple variables, but I only have one variable with many samples of each, which is what the t-test doesn't take into account.

My data is normal looking by eye but tests (such as Shapiro) suggest it is not.

Should I be reading up on ANOVAs? If not, could someone start me on the right path as to which statistics I should research?

Many thanks in advance.



TS Contributor
Several comments:
First, your sample size is so large that any normality test will fail. You are better off judging the normality visually on a normal probability plot, Q-Q plot etc.
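As a rough illustration of the visual check suggested above, here is a minimal sketch that computes the points of a normal Q-Q plot using only the Python standard library. The function name `qq_points` and the simulated threshold voltages are assumptions made for this example; they are not taken from the original data.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Fit a normal distribution with the sample mean and SD.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical data: 4000 simulated threshold voltages in volts, not real measurements.
rng = random.Random(0)
vth = [rng.gauss(0.70, 0.02) for _ in range(4000)]
points = qq_points(vth)

# Points lying close to the line y = x suggest approximately normal data.
worst = max(abs(t - o) for t, o in points)
print(f"largest deviation from the y = x line: {worst:.4f} V")
```

Plotting the pairs with any charting tool and looking for systematic curvature is usually more informative at this sample size than the p-value of a normality test.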

You could run a 2-way ANOVA using wafer as a second factor (with A being the first factor). This will tell you if there are significant wafer to wafer differences. However, your sample size is again so large that it will almost assuredly show significance on all factors.

You may want to simply determine the minimum effect size that is of practical significance to you and use it in addition to the p-value.

Thank you for your reply.

So to clarify, do you mean that I should relate the effect size from a 2-way ANOVA to how much (I think) the change matters practically? Will the 2-way ANOVA give me two numbers: one for the significance of wafer to wafer variation, and one for the significance of the variable I've implemented (A)? Is it then as simple as saying that (although both factors are significant) if the significance of the wafer to wafer difference is larger than the significance of the difference of factor A, then, for my purposes, factor A has not had a significant effect? Or am I grossly oversimplifying?

Thanks again



TS Contributor
The 2-way ANOVA will separate the effect of wafer to wafer from the effect of factor A (you could also test for the interaction between the two). It will test the significance of each effect using the within wafer variation. You will end up with an understanding of the variation within wafers, variation between wafers and variation due to factor A.
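The split into within-wafer, between-wafer and factor A variation described above can be sketched by hand. This is only an illustration under stated assumptions: each wafer is treated as belonging to exactly one level of factor A (wafers nested within the factor), and the function name `nested_sums_of_squares`, the wafer means and the simulated measurements are all hypothetical.

```python
import random
from statistics import fmean

def nested_sums_of_squares(data):
    """data maps treatment label -> {wafer label -> list of measurements}."""
    all_values = [x for wafers in data.values() for xs in wafers.values() for x in xs]
    grand = fmean(all_values)
    ss = {"treatment": 0.0, "wafer": 0.0, "within": 0.0}
    for wafers in data.values():
        treatment_values = [x for xs in wafers.values() for x in xs]
        treatment_mean = fmean(treatment_values)
        # Variation of treatment means around the grand mean (factor A).
        ss["treatment"] += len(treatment_values) * (treatment_mean - grand) ** 2
        for xs in wafers.values():
            wafer_mean = fmean(xs)
            # Variation of wafer means around their own treatment mean.
            ss["wafer"] += len(xs) * (wafer_mean - treatment_mean) ** 2
            # Variation of individual transistors around their wafer mean.
            ss["within"] += sum((x - wafer_mean) ** 2 for x in xs)
    ss["total"] = sum((x - grand) ** 2 for x in all_values)
    return ss

# Hypothetical wafer means (volts); the measurements are simulated, not real.
rng = random.Random(1)
wafer_means = {
    "with A": {"wafer 1": 0.700, "wafer 2": 0.712},
    "without A": {"wafer 3": 0.695, "wafer 4": 0.709},
}
data = {
    treatment: {w: [rng.gauss(mu, 0.02) for _ in range(1000)] for w, mu in wafers.items()}
    for treatment, wafers in wafer_means.items()
}
ss = nested_sums_of_squares(data)
for name, value in ss.items():
    print(f"{name:>9}: {value:.4f}")
```

The three component sums of squares add up to the total, which is what makes it possible to talk about each source's share of the overall variation.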

Regarding your last set of questions, replace "significance" with "effect" and you have the essence of it. If the effect size between wafers is larger than the effect size of factor A then (while factor A is statistically significant) you will have difficulty seeing the effect in practice. You will see a slight mean shift over time, but you are probably looking for a more dramatic effect.

If you are more interested in practical effects that are larger than the between wafers variation, use the 1-way ANOVA. This will test the significance of the factor A effect using the combined variation of between/within wafers. This would require the effect due to factor A to be large enough to be seen over that variation. This sounds more like what you need.
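A minimal sketch of the 1-way calculation described above: all wafers for a given level of factor A are pooled into one group, so the error term contains both between-wafer and within-wafer variation. The function name `one_way_f` and the tiny data set are made up for illustration.

```python
from statistics import fmean

def one_way_f(groups):
    """groups: one list of measurements per level of the factor (wafers pooled)."""
    all_values = [x for g in groups for x in g]
    grand = fmean(all_values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F is the ratio of the between-group to the within-group mean square.
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical pooled measurements for the two levels of factor A.
with_a = [1.0, 2.0, 3.0]
without_a = [2.0, 3.0, 4.0]
f_stat = one_way_f([with_a, without_a])
print(f_stat)  # 1.5
```

With thousands of transistors per group, the within-group degrees of freedom become very large, so even a small mean shift can yield a large F value.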
Hello Miner. I come back to you for more advice. I followed your last bit of advice and ran both two-way and one-way ANOVAs on my data sets. Unfortunately, every source of variation is so significant that I get a p value < 2e-16, so I can't say one is more significant than another.

I find it strange, as boxplots of the data suggest the data are not so different (see image). I am running my ANOVA in R. Maybe I have not set things up correctly? Should I have 4998 residuals? I have 5002 data points, half of which are Peel A and half Peel B. Peel A is split into two wafers. Peel B is split into two (different) wafers.

Maybe I need to explore another statistical tool? Or maybe I haven't sorted things out correctly?

Any help is greatly appreciated !



TS Contributor
Just to verify, is Peel:Wafer the interaction term? And, I could not see/open the image. Check to see whether it is a supported format.

This is consistent with what you describe. While the difference is statistically significant due to the large sample size, the wafer to wafer variation drowns it out in background noise (see relative contributions attached). You should see this difference eventually as a mean shift in your process, but it will not be a striking difference.

Here is a URL to the image:

I think I understand: because both p values are low, the wafer to wafer variation is more than (or of the same order as) the effect size of Peel. Therefore I can conclude that while Peel is significant (due to the size of the data set), wafer to wafer variation is similarly significant and so drowns out the effect of Peel.

Are those figures you gave me part of the ANOVA? If I read up on ANOVA relative contributions, will it explain the table you attached?

Thank you

OK, I have been reading up on effect size. If I'm right, the table you attached gives the relative contribution of the effect size for Peel, Wafer and P*W (which I've decided to remove because it doesn't make sense: there are no wafers with both Peel and not Peel).
It seems to be that for me any of the effect size markers can be used (do you agree?). I have looked up eta squared and the formula is SSbetween/SStotal. So am I correct in saying that, considering both factors are significant (low p), if my eta squared (and hence effect size) is larger for Wafer i.e. between wafers, than my eta squared for Peel i.e. between experimental factor, then the effect of my experimental factor is less than the noise of wafer variation?
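The eta squared formula quoted above (SSbetween/SStotal) can be sketched as follows. The function name `eta_squared` and the sums of squares are hypothetical values chosen for illustration; they are not the figures from the attached table.

```python
def eta_squared(ss_effects, ss_error):
    """ss_effects: effect name -> sum of squares. Returns effect name -> eta^2."""
    # Total sum of squares is every effect plus the residual (within) term.
    ss_total = sum(ss_effects.values()) + ss_error
    return {name: ss / ss_total for name, ss in ss_effects.items()}

# Hypothetical sums of squares, as might be read off an ANOVA table.
shares = eta_squared({"Peel": 2.0, "Wafer": 6.0}, ss_error=12.0)
print(shares)  # {'Peel': 0.1, 'Wafer': 0.3}
```

In this made-up case Wafer accounts for three times as much of the total variation as Peel, which is the kind of comparison being asked about.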


TS Contributor
Still can't see the image. Our security blocks storage sites.

Yes, the factor effect is less than the noise of the wafer variation.

Hello again Miner

I have another question, if you don't mind. I have been using your advice from above to run 2-way ANOVAs as discussed. I have been using eta^2 as the effect size to see which element contributes the most variation in my experiment. If the effect size of my experimental factor is larger than the effect size of the wafer to wafer and within-wafer variation, I consider the experimental factor to be significant. I wonder, should I be considering the mean square values instead? As that is the sum of squares divided by the degrees of freedom, does it do some kind of normalization? If so, then again I should take note if the mean square is bigger for my experimental factor than for wafer variation. Or should I stick with eta^2?




TS Contributor
Since eta^2 is based on mean square, you should reach the same decision either way. If you are comfortable working with mean square, you can definitely do it. However, if you have to explain what you are doing to someone else, eta^2 is much easier to interpret.

Is eta^2 not: sum squares between / total sum squares? Hence it doesn't take into account the degrees of freedom (not a statement, a question). Sorry about the confusion I just want to get it clear in my head.
Thanks again


TS Contributor
You are correct. I typically use epsilon^2 which does use the mean square and df. Eta^2 has a positive bias from using the sum of squares. However, as you can see from post #7 the difference is relatively small. Unless you are splitting hairs and straining at gnats, this difference should not affect your decisions. In industrial statistics differences that you would spend real money on tend to be rather big.
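To make the comparison above concrete, here is a small sketch that computes both measures for a single factor. It assumes the usual one-way definition epsilon^2 = (SS_effect - df_effect * MS_error) / SS_total; the function name and the input numbers are hypothetical, not values from this thread.

```python
def eta_and_epsilon_squared(ss_effect, df_effect, ss_error, df_error):
    """Return (eta^2, epsilon^2) for one factor in a one-way layout."""
    ss_total = ss_effect + ss_error
    ms_error = ss_error / df_error
    eta2 = ss_effect / ss_total
    # Epsilon squared subtracts the part of SS_effect expected from noise alone.
    eps2 = (ss_effect - df_effect * ms_error) / ss_total
    return eta2, eps2

# Hypothetical values with error degrees of freedom in the thousands.
eta2, eps2 = eta_and_epsilon_squared(ss_effect=2.0, df_effect=1, ss_error=18.0, df_error=4998)
print(f"eta^2 = {eta2:.5f}, epsilon^2 = {eps2:.5f}")
```

With error degrees of freedom this large, the correction term is tiny, so the two measures nearly coincide.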