# Difference in population proportions analysis

#### coreyelowsky

##### New Member
Hello,

Here is my problem.

I have 2 groups of brains (male and female). Each group has about 6 brains. We are interested in analyzing the proportion of cells in one specific region of the brain. So for each group, I have a list of proportions, one per sampled brain, as shown below.

Males
Brain 1 - .1
Brain 2 - .13
Brain 3 - .9
Brain 4 - .6
Brain 5 - .7
Brain 6 - .8

Females
Brain 1 - .2
Brain 2 - .23
Brain 3 - .18
Brain 4 - .2
Brain 5 - .24
Brain 6 - .25

We want to compare the difference in proportions between the two groups, but I'm not sure how this can be done. If I had just one sampled brain from each group, I would do a simple difference-in-proportions test, but I'm not sure how this should be done given that I have multiple sample proportions from each group.

#### obh

##### Active Member
Hi Core,

A proportion test is about the probability of an event, for example, the proportion of male babies.
For example, in hospital A, 66 of 101 babies are male, so the proportion of males is 66/101 = 0.65.
The count follows a binomial distribution, while each baby is a Bernoulli variable: male or female.
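
To make the binomial setting concrete, here is a minimal Python sketch of the hospital example. It computes the sample proportion and the usual binomial standard error, sqrt(p(1-p)/n). The variable names are illustrative only.

```python
import math

# Hospital A example from the post: 66 male babies out of 101 births.
males, births = 66, 101

p_hat = males / births                          # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / births)    # binomial standard error

print(f"p_hat = {p_hat:.3f}, SE = {se:.3f}")
```

Note that the standard error here is driven by the number of babies (101), which is the sample size in this setting.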

As I understand it, this is not your case! The word "proportion" is misleading here.
You don't treat every cell as a Bernoulli variable; you treat each brain as one observation. Am I correct? That is, the sample size is the number of brains, not the number of cells.

So if I'm correct, you can treat each brain's proportion as a single measurement, like one person's test result, and ignore the word "proportion".
You can then run a test that compares the average proportion between males and females, such as a t-test or the Mann-Whitney U test, depending on which test's assumptions you meet.

How many brains do you have? Is this example data or your real data?

#### coreyelowsky

##### New Member
This is just example data. There are about 6 male brains and 6 female brains.

So if I understand you correctly, you are suggesting that I find the average of the male proportions and the average of the female proportions, and then use a t-test to compare the two. However, my one concern is that I am confused about whether I would be using a t-test to compare a difference in means or a difference in proportions, because each has a different standard error. With what you have suggested, it seems that what I calculate would be a mixture of a sample mean and a sample proportion, so I am not exactly sure how to handle it.

Thanks

#### obh

##### Active Member
Hi,

I think you are confused again (or maybe I am).

When you use a proportion test, as in the example above (66/101 = 0.65), the sample size is 101 observations.
You can then calculate the standard deviation of the proportion, which depends on the proportion itself.

So my assumption is that you need to treat each brain as one sample value, not each cell in the brain.
Your sample size is then the number of brains (6), not the number of cells: each brain counts as one observation.
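
As a rough illustration of what this assumption implies for the standard error, the sketch below treats each example male proportion from the first post as one observation. The standard error is then the ordinary one for a mean, SD / sqrt(n) with n = 6 brains, not the binomial formula. Variable names are illustrative.

```python
import math
import statistics

# Example male proportions from the first post (one value per brain).
male = [0.10, 0.13, 0.90, 0.60, 0.70, 0.80]

n = len(male)                     # sample size = number of brains
mean = statistics.mean(male)
sd = statistics.stdev(male)       # sample SD (n - 1 denominator)
se_mean = sd / math.sqrt(n)       # standard error of the mean

print(f"n = {n}, mean = {mean:.3f}, SD = {sd:.3f}, SE = {se_mean:.3f}")
```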

Do you agree with this assumption?

So, as I understand it, you should check the distribution of your proportions (where each proportion is one value). If it looks roughly normal or at least symmetric, you can use Welch's t-test, which assumes unequal standard deviations for males and females but also works when they are equal. Otherwise, you may use a nonparametric test: the Mann-Whitney U test.
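
Both tests can be sketched in Python with SciPy, assuming it is available. The data are the example values from the first post, not real measurements, so the printed numbers are only illustrative.

```python
from scipy import stats

# Example proportions from the first post (one value per brain).
male = [0.10, 0.13, 0.90, 0.60, 0.70, 0.80]
female = [0.20, 0.23, 0.18, 0.20, 0.24, 0.25]

# Welch's t-test: equal_var=False drops the equal-variance assumption.
welch = stats.ttest_ind(male, female, equal_var=False)

# Mann-Whitney U test, two-sided alternative.
mwu = stats.mannwhitneyu(male, female, alternative="two-sided")

print(f"Welch t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
print(f"Mann-Whitney U = {mwu.statistic:.1f}, p = {mwu.pvalue:.3f}")
```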

#### coreyelowsky

##### New Member
I agree with your assumption. However, after doing more research into those tests, I realized that our data are not normal, so I cannot use Welch's t-test. And since my data are proportions and not counts, I cannot use the Mann-Whitney U test, since I believe it is only used for ordinal data.

Any thoughts?

#### obh

##### Active Member
A proportion is ordinal...
Under my assumption, you can think of it like a mark on a school exam.

#### coreyelowsky

##### New Member
I believe ordinal refers to natural or counting numbers, whereas I have decimals between 0 and 1.

#### obh

##### Active Member
Ordinal data means you can sort the data by value.
Generally, an ordinal variable is a categorical variable that can be ordered (like a Likert scale), but you can also use the MWU test for continuous variables.
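
One way to see that only the ordering matters: the MWU statistic depends on the ranks alone, so any strictly increasing transform of the data leaves it unchanged. The sketch below checks this with a logit transform on the example data from the first post. It assumes SciPy is available, and the `logit` helper is defined here purely for illustration.

```python
import math
from scipy import stats

male = [0.10, 0.13, 0.90, 0.60, 0.70, 0.80]
female = [0.20, 0.23, 0.18, 0.20, 0.24, 0.25]

def logit(p):
    """Strictly increasing transform of a proportion in (0, 1)."""
    return math.log(p / (1 - p))

u_raw = stats.mannwhitneyu(male, female, alternative="two-sided")
u_logit = stats.mannwhitneyu(
    [logit(p) for p in male],
    [logit(p) for p in female],
    alternative="two-sided",
)

# The ranks, and therefore U and the p-value, are the same in both cases.
print(u_raw.statistic, u_logit.statistic)
```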

#### GretaGarbo

##### Human
> Proportion test is about the probability of an event, for example, the proportion of male babies.

When statisticians hear "proportion" they tend to think of the binomial distribution, a distribution for a discrete variable.

But a proportion can also be a share of something, like the share of A in (A+B): proportion = A/(A+B). An example is the share of your expenditures spent on food.

Then a beta distribution could be useful. (You can then compute maximum likelihood estimates for it, just as you can for the normal distribution.) Many software packages support this.
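
As a minimal sketch of such a fit, assuming SciPy is available, `scipy.stats.beta.fit` can estimate the two shape parameters by maximum likelihood when the location and scale are fixed at 0 and 1. The data are the example female proportions from the first post; with only 6 values the estimates are very rough.

```python
from scipy import stats

# Example female proportions from the first post; all lie strictly in (0, 1).
female = [0.20, 0.23, 0.18, 0.20, 0.24, 0.25]

# Fix location 0 and scale 1 so only the two shape parameters are estimated.
a, b, loc, scale = stats.beta.fit(female, floc=0, fscale=1)

fitted_mean = a / (a + b)   # mean of a Beta(a, b) distribution
print(f"a = {a:.2f}, b = {b:.2f}, fitted mean = {fitted_mean:.3f}")
```

A beta fit requires every value to lie strictly between 0 and 1, so proportions of exactly 0 or 1 would need separate handling.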

> I cannot use the Mann-Whitney U test since I believe that it is only used for ordinal data.

The Wilcoxon-Mann-Whitney test (WMW) can be used for ratio scales, interval scales, and ordinal scales. (By the way, a ratio scale is of course also an ordinal scale.) The data for WMW are assumed to be continuous. (But many confuse the terms "continuous" and "ratio scale".)

But of course you can do many tests. Try a t-test, WMW, a permutation test, a likelihood ratio test based on the beta distribution or another distribution, empirical likelihood, or whatever.
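
Of these, the permutation test is easy to sketch with the standard library alone. With 6 brains per group there are only 924 ways to relabel the 12 values, so the test can be computed exactly. The version below uses the absolute difference in means as the test statistic, which is one reasonable choice among several, and the example data from the first post.

```python
from itertools import combinations
from statistics import mean

male = [0.10, 0.13, 0.90, 0.60, 0.70, 0.80]
female = [0.20, 0.23, 0.18, 0.20, 0.24, 0.25]

pooled = male + female
n_male = len(male)
observed = abs(mean(male) - mean(female))

# Enumerate every way to relabel 6 of the 12 brains as "male" (924 splits).
count = total = 0
for idx in combinations(range(len(pooled)), n_male):
    chosen = set(idx)
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = abs(mean(group_a) - mean(group_b))
    # Small tolerance guards against floating-point ties with the observed split.
    if diff >= observed - 1e-12:
        count += 1
    total += 1

p_value = count / total
print(f"exact two-sided permutation p = {p_value:.4f}")
```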

#### coreyelowsky

##### New Member
Ah, makes sense. I think I will try a WMW test. Thanks for your help.

#### GretaGarbo

##### Human
Try several tests. If the conclusion is the same for every test, then it is a fairly robust conclusion. But if the tests give different results, then the evidence is not conclusive.

By the way, WMW is sensitive to spread: when the population means are the same but the spread differs, the test can indicate "significance" more often than the nominal 5%.
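
This can be checked with a small simulation, sketched below under stated assumptions: both groups are normal with mean 0, the standard deviations are 1 and 20, and there are 30 observations per group. These settings are illustrative choices, not the thread's data, and the behaviour at 6 per group may differ. The function name `rejection_rate` is hypothetical.

```python
import numpy as np
from scipy import stats

def rejection_rate(n=30, sd_x=1.0, sd_y=20.0, n_sim=2000, alpha=0.05, seed=0):
    """Share of simulations where WMW rejects although both means are 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(0.0, sd_x, size=n)
        y = rng.normal(0.0, sd_y, size=n)
        p = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        rejections += int(p < alpha)
    return rejections / n_sim

rate = rejection_rate()
print(f"rejection rate at nominal 5%: {rate:.3f}")
```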

#### obh

##### Active Member
Hi Greta,

Thank you for the valuable answer (as usual)

> When statisticians hear "proportion" they tend to think of the binomial distribution, a distribution for a discrete variable.
>
> But a proportion can also be a share of something, like the share of A in (A+B): proportion = A/(A+B). An example is the share of your expenditures spent on food.
>
> Then a beta distribution could be useful. (You can then compute maximum likelihood estimates for it, just as you can for the normal distribution.) Many software packages support this.

Thanks for the correction! Since we don't count the number of cells, it isn't binomial...

> But many confuse the terms "continuous" and "ratio scale"

I assume that both "ratio scale" and "interval scale" can apply to either a continuous or a discrete variable.
Ratio scale example: a 2 m snake is twice as long as a 1 m snake in both cases:
continuous variable: a laser measuring tape with 10 digits.
discrete variable: a regular measuring tape that has only centimeter resolution.

> By the way, a ratio scale is of course also an ordinal scale.

I assume an "ordinal variable" is a type of categorical (discrete) variable,
while "ordinal scale" applies to any sortable data, discrete or continuous.

> The data for WMW are assumed to be continuous

I assume it can be used for any "ordinal data", continuous or discrete. (Also, as I remember, it was originally meant for one of them.)
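
For discrete data the practical difference is ties. A minimal sketch, assuming SciPy is available and using made-up 5-point Likert responses (hypothetical data, for illustration only): tied values receive midranks, and the normal approximation in `scipy.stats.mannwhitneyu` applies a tie correction.

```python
from scipy import stats

# Hypothetical 5-point Likert responses for two groups (illustrative only).
group_a = [1, 2, 2, 3, 3, 4]
group_b = [2, 3, 4, 4, 5, 5]

# Tied values share the average of the ranks they occupy (midranks).
ranks = stats.rankdata([1, 2, 2, 3])
print(ranks)

# With ties present, request the normal approximation, which corrects for ties.
result = stats.mannwhitneyu(
    group_a, group_b, alternative="two-sided", method="asymptotic"
)
print(f"U = {result.statistic}, p = {result.pvalue:.3f}")
```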