# Not homework, but workwork

#### jriffe

##### New Member
Okay, so I work for a community program. We receive referrals and then attempt to contact clients to participate in our program. We send a couple letters, maybe make a phone call, that sort of thing. The client in question either responds and participates in the program, or does not.

I'm looking at the raw numbers over the course of a year, and my goal is to identify whether or not our processes may be biased against members of various ethnic groups.

I can look at the accept/reject rates as percentages, but that would just be eyeballing the question. It feels kinda like the answer is in confidence intervals, but despite many "events" over the course of a year, I only seem to have 1 data point (the percentage for the year). I could construct a time series, but that doesn't seem like it moves me in the right direction either.

The question I'm trying to answer would be something like this: Out of 2000 people offered the program in 2007, 1000 are Caucasian, 750 are African American, and 250 are Asian American. 90% of Caucasians, 85% of African Americans, and 70% of Asian Americans participate. For the whole sample, about 86% of people participate. Is the participation rate for Asian Americans significantly lower than the sample mean? Significantly lower than the Caucasian rate? How significant is the difference? Can I draw any sort of conclusion about the population, or only with respect to the sample?

If you can even get me pointed in the right direction, that would be really helpful.

##### New Member
I'm fairly new to stats, but it seems like a chi-square test would be the way to go, since you are comparing observed versus expected counts.

#### JohnM

##### TS Contributor
I would agree with the chi-square. Just from a quick look, it appears that Asian-American participation is lower than one would expect...

#### jriffe

##### New Member
"it appears that Asian-American participation is lower than one would expect"

That was the intention of my entirely manufactured example. My point is to emphasise my uncertainty about how much sample size is affecting things. For example, if there was 1 Martian among the 2000 people, he either participated or did not, giving 100% or 0% participation by Martians. Obviously, we cannot draw a statistically significant conclusion about Martian participation in comparison to the ~86% sample participation rate.
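To put rough numbers on the Martian point, here is a minimal sketch that compares the uncertainty around a rate based on 1 person with one based on 250 people. It uses the Wilson score interval, which is my choice for illustration and not something from the posts above; the function name `wilson_interval` is made up for this example.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# One Martian who happened to participate: observed rate 100%, n = 1.
martian = wilson_interval(1, 1)

# Asian American segment from the made-up example: 70% of 250.
asian_american = wilson_interval(175, 250)

print("Martian (1 of 1):          %.3f to %.3f" % martian)
print("Asian American (175/250):  %.3f to %.3f" % asian_american)
```

With a single observation the interval covers most of the range from 0 to 1, so it cannot rule out the ~86% sample rate, while the interval for 250 people is narrow enough to sit entirely below it.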

So, I could use a Chi-Squared Test of Goodness of Fit, break the sample down by ethnic group, and use the sample participation rate * the number of clients in each segment to determine the expected value?

So: (856-900)^2/856+(642-638)^2/642+(214-175)^2/214
= 2.262+0.025+7.107
= 9.394

Alpha .05, df = 2, critical value is 5.99

So, since the statistic exceeds the critical value, we reject at the 5% level the hypothesis that the three segments participate at the same rate.
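The hand calculation above can be checked with a short sketch. It computes the participants-only statistic as in the post (roughly 9.4), and also a variant that adds the non-participant cells, which is the usual chi-square test of independence on the 3x2 table. The second variant is my addition, not the poster's method. Assumptions: 85% of 750 is rounded to 638 participants, and the helper name `chi_square_terms` is made up for this example.

```python
import math

# Counts from the made-up example; 85% of 750 is 637.5, rounded to 638 here.
offered = {"Caucasian": 1000, "African American": 750, "Asian American": 250}
participated = {"Caucasian": 900, "African American": 638, "Asian American": 175}

overall_rate = sum(participated.values()) / sum(offered.values())


def chi_square_terms(include_non_participants):
    """Per-group contributions to the chi-square statistic."""
    terms = {}
    for group, n in offered.items():
        obs_yes = participated[group]
        exp_yes = n * overall_rate
        term = (obs_yes - exp_yes) ** 2 / exp_yes
        if include_non_participants:
            obs_no = n - obs_yes
            exp_no = n * (1 - overall_rate)
            term += (obs_no - exp_no) ** 2 / exp_no
        terms[group] = term
    return terms


participants_only = sum(chi_square_terms(False).values())
full_table = sum(chi_square_terms(True).values())

# With df = 2 the chi-square upper-tail probability is exactly exp(-x / 2),
# so no statistics library is needed for the p-value or the critical value.
p_participants_only = math.exp(-participants_only / 2)
p_full_table = math.exp(-full_table / 2)
critical_value = -2 * math.log(0.05)  # about 5.99

print(f"critical value (alpha = .05, df = 2): {critical_value:.2f}")
print(f"participants only: chi2 = {participants_only:.3f}, p = {p_participants_only:.4f}")
print(f"full 3x2 table:    chi2 = {full_table:.3f}, p = {p_full_table:.2e}")
```

Counting the non-participants makes the statistic considerably larger, because the Asian American segment has far more non-participants than the overall rate predicts. Both versions have 2 degrees of freedom here, so the same critical value applies.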

What other conclusions can I draw from that? The Asian American segment contributes the most to the chi-squared value, which implies that they are the most deviant from the mean, but I could see that by eyeballing the data.

Is there a test I can perform to show how significantly they deviate from the sample? Any other tests that might give me a juicy, useful tidbit of conclusion?

#### JohnM

##### TS Contributor
This comes up a lot. Statistical analysis is strictly a numerical tool to help you discern the "significant" from the background noise, and that's it. Sometimes charlatans sell it for more than what it really is. It's really up to the researcher / investigator to find meaning in the numbers and draw conclusions that are relevant to the research question(s).

#### Xenu

##### New Member
The problem is that differences are almost always statistically significant in large samples, even if the practical significance is nonexistent. I prefer making confidence intervals when possible, to get an idea of how large the difference is. In this case, I would make simultaneous confidence intervals for the differences between the groups of interest.
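One way this could look, as a rough sketch only: Wald intervals for each pairwise difference in participation rates, with a Bonferroni adjustment so the three intervals hold simultaneously at about 95%. The Bonferroni/Wald combination is an assumption on my part (other simultaneous methods exist), and the counts reuse the made-up example with 85% of 750 rounded to 638.

```python
import math
from itertools import combinations
from statistics import NormalDist

offered = {"Caucasian": 1000, "African American": 750, "Asian American": 250}
participated = {"Caucasian": 900, "African American": 638, "Asian American": 175}

pairs = list(combinations(offered, 2))  # three pairwise comparisons
family_alpha = 0.05

# Bonferroni: split alpha across the comparisons, two-sided.
z = NormalDist().inv_cdf(1 - family_alpha / (2 * len(pairs)))

intervals = {}
for a, b in pairs:
    p_a = participated[a] / offered[a]
    p_b = participated[b] / offered[b]
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / offered[a] + p_b * (1 - p_b) / offered[b])
    diff = p_a - p_b
    intervals[(a, b)] = (diff - z * se, diff + z * se)

for (a, b), (low, high) in intervals.items():
    print(f"{a} minus {b}: {low:+.3f} to {high:+.3f}")
```

An interval that excludes zero suggests a real difference, and its endpoints show how large or small that difference could plausibly be, which speaks to practical significance in a way the chi-square statistic alone does not.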