The people who sent me the data have a record for each of the 1027 people. They included a "sample weight" column that is tied to the demographic columns: if a certain combination of demographics (age, sex, county, income) is underrepresented in the sample compared to the population, then its sample weight is larger than 1. The group who sent me the data suggested that I use the sample weights and not the raw counts.
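To make sure I understand what "use the sample weight and not counts" means, here is a minimal sketch of how I think a weighted donation rate differs from a plain count-based rate. The records below are made up for illustration (they are not my real data), and the function names are my own.

```python
# Hypothetical records as (sample_weight, donated_flag); NOT the real data.
records = [(2.0, 1), (0.5, 0), (1.0, 0), (0.5, 0), (1.0, 1)]

def unweighted_rate(rows):
    """Share of donors when every record counts once."""
    return sum(flag for _, flag in rows) / len(rows)

def weighted_rate(rows):
    """Share of donors when each record counts by its sample weight."""
    total_weight = sum(w for w, _ in rows)
    donor_weight = sum(w for w, flag in rows if flag)
    return donor_weight / total_weight

print(f"unweighted rate: {unweighted_rate(records):.3f}")
print(f"weighted rate:   {weighted_rate(records):.3f}")
```

In this toy example the donor with weight 2.0 stands in for an underrepresented demographic, so the weighted rate ends up higher than the unweighted one.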

I broke the 1027 into 3 groups:

Group A: those who saw our ad

Group B: those who were online at a site where we advertise but did not see our ad

Group C: those who were not online at a site where we advertise

Below are the donation rates and the min and max of the range based on 1 standard deviation. This uses the sample weight column (not the counts).

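| Group | Count | Donators | Donation rate | Std dev | 68% min | 68% max |
|-------|-------|----------|---------------|---------|---------|---------|
| A     | 121   | 15       | 12.4%         | 3.0%    | 9.4%    | 15.4%   |
| B     | 270   | 28       | 10.4%         | 1.9%    | 8.5%    | 12.2%   |
| C     | 636   | 64       | 10.1%         | 1.2%    | 8.9%    | 11.3%   |
| Total | 1027  | 107      | 10.4%         | 1.0%    | 9.5%    | 11.4%   |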

I used sqrt(p(1-p)/n) to calculate the standard deviation, where p is the donation rate and n is the group size. Is that correct?
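For reference, this is a small sketch of the calculation I did, written in Python. It assumes the count and donators columns from my table can be plugged in directly as n and the number of donors; `proportion_se` is just a name I picked.

```python
import math

def proportion_se(p, n):
    """sqrt(p * (1 - p) / n) for a proportion p observed on n records."""
    return math.sqrt(p * (1 - p) / n)

# (count, donators) per group, copied from the table in this post
groups = {"A": (121, 15), "B": (270, 28), "C": (636, 64), "Total": (1027, 107)}

for name, (n, donors) in groups.items():
    p = donors / n
    se = proportion_se(p, n)
    # The "68%" range is the rate plus or minus one standard deviation
    print(f"{name:5s} rate={p:.1%} sd={se:.1%} range=({p - se:.1%}, {p + se:.1%})")
```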

Since the ranges for groups A and B overlap, my thought is that we cannot say that the ads drove a significant difference at 68% confidence (1 standard deviation). Is that the correct interpretation?

I could not find how to calculate the confidence level at which we could say the difference was significant (i.e., "we are X% sure that group A is better than group B"). Where can I find this information? I apologize if my searches missed something obvious. I'm also interested in identifying which sample sizes would enable me to be X% confident. I guess I could keep inputting counts until I see it, but I'm thinking that there is a more efficient way.
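My best guess so far at what this might look like is a pooled two-proportion z-test plus the usual normal-approximation sample size formula, sketched below. Everything here is an assumption on my part: I treat the table's counts as plain counts, I use a one-sided test ("A is better than B"), and the `alpha` and `power` defaults are placeholders I chose, not anything from the data provider. I also understand that one minus the p-value is only a loose stand-in for "X% sure", not literally the probability that A is better.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: both groups share one donation rate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate equal group size needed for a one-sided test at level
    alpha to detect rates p1 vs p2 with the given power."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# Group A vs group B, using the counts from the table
z = two_proportion_z(15, 121, 28, 270)
confidence = NormalDist().cdf(z)  # one minus the one-sided p-value

print(f"z = {z:.2f}, one-sided level = {confidence:.1%}")
print("n per group for 12.4% vs 10.4%:", n_per_group(0.124, 0.104))
```

If this is the right idea, the formula would replace my plan of plugging in counts by trial and error.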

Thank you.

P.S. I feel like this post is already too long, especially for a noob. If you are interested, my other questions are below. Thanks again.

1. Am I correct using the sample weights as opposed to the counts?

2. I guess I could compare group A to B and C combined since B and C are similar. If I wanted to compare all 3 groups would I use an ANOVA?

3. How can I tell if something else besides the online ad drove the donation? For example, we send emails asking for donations. What if it was the email(s) or demographic differences between the groups? Is this called 'confounding'?

4. If it was something else, how can I quantify the relationship between the donation and the other item(s)? For instance, "donations were primarily driven by X, and for every y% increase in X, the chance to donate increased by z%." It sounds like I need to go into regression, and logistic regression at that, since the output variable is binary (correct?). But I bet I could do regular regression if I use the donation amount as opposed to the binary donation flag.
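On question 2, since my outcome is a yes/no flag rather than a continuous value, I suspect a chi-square test of independence on the group-by-donated table may be closer to what I need than an ANOVA. Here is a minimal sketch under that assumption, again treating the table's counts as plain counts (no sample weights). The closed-form p-value only applies because a 3x2 table has exactly 2 degrees of freedom.

```python
import math

# (donors, non-donors) per group, derived from the table in this post
observed = {"A": (15, 121 - 15), "B": (28, 270 - 28), "C": (64, 636 - 64)}

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    rows-by-columns table of counts."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof

stat, dof = chi_square_independence(observed)
# With exactly 2 degrees of freedom the chi-square tail is exp(-x / 2)
p_value = math.exp(-stat / 2) if dof == 2 else float("nan")

print(f"chi-square = {stat:.3f}, dof = {dof}, p-value = {p_value:.3f}")
```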