# simple logistic regression question

#### alphaqn69

##### New Member
I'm doing a willingness-to-pay survey to see if people are willing to spend $1 for clean water. I randomly sampled 100 people, and one of the questions on the survey asked "Do you care about your health?", followed by the other question, "Are you willing to spend $1 for clean water?"

If I want to show a correlation between people who care about their health and their willingness to pay $1 extra, can't I code the responses as (1,0) for both questions and do logistic regression? Or should I just do proportions?

#### Dason

##### Ambassador to the humans
Re: simple log regression question

"If I want to show a correlation between people who care about their health and their willingness to pay $1 extra, can't I code the responses as (1,0) for both questions and do logistic regression?"

There would really be no need to do logistic regression when you could just do cross tabulation and use some contingency table results.
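As a rough illustration of the cross-tabulation route, here is a minimal Python sketch. The counts are made up for the example (they are not the survey's data), and the names `table`, `chi2`, and `p_value` are just labels chosen here. It computes the proportion willing to pay within each group and the Pearson chi-square statistic for a 2x2 table without continuity correction.

```python
import math

# Hypothetical 2x2 counts (NOT the poster's data), n = 100.
# Rows: cares about health (yes/no); columns: willing to pay $1 (yes/no).
table = {
    ("cares", "pays"): 45,
    ("cares", "declines"): 15,
    ("does_not_care", "pays"): 15,
    ("does_not_care", "declines"): 25,
}
a = table[("cares", "pays")]
b = table[("cares", "declines")]
c = table[("does_not_care", "pays")]
d = table[("does_not_care", "declines")]
n = a + b + c + d

# Proportion willing to pay within each group.
p_pay_given_cares = a / (a + b)
p_pay_given_not = c / (c + d)

# Pearson chi-square statistic for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom, the upper-tail p-value has a closed form.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"P(pay | cares) = {p_pay_given_cares:.3f}")
print(f"P(pay | does not care) = {p_pay_given_not:.3f}")
print(f"chi-square = {chi2:.4f}, p = {p_value:.5f}")
```

With small expected cell counts, an exact test would be the safer choice than the chi-square approximation.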

But you might also want to spend more time working on survey design next time. I see a couple flaws based solely on the little information you've given us so far.

#### alphaqn69

##### New Member
Hi Dason! That's exactly what I ended up doing: I just used the proportions from the 2x2 contingency table that was generated.

Question 2:
Why can't "caring about health" be used as a predictor for "willingness to pay $1 more" (the response variable)?

Second question: What did you observe that was not good about the survey design? Six months went into preparing it and I summarized it in three lines, so I hope you weren't put off by my condensed version. If there was something worth considering, please let me know, and you will be cited in the paper.

#### Dason

##### Ambassador to the humans
You said that the question "Do you care about your health" always came right before the question about paying an additional dollar. It just seems to me that this could modify a person's answer to paying an additional dollar. For instance, maybe a person doesn't want to pay that additional dollar. But you ask them if they care about their health, they think about it for a while, and they convince themselves that they really do care about their health (or at least they want to believe they care about their health). Then the very next question asks if they would be willing to pay $1 for "clean" water. They were just forced into a mindset where they want to believe they care about their health, and now they're being asked a question that seemingly reflects whether or not they care about their health.

It just seems to me that by always preceding the question about \$1 clean water with the question about caring about health, you might change a person's response. So the correlation you observe might actually be higher than the true correlation. There are ways to try to avoid such an effect.

Now I can't claim to be an expert by any means when it comes to survey design but this is something that does stick out to me.

#### alphaqn69

##### New Member
Hey, thanks! The second question did not follow sequentially. It was almost at the end of the survey. Sorry if I confused you. You're right in your analysis. It's called "priming", and it works because people naturally don't like to be perceived as being inconsistent in their responses.

Why would you prefer using contingency tables versus logistic regression?

Thanks bud.

#### DavidBill

##### New Member
Hmmm, well, I think they're going to tell you different things. A crosstabs table with an appropriate measure of association like phi (which is the same as Pearson's r in this case) will give you a measure of the strength of the relationship. (This is a linear measure, albeit with an odd fit to the scatterplot.) Logistic regression coefficients are on the log-odds scale and aren't directly interpretable, so you would convert them to odds ratios or to the predicted probability of getting a "1" on the dependent variable.
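To make the comparison concrete, here is a small Python sketch on invented counts (not the survey's data). It checks that phi from the 2x2 table matches Pearson's r on the 0/1-coded responses, and it uses the standard result that, for a logistic model with a single binary predictor, the maximum-likelihood slope is the log of the table's odds ratio and the intercept is the log-odds in the reference group. All variable names are illustrative.

```python
import math

# Hypothetical counts, for illustration only (not the poster's data).
a, b = 45, 15   # cares about health: pays, declines
c, d = 15, 25   # does not care:      pays, declines
n = a + b + c + d

# Phi coefficient computed directly from the 2x2 table.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Pearson's r on the same data coded 0/1, computed from the raw responses.
x = [1] * (a + b) + [0] * (c + d)            # 1 = cares about health
y = [1] * a + [0] * b + [1] * c + [0] * d    # 1 = willing to pay
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)

# Logistic regression with one binary predictor has a closed-form fit:
# intercept = log-odds of paying in the "does not care" group,
# slope = log(odds ratio) from the table.
odds_ratio = (a * d) / (b * c)
intercept = math.log(c / d)
slope = math.log(odds_ratio)

def predicted_probability(cares):
    """Predicted P(willing to pay) for cares = 0 or 1."""
    return 1 / (1 + math.exp(-(intercept + slope * cares)))

print(f"phi = {phi:.3f}, Pearson r = {r:.3f}")
print(f"odds ratio = {odds_ratio:.2f}")
print(f"P(pay | cares) = {predicted_probability(1):.3f}")
print(f"P(pay | does not care) = {predicted_probability(0):.3f}")
```

So the two approaches describe the same table in different units: phi summarizes strength of association, while the logistic fit reproduces the group proportions and reports the contrast as an odds ratio.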

Interesting points about surveys here too. The point about question order is well advised. While the effect of a prior question on a subsequent question does involve priming, the result would probably be labeled as a "response effect." For this reason, questions on more stable matters like demographics are usually placed last. They're less likely to be affected by this than, say, opinion responses (while the reverse is less true). It's good to separate related questions, if possible, and if you're using a CATI (computer-assisted telephone interviewing) program, you can randomize question order as desired, which also can help.
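For anyone scripting their own questionnaire, a rough Python sketch of per-respondent randomization might look like the following. The question IDs, the `question_order` function, and the seed value are all hypothetical; this is not how any particular CATI package does it. Opinion items are shuffled for each respondent and demographic items stay at the end.

```python
import random

# Hypothetical question IDs; the real survey's items are not given in the thread.
OPINION_ITEMS = [
    "care_about_health",
    "pay_1_for_clean_water",
    "trust_tap_water",
    "buys_bottled_water",
]
DEMOGRAPHIC_ITEMS = ["age", "household_size"]

def question_order(respondent_id, seed=2024):
    """Return one respondent's ordering: shuffled opinion items, demographics last."""
    # Seeding with the respondent ID keeps each ordering reproducible.
    rng = random.Random(f"{seed}-{respondent_id}")
    shuffled = rng.sample(OPINION_ITEMS, k=len(OPINION_ITEMS))
    return shuffled + DEMOGRAPHIC_ITEMS

for respondent_id in range(3):
    print(respondent_id, question_order(respondent_id))
```

Recording the order each respondent saw also lets you check afterwards whether answers differ by position.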