How do I use non-linear continuous variables in a logistic regression?

Hey guys.

I need a little help understanding how to use continuous variables in a logistic regression.

Specifically, what if my continuous variable is non-linear?

Take a look at this blog post:

You can download the spreadsheet that goes with that post there as well. For example, height is a continuous variable. Or income.

Let's say in the above link's spreadsheet I knew the income of each person, but it showed a nonlinear relationship when plotted against y. How do I transform the variable? How do I know what to transform it by?

I'm having trouble finding a clear explanation of this online, so any help would be greatly appreciated.

Cheers. :)
What is the equation for the non-linear part?

(I guess that it is difficult to plot since it is a 0/1 response variable. Or can you plot it?)

I am not happy about downloading files from unknown sources, and I believe that most people here feel the same.

Is this from real data that you investigate or is it homework or an assignment?
It's something I am working on.

Basically, imagine I have thousands of entries of the data as in my attachment. I am attempting to predict if you will have a stroke before you are 40 (crazy example, totally made up data).

So to set up a logistic regression, I know how to make binary variables for owning a pet, gender, and college grad or not. I understand that part of the math.

What I don't understand is how to handle continuous variables like "hours exercised per week."

For example, what if we knew that the more hours you exercised, the less likely you were to get a stroke? And the effect was linear the whole way? So for the sake of argument, what if exercising 40, 60, 80 hours a week just kept lowering your chance of a stroke at the same rate (crazy example, I know, just assume with me)?

Well, in that case, I can see how the coefficient my logistic regression would produce for "hours exercised per week" would make sense.

But a more likely scenario is that exercising a few hours a week is very good for reducing stroke risk, but then you hit a point where the more you exercise, the more it actually INCREASES your chance of stroke (again, made-up example, but let's suppose we know it's true).

So when we know (or suspect) that a continuous variable like "hours exercised per week" has a nonlinear effect, how does this affect our logistic regression? I assume it would make it less accurate at predicting than if the effect were linear.

How can we structure our logistic regression model in such a situation? I read something about transforming the non-linear independent variable.

Does the linearity of the continuous variable in a logistic regression even matter?

These are the kinds of questions I have. I know it's a lot, but I can't find a good answer or explanation of how to account for and think about non-linear independent variables in a logistic regression.

Thank you all for reading/helping.
Do you know how to write down such a logit model? Try to do that!

Why don't you try to estimate a model from your own "data" (maybe made up data)! What software do you have access to?

If you have the following explanatory variables:

pet + gender + college + exercise

Then a very simple way of including a non-linear effect is to include a squared term:

pet + gender + college + exercise + exercise^2
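
Written out as a logit model, that formula might look like the following. This is only a sketch: the coefficient names \beta_0 ... \beta_5 and the symbol p (the probability of a stroke before 40) are my own labels, not anything from the spreadsheet.

```latex
\log\frac{p}{1-p}
  = \beta_0
  + \beta_1\,\text{pet}
  + \beta_2\,\text{gender}
  + \beta_3\,\text{college}
  + \beta_4\,\text{exercise}
  + \beta_5\,\text{exercise}^2

% Setting the derivative with respect to exercise to zero
% gives the point where the effect changes direction:
\beta_4 + 2\beta_5\,\text{exercise}^{*} = 0
\quad\Longrightarrow\quad
\text{exercise}^{*} = -\frac{\beta_4}{2\beta_5}
```

With \beta_4 < 0 and \beta_5 > 0 the log-odds of a stroke first fall and then rise with exercise, which is the "good up to a point, then harmful" pattern described earlier in the thread.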

(This model is non-linear in the variables but still linear in the parameters so you can still estimate it with the same software.)
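
To see the squared term at work, here is a minimal Python sketch on simulated data. Everything in it is an assumption for illustration: the "true" coefficients, the sample size, and the helper names `sigmoid`, `solve` and `fit_logit` are made up, and only the exercise variable is included to keep it short. In practice a statistics package (for example `glm` in R or statsmodels in Python) would do the fitting; the hand-written Newton-Raphson loop is here only so the example runs with the standard library alone.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_logit(X, y, iterations=25):
    """Maximum likelihood fit by Newton-Raphson; rows of X include the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]  # X'WX, the negative Hessian
        for row, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += w * row[a] * row[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta


# Made-up data: stroke risk falls with exercise up to about 10 hours, then rises.
random.seed(0)
hours = [random.uniform(0, 20) for _ in range(2000)]
stroke = [
    1 if random.random() < sigmoid(0.5 - 0.5 * x + 0.025 * x * x) else 0
    for x in hours
]

# Design matrix: intercept, exercise, exercise^2 (still linear in the parameters).
X = [[1.0, x, x * x] for x in hours]
b0, b1, b2 = fit_logit(X, stroke)
turning_point = -b1 / (2.0 * b2)

print("intercept, exercise, exercise^2:", b0, b1, b2)
print("estimated turning point (hours/week):", turning_point)
```

A positive coefficient on the squared term together with a negative coefficient on the linear term indicates the U shape, and `-b1 / (2 * b2)` estimates the number of hours at which extra exercise stops helping. If the squared coefficient comes out close to zero, the simpler linear-in-exercise model may be adequate.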