Modelling a variable

#1
Hi everybody,

my problem is a variable (age) that I wish to enter into linear regression - but, the variable is not linear. So, I want to model it. But, it doesn't work so far. I already tried: log10, exp, ², √, and 1/x.

Is there anything else I could try? I already talked to a statistician and she had no more ideas, so this is probably a tough one... :shakehead

The boxplot shows the variable in relation to the target variable. There is a sharp effect (rapid increase) during young ages and almost no effect of the variable (age) on the target variable in later years. Of course, I could just compute two separate models, but this is just my exit strategy. I'm using SPSS.

Looking forward to many brilliant ideas! :rolleyes:
 

Karabiner

TS Contributor
#2
my problem is a variable (age) that I wish to enter into linear regression - but, the variable is not linear.
There is no such thing as a linear variable. Relationships can be
linear. So it isn't clear what you mean to say. What are
the background and the objective of your study, what do you
want to find out?

The boxplot shows the variable in relation to the target variable.
Well, it is not a boxplot but a scatterplot. Anyway,
you obviously have no cases between ages 21 and 30.
As far as I know, you cannot use such a variable as
predictor in a linear regression, but I might be wrong
(and transforming will not solve the problem). If
the variable was continuous you could just add age²
to the model, so maybe you better try to include ages
21-30.

Again, what is this all about?

With kind regards

K.
 

Dason

Ambassador to the humans
#3
Anyway,
you obviously have no cases between ages 21 and 30.
As far as I know, you cannot use such a variable as
predictor in a linear regression, but I might be wrong
(and transforming will not solve the problem).
Can you elaborate on why you think one couldn't use a variable like this?
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
Regardless of your purpose and agenda, just for fun why don't you try to double square that beast (e.g., ^2 then ^2).
 

Dason

Ambassador to the humans
#6
What would this accomplish?
I'm interested too. One reason I'm wondering is just that if you're talking about squaring one variable and then squaring it again you can easily accomplish that just by raising it to the fourth power. The other reason is that hlsmith doesn't mention whether they're talking about transforming x or y. My guess would be squaring y but that would also make the variance more heterogeneous.
 

hlsmith

Less is more. Stay pure. Stay poor.
#7
Come on you all, I said just for 'fun' - it's Friday and too close to Cinco de Mayo.

You are correct, my recommendation would not accomplish anything at all, other than making it look more like Dason's snow-covered back yard.
 

noetsi

Fortran must die
#10
lol

It is not uncommon to talk of a variable being skewed (that is, non-normal). I suspect that people confuse linearity with normality since they are commonly discussed at the same time in the same classes.
 

Karabiner

TS Contributor
#11
lol

It is not uncommon to talk of a variable being skewed (that is, non-normal). I suspect that people confuse linearity with normality since they are commonly discussed at the same time in the same classes.
But what about the lack of cases with values amidst the age distribution?
Could one model y = b0+b1*age+b2*age²+e regardless of that fact?

With kind regards

K.
 

Dason

Ambassador to the humans
#12
But what about the lack of cases with values amidst the age distribution?
Could one model y = b0+b1*age+b2*age²+e regardless of that fact?

With kind regards

K.
Of course you can still fit a model. I was even asking you before why you thought you couldn't. The question then becomes, though, whether it's wise to extrapolate the results of the model to areas where you don't have data. In this case it would probably be fine, but you would want some more substantial theory to back that up. I mean, regardless of the data we collect there will be 'gaps' between the predictors, so if we couldn't use our model to help fill in those gaps then most models would be really useless.
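
To make the point concrete, here is a rough sketch (in Python rather than SPSS) of fitting y = b0+b1*age+b2*age²+e when there are no cases between 21 and 30. The data are simulated: the coefficients, noise level, and sample sizes are assumptions made up for illustration, not the original poster's data. The fit itself goes through without trouble; whether the prediction inside the gap can be trusted is a separate question that depends on theory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated ages with no cases between 21 and 30 (made-up data).
age = np.concatenate([rng.uniform(5, 21, 60), rng.uniform(30, 60, 60)])

# Assumed "true" relationship: quadratic in age plus normal noise.
y = 2.0 + 0.8 * age - 0.008 * age**2 + rng.normal(0, 0.5, age.size)

# Design matrix for y = b0 + b1*age + b2*age^2 + e, fitted by least squares.
X = np.column_stack([np.ones_like(age), age, age**2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
b0, b1, b2 = coef


def predict(a):
    """Predicted y for age(s) a from the fitted quadratic."""
    a = np.asarray(a, dtype=float)
    return b0 + b1 * a + b2 * a**2


# Prediction at age 25, i.e. inside the range where no cases were observed.
gap_prediction = float(predict(25.0))
print(b0, b1, b2, gap_prediction)
```

In this simulation the gap prediction lands close to the assumed curve only because the quadratic form was built into the data; with real data that is exactly the part that needs backing from theory.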
 

bukharin

RoboStataRaptor
#13
It would be helpful to know what your "target variable" actually represents. Is there some theory that would suggest an expected functional form (shape of relationship between your "target variable" and age), or that would suggest an age at which the curve should level out?

You could model this using a spline, eg one with a knot at 20. It looks like you'll have problems with heteroskedastic errors.
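
A minimal sketch of the simplest version of this idea, a linear spline with one knot at 20, written in Python rather than SPSS. Everything about the data here is an assumption for illustration (simulated ages, a slope of 1.5 up to age 20 and a flat line afterwards, constant noise); the variable names are hypothetical too. The trick is just one extra predictor, max(age - 20, 0), so the model can change slope at the knot while staying continuous there.

```python
import numpy as np

rng = np.random.default_rng(1)
KNOT = 20.0  # knot location, as suggested above

# Simulated ages (made-up data), again with a gap between 20 and 30.
age = np.concatenate([rng.uniform(5, 20, 80), rng.uniform(30, 60, 80)])

# Assumed truth: rises with slope 1.5 until the knot, then levels out.
y = 1.5 * np.minimum(age, KNOT) + rng.normal(0, 1.0, age.size)

# Hinge term: zero up to the knot, (age - KNOT) beyond it.
hinge = np.maximum(age - KNOT, 0.0)

# Model: y = b0 + b1*age + b2*hinge + e, fitted by ordinary least squares.
X = np.column_stack([np.ones_like(age), age, hinge])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

slope_before = b1        # slope for age <= KNOT
slope_after = b1 + b2    # slope for age > KNOT
print(slope_before, slope_after)
```

The same hinge variable can be computed in any package and entered as an ordinary predictor next to age. This sketch uses constant noise, so it says nothing about the heteroskedasticity concern; that would need robust standard errors or a variance model on top.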
 
#14
Thanks for your replies.

Yes, even with the data gap between 20 and 30 years it can still be used as a linear variable (said the statistician).
I will try the double square thing...

For the "linear variable" issue (Karabiner): I want to enter age into a linear regression as a linear variable (in contrast to ordinal or nominal); this is the name the SPSS program gives.