# Modelling a variable

#### GermanGirl

##### New Member
Hi everybody,

My problem is a variable (age) that I wish to enter into a linear regression, but the variable is not linear. So I want to model it, but nothing has worked so far. I have already tried log10, exp, ², √, and 1/x.

Is there anything else I could try? I already talked to a statistician and she had no more ideas, so this is probably a tough one...

The boxplot shows the variable in relation to the target variable. There is a sharp effect (rapid increase) during young ages and almost no effect of the variable (age) on the target variable in later years. Of course, I could just compute two separate models, but this is just my exit strategy. I'm using SPSS.

Looking forward to many brilliant ideas!

#### Karabiner

##### TS Contributor
> my problem is a variable (age) that I wish to enter into linear regression - but, the variable is not linear.

There is no such thing as a linear variable. Relationships can be
linear. So it isn't clear what you mean to say. What are
the background and the objective of your study, what do you
want to find out?

> The boxplot shows the variable in relation to the target variable.

Well, it is not a boxplot but a scatterplot. Anyway,
you obviously have no cases between ages 21 and 30.
As far as I know, you cannot use such a variable as
predictor in a linear regression, but I might be wrong
(and transforming will not solve the problem). If
the variable was continuous you could just add age²
to the model, so maybe you better try to include ages
21-30.

Again, what is this all about?

With kind regards

K.

#### Dason

##### Ambassador to the humans
> Anyway,
> you obviously have no cases between ages 21 and 30.
> As far as I know, you cannot use such a variable as
> predictor in a linear regression, but I might be wrong
> (and transforming will not solve the problem).

Can you elaborate on why you think one couldn't use a variable like this?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Regardless of your purpose and agenda, just for fun why don't you try to double square that beast (e.g., ^2 then ^2).

#### threestars

##### New Member
> Regardless of your purpose and agenda, just for fun why don't you try to double square that beast (e.g., ^2 then ^2).

What would this accomplish?

#### Dason

##### Ambassador to the humans
> What would this accomplish?

I'm interested too. One reason I'm wondering is just that if you're talking about squaring one variable and then squaring it again you can easily accomplish that just by raising it to the fourth power. The other reason is that hlsmith doesn't mention whether they're talking about transforming x or y. My guess would be squaring y but that would also make the variance more heterogeneous.
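Both points can be checked numerically. The following is only a sketch on made-up numbers (the two groups, their means, and the spread of 1 are assumptions for illustration, not the data from this thread): squaring twice is the same as raising to the fourth power, and squaring a response whose spread is constant makes the spread grow with the mean.

```python
import numpy as np

# Squaring twice is identical to raising to the fourth power.
x = np.array([1.0, 2.0, 3.0, 10.0])
same_as_fourth_power = np.allclose((x ** 2) ** 2, x ** 4)

# Hypothetical response values: two groups with different means
# but the same standard deviation (1.0) on the original scale.
rng = np.random.default_rng(1)
low = 2.0 + rng.normal(0.0, 1.0, 10_000)
high = 10.0 + rng.normal(0.0, 1.0, 10_000)

# Ratio of standard deviations before and after squaring the response.
sd_ratio_raw = high.std() / low.std()
sd_ratio_squared = (high ** 2).std() / (low ** 2).std()

print(same_as_fourth_power)
print(round(sd_ratio_raw, 2), round(sd_ratio_squared, 2))
```

On the original scale the ratio stays close to 1, while after squaring the high-mean group is several times more spread out than the low-mean group.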

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Come on you all, I said just for 'fun' - it's Friday and too close to Cinco de Mayo.

You are correct, my recommendation would not accomplish anything at all, other than making it look more like Dason's snow-covered backyard.

#### Dason

##### Ambassador to the humans
ON TUESDAY MY WIFE ASKED ME TO TURN ON THE AIR CONDITIONING AND NOW I HAVE SNOW

blarg

#### threestars

##### New Member
Haha, working in applied statistics I've heard stranger things than "double square the variable."

> ON TUESDAY MY WIFE ASKED ME TO TURN ON THE AIR CONDITIONING AND NOW I HAVE SNOW

Clearly causal (p < .05).

#### noetsi

##### No cake for spunky
lol

It is not uncommon to talk of a variable being skewed (that is, non-normal). I suspect that people confuse linearity with normality since they are commonly discussed at the same time in the same classes.

#### Karabiner

##### TS Contributor
> lol
>
> It is not uncommon to talk of a variable being skewed (that is, non-normal). I suspect that people confuse linearity with normality since they are commonly discussed at the same time in the same classes.

But what about the lack of cases with values in the middle of the age distribution?
Could one model y = b0+b1*age+b2*age²+e regardless of that fact?

With kind regards

K.

#### Dason

##### Ambassador to the humans
> But what about the lack of cases with values in the middle of the age distribution?
> Could one model y = b0+b1*age+b2*age²+e regardless of that fact?
>
> With kind regards
>
> K.

Of course you can still fit a model. I was even asking you before why you thought you couldn't. The question then becomes, though, whether it's wise to extrapolate the results of the model to areas where you don't have data. In this case it would probably be fine, but you would want some more substantial theory to back that up. I mean, regardless of the data we collect there will be 'gaps' between the observed predictor values, so if we couldn't use our model to help fill in those gaps, most models would be really useless.
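To illustrate that the gap does not stop the fit, here is a minimal sketch in Python with NumPy. It uses simulated data, not the data from this thread: the age ranges, the coefficients in `true_coef`, and the noise level are all assumptions, and the helper names `fit_quadratic` and `predict_quadratic` are made up for this example. It fits y = b0 + b1*age + b2*age² by ordinary least squares with no cases between 21 and 30, then predicts inside the gap.

```python
import numpy as np

def fit_quadratic(age, y):
    """Ordinary least squares for y = b0 + b1*age + b2*age**2 + e."""
    age = np.asarray(age, dtype=float)
    X = np.column_stack([np.ones_like(age), age, age ** 2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_quadratic(coef, age):
    """Evaluate the fitted quadratic at the given ages."""
    age = np.asarray(age, dtype=float)
    return coef[0] + coef[1] * age + coef[2] * age ** 2

rng = np.random.default_rng(0)

# Hypothetical ages: 5-20 and 31-60, with no cases in 21-30.
age = np.concatenate([rng.uniform(5, 20, 100), rng.uniform(31, 60, 100)])

# Assumed "true" relationship, used only to simulate a response.
true_coef = np.array([2.0, 0.8, -0.01])
y = predict_quadratic(true_coef, age) + rng.normal(0.0, 1.0, age.size)

coef = fit_quadratic(age, y)
gap_pred = predict_quadratic(coef, [25.0])  # prediction inside the gap

print(coef)
print(gap_pred)
```

The prediction at age 25 is only trustworthy here because the simulated relationship really is quadratic across the gap. With real data that is exactly the part that needs theory to back it up.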

#### bukharin

##### RoboStataRaptor
It would be helpful to know what your "target variable" actually represents. Is there some theory that would suggest an expected functional form (shape of relationship between your "target variable" and age), or that would suggest an age at which the curve should level out?

You could model this using a spline, e.g., one with a knot at 20. It looks like you'll have problems with heteroskedastic errors.
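One simple version of this idea is a linear spline: a straight line whose slope is allowed to change at the knot. The sketch below is an assumption-laden illustration in Python with NumPy, not SPSS syntax. The data are simulated (steep rise up to 20, flat afterwards), and the helper names `spline_design` and `fit_linear_spline` are invented for this example. In SPSS the same thing could be done by computing the hinge term max(age - 20, 0) as a new variable and entering it alongside age.

```python
import numpy as np

def spline_design(age, knot=20.0):
    """Design matrix for a linear spline: intercept, age, and a hinge
    term max(age - knot, 0) that lets the slope change at the knot."""
    age = np.asarray(age, dtype=float)
    hinge = np.maximum(age - knot, 0.0)
    return np.column_stack([np.ones_like(age), age, hinge])

def fit_linear_spline(age, y, knot=20.0):
    """Least-squares fit of y = b0 + b1*age + b2*max(age - knot, 0) + e."""
    coef, *_ = np.linalg.lstsq(spline_design(age, knot), y, rcond=None)
    return coef

rng = np.random.default_rng(0)

# Hypothetical data: steep increase up to age 20, flat afterwards.
age = np.concatenate([rng.uniform(5, 20, 100), rng.uniform(31, 60, 100)])
true_coef = np.array([1.0, 0.5, -0.5])
y = spline_design(age) @ true_coef + rng.normal(0.0, 0.5, age.size)

coef = fit_linear_spline(age, y)
slope_before = coef[1]           # slope for age <= 20
slope_after = coef[1] + coef[2]  # slope for age > 20

print(slope_before, slope_after)
```

The coefficient on the hinge term is the change in slope at the knot, so a test of that coefficient is a test of whether the age effect really differs before and after 20. This sketch does nothing about heteroskedastic errors, which would need separate handling.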

#### GermanGirl

##### New Member
Thanks for your replies.

Yes, even with the data gap between 20 and 30 years it can still be used as a linear variable (said the statistician).
I will try the double square thing...

For the "linear variable" issue (Karabiner): I want to enter age into a linear regression as a linear variable (in contrast to ordinal or nominal); this is the name the SPSS program gives.

#### GermanGirl

##### New Member
Well, double squaring didn't work...