# Check if dependency between two variables is linear

#### Denis

##### New Member
I've posted the question:
https://stats.stackexchange.com/que...near/424215?noredirect=1#comment791817_424215
There are a number of values for the dependent variable (let's call it Y) and the same number of corresponding values for the independent variable (let's call it X).
Below is just a toy example:

X=2,4,7,11,15,20,25,30,33,42,45,50,55,60,70
Y=0,0,0,0,100,100,200,200,200,500,500,900,950,950,1000

How can I check whether the dependency Y(X) is linear?

In addition, I have another theoretical question. Suppose my independent variable (X) is binary, i.e. takes only the two values 0 and 1, while Y is discrete (e.g. takes the same values as in the example above). Is it possible for the dependency Y(X) to be linear? Why?

One of the replies to my question was:

A different interpretation of "linearity" is that alternative non-linear models aren't worth the additional complexity. There are two standard, textbook approaches to this: add a quadratic term or bin the independent variable(s). Run an ANOVA on the nested model. If it's not significant, conclude you haven't detected any nonlinearity. These are often called "goodness of fit" tests.

Unfortunately, the reply was not entirely clear to me. Is there a good explanation or tutorial for these two methods (adding a quadratic term and binning the independent variable), if possible in R? What I understand so far is that I have to fit some linear models (perhaps with the lm function) and then compare them with ANOVA. How many models? It's not clear to me which models I can build with one independent variable. Could you help, please?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
The best way to examine linearity is to construct a scatterplot of the data and visualize the relationship. You can also fit a line to the data using linear regression and see whether it fits reasonably well and the residuals appear to have no pattern.
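As a rough illustration of the "fit a line and look at the residuals" idea, here is a minimal sketch in plain Python on the toy data from the question (the thread asks about R; Python is used here only as an assumption of convenience, and the helper name `fit_line` is made up for this example). It fits the least-squares line and prints each residual with its sign, so that long runs of same-signed residuals along X become visible.

```python
# Toy data from the original question.
X = [2, 4, 7, 11, 15, 20, 25, 30, 33, 42, 45, 50, 55, 60, 70]
Y = [0, 0, 0, 0, 100, 100, 200, 200, 200, 500, 500, 900, 950, 950, 1000]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


intercept, slope = fit_line(X, Y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(X, Y)]

print(f"Y = {intercept:.1f} + {slope:.2f} * X")
for xi, ri in zip(X, residuals):
    # Long runs of same-signed residuals along X hint at curvature.
    print(f"X = {xi:2d}  residual = {ri:8.1f}  {'+' if ri > 0 else '-'}")
```

If the signs alternate more or less at random, a straight line is probably adequate; a systematic pattern suggests trying one of the more flexible fits.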

What I like to do, which isn't as intro level, is fit a spline to the data and see how many degrees of freedom (knots) it needs. Another basic option is to fit a Loess curve and look at the shape of the curve while playing around with the smoothness parameter. What the other person was saying is that you can add terms to the linear regression model: polynomials, so X^2 or X^3, and see if they better explain the dependent variable. You can then perform a test to compare the nested linear regression models. So compare y = x versus y = x + x^2. This is what they were referencing.
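The nested comparison of y = x versus y = x + x^2 can be sketched as follows. This is only an illustration in plain Python under a few assumptions: the helpers `solve` and `rss` are made up for this example, X is centered purely for numerical stability (it does not change the fitted values), and the 5% critical value for F with 1 and 12 degrees of freedom (about 4.75) is hard-coded from standard tables rather than computed. In R, the usual equivalent would be `anova(lm(Y ~ X), lm(Y ~ X + I(X^2)))`.

```python
# Toy data from the original question.
X = [2, 4, 7, 11, 15, 20, 25, 30, 33, 42, 45, 50, 55, 60, 70]
Y = [0, 0, 0, 0, 100, 100, 200, 200, 200, 500, 500, 900, 950, 950, 1000]


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on the design columns."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


n = len(X)
x_mean = sum(X) / n
xc = [xi - x_mean for xi in X]  # centered X

rss_lin = rss([[1.0, c] for c in xc], Y)          # model 1: Y ~ X
rss_quad = rss([[1.0, c, c * c] for c in xc], Y)  # model 2: Y ~ X + X^2

# F statistic for one extra parameter; the larger model has n - 3 residual df.
f_stat = (rss_lin - rss_quad) / (rss_quad / (n - 3))
F_CRIT_5PCT = 4.75  # assumption: tabulated F(1, 12) critical value at the 5% level

print(f"RSS linear = {rss_lin:.0f}, RSS quadratic = {rss_quad:.0f}, F = {f_stat:.2f}")
print("quadratic term significant at 5%" if f_stat > F_CRIT_5PCT
      else "no nonlinearity detected at 5%")
```

So only two models are needed here: the straight line and the same line plus the squared term. A non-significant F means the extra term did not buy a detectably better fit, not that the relationship is proven linear.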

If X is binary, then the model still produces a linear relationship: going from 0 to 1 is a one-unit increase in X, much like a one-unit increase when the IV is continuous. See below:

where DV and IV linear:

(attachment 1303)

When DV linear and IV is categorical:

(attachment 1304)

Let me know if this is confusing, I wrote it all pretty quickly.
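To make the binary case concrete, here is a small sketch in plain Python. The binary coding below is hypothetical (the first 8 toy Y values are assigned X = 0 and the remaining 7 are assigned X = 1, purely for illustration). It shows that the least-squares line passes exactly through the two group means, so the slope is just the difference in means. With only two X values there is nothing for the line to "miss", however scattered the points are around each mean.

```python
# Toy Y values from the original question.
Y = [0, 0, 0, 0, 100, 100, 200, 200, 200, 500, 500, 900, 950, 950, 1000]
# Hypothetical binary X: first 8 observations coded 0, remaining 7 coded 1.
XB = [0] * 8 + [1] * 7

n = len(Y)
x_mean = sum(XB) / n
y_mean = sum(Y) / n

# Ordinary least squares for Y = intercept + slope * X.
sxx = sum((xi - x_mean) ** 2 for xi in XB)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(XB, Y))
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Group means of Y at X = 0 and X = 1.
group0 = [yi for xi, yi in zip(XB, Y) if xi == 0]
group1 = [yi for xi, yi in zip(XB, Y) if xi == 1]
mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)

print(f"intercept = {intercept:.2f}, mean of Y at X=0 = {mean0:.2f}")
print(f"slope     = {slope:.2f}, difference in means  = {mean1 - mean0:.2f}")
```

Points lying far from the line in such a plot reflect residual spread within each group, not nonlinearity.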

#### GretaGarbo

##### Human
OP said this was a toy example.
Where did he say so?

If so, I really dislike this kind of hypothetical fluffiness. Show us the real stuff!

#### Denis

##### New Member
Hi,
Are you an R user? I'm asking because it might be easier for me to understand the concept from R code examples. The regression Y on X2 is not linear (in the second plot you provided it doesn't look linear, i.e. the majority of the points are far from the line). So it's not possible to obtain a linear dependency for a binary IV and a discrete DV. Am I right?

#### Denis

##### New Member
Hi,
Thanks for your response and for the typo you found. I'm working with very large data sets and it's not possible to post them here, but the example mimics the real data I have. Sorry for the misunderstanding.

#### GretaGarbo

##### Human
The regression Y on X2 is not linear