What does an intercept p-value of 0 mean? Can age be the independent variable in a simple regression?

#1
Hi, I recently got into statistics. I am not sure if this is the correct place to ask. I have included pictures as well.
  1. Can I use "age of used vehicle" as the independent variable and "selling price" as the dependent variable in a simple linear regression?
  2. Is my residual plot, as shown, weird in any way? "Age" in my dataset only takes values from 0 to 25, so the points generally line up vertically. I was wondering whether that is okay, compared with other variables such as "mileage", where the residual plot looks more scattered and the points do not line up.
  3. The p-value of my intercept is 0. Is that considered significant and okay, or is there something wrong?
Thank you in advance
 


hlsmith

Less is more. Stay pure. Stay poor.
#2
Given your sample size, everything seems fine. A Q-Q plot of the residuals would also be nice. The direction of the model makes sense, since age predicts price rather than price predicting age.
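For anyone who wants to see what a Q-Q check involves without a plotting package, here is a minimal sketch using only the Python standard library. The residuals are randomly generated stand-ins (an assumption, not the poster's data); with real data you would plug in the residuals from your own fit. The helper `correlation` and the 200-point sample are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist

# Stand-in residuals (hypothetical): replace with the residuals of your own model.
rng = random.Random(4)
residuals = [rng.gauss(0, 2000) for _ in range(200)]

n = len(residuals)
observed = sorted(residuals)
# Theoretical standard-normal quantiles at plotting positions (i + 0.5) / n.
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    cov = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    var_a = sum((u - a_bar) ** 2 for u in a)
    var_b = sum((v - b_bar) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# The Q-Q plot is the scatter of (theoretical[i], observed[i]).
# A correlation close to 1 means the points lie near a straight line.
qq_corr = correlation(theoretical, observed)
print(f"Q-Q correlation: {qq_corr:.4f}")
```

Plotting `theoretical` against `observed` gives the usual picture; the correlation is only a rough numeric summary of how straight that line is.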

The significant intercept just means that the average price of a brand-new car (age = 0) is not equal to zero, which makes sense, right? You have a couple of cars that don't completely fit the model (larger residuals), but the sample size is reasonable. You could always look up the cars with irregular residuals, try to figure out why they stand out, and add that variable to the model.
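To make the intercept test concrete, here is a rough sketch on made-up data (the 30000 starting price, 1500 per year depreciation, and noise level are all invented assumptions). It fits the line by ordinary least squares, forms the t-statistic for the intercept, and uses a normal approximation for the p-value, which is an approximation I am swapping in because the standard library has no t distribution. With a t-statistic this large the p-value is smaller than floating point can represent, so it shows up as 0, which is most likely what the software output in the first post is doing as well.

```python
import math
import random

# Hypothetical synthetic data: 300 used cars with integer ages 0..15.
rng = random.Random(0)
ages = [rng.randint(0, 15) for _ in range(300)]
prices = [30000 - 1500 * a + rng.gauss(0, 2000) for a in ages]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(prices) / n
s_xx = sum((x - x_bar) ** 2 for x in ages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))

# Ordinary least squares estimates for price = intercept + slope * age.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual variance and the standard error of the intercept.
residuals = [y - (intercept + slope * x) for x, y in zip(ages, prices)]
sigma2 = sum(r * r for r in residuals) / (n - 2)
se_intercept = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / s_xx))

# Test of H0: intercept = 0.
t_stat = intercept / se_intercept
# Two-sided p-value, normal approximation to the t distribution (fine for n = 300).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"intercept = {intercept:.0f}, t = {t_stat:.1f}, p = {p_value}")
```

A printed p-value of 0 here means "too small to display", not "exactly zero".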

Welcome to the forum.
 

Miner

TS Contributor
#3
The vertical lines are caused by the fact that age is probably in integer form rather than truly continuous. That causes the residuals to group together on the x-axis rather than spread out.
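A small sketch of why that happens, using invented data (the sample size, the 0 to 25 age range, and the mileage range are assumptions for illustration): an integer-valued predictor can only occupy a handful of x positions, so the residuals stack into columns, while a continuous predictor gives nearly every point its own x position.

```python
import random
from collections import defaultdict

rng = random.Random(1)
n = 200

# Hypothetical predictors: age in whole years, mileage as a continuous value.
ages = [rng.randint(0, 25) for _ in range(n)]
mileage = [rng.uniform(0, 200000) for _ in range(n)]

# Number of distinct x positions each predictor can produce on a residual plot.
distinct_age_positions = len(set(ages))
distinct_mileage_positions = len(set(mileage))

# Stand-in residuals, grouped by age: each key is one vertical column of points.
residuals = [rng.gauss(0, 2000) for _ in range(n)]
columns = defaultdict(list)
for a, r in zip(ages, residuals):
    columns[a].append(r)

print("distinct x positions for age:", distinct_age_positions)
print("distinct x positions for mileage:", distinct_mileage_positions)
print("points in the largest age column:", max(len(v) for v in columns.values()))
```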
 
#4
Thank you! Does this mean my residual plot fulfills the assumption of linearity as well? I am also quite worried about the outlier (the 25-year-old car). Should it have been removed or kept? I read online that outliers should not be removed unless they are clear errors.
 
#5
Thank you for the reply.
Such a residual plot is fine for a simple linear regression, right?
 

hlsmith

#6
Yeah, you shouldn't remove outliers unless they are erroneous. But in your set, if you have just one car that is 8 years older than the next oldest car, I would drop it and report that when disseminating your results. Also, per @Miner's comment, the residuals cluster on whole years because that is how age was recorded, and unless you know the day each car was sold, they will stay that way. Not a real issue for the model.
 
#7
Would it be correct to leave the outlier in and just write something like "Hey! That's an outlier, an interesting find; hopefully future analysis with a bigger sample size can find out more"?
Edit: or would it be better to remove the outlier and re-run?
 

hlsmith

#8
You have a large enough sample that it shouldn't come into play. However, outliers at the end of a fit can at times have 'leverage'. I would fit the model with and without it and see if anything changes. If the estimates shift and you get a better fit, I would remove it and document that.
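The with-and-without comparison can be sketched like this on invented data (150 hypothetical cars aged 0 to 17, plus one hypothetical 25-year-old car priced at 4000; none of these numbers come from the thread). The helpers `fit_line` and `leverages` are illustrative names. For a simple regression the leverage of point i is 1/n + (x_i - mean(x))^2 / Sxx, so the car farthest from the mean age gets the largest value.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def leverages(xs):
    """Leverage (hat value) of each point in a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / s_xx for x in xs]


# Hypothetical data: the bulk of the cars, then one much older car at the end.
rng = random.Random(2)
ages = [rng.randint(0, 17) for _ in range(150)]
prices = [30000 - 1200 * a + rng.gauss(0, 2000) for a in ages]
ages.append(25)
prices.append(4000.0)

h = leverages(ages)
full = fit_line(ages, prices)                 # outlier included
trimmed = fit_line(ages[:-1], prices[:-1])    # outlier dropped

print(f"leverage of the 25-year-old car: {h[-1]:.3f} (average {sum(h) / len(h):.3f})")
print(f"with outlier:    intercept {full[0]:.0f}, slope {full[1]:.1f}")
print(f"without outlier: intercept {trimmed[0]:.0f}, slope {trimmed[1]:.1f}")
```

If the two sets of estimates are close, the point has leverage but little actual influence, and keeping it is easy to justify.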
 

Miner

#9
Regarding whether such a residual plot is fine for a simple linear regression: from years of hands-on application in industry, I would say yes, it is fine. Statistical purists will probably disagree, but I have found that in the real world you can safely bend (not break) a lot of assumptions. However, in industrial statistics I can also verify what works in practice. In more theoretical disciplines they cannot do that and must adhere more closely to the assumptions.
 

Miner

#10
On whether to leave the outlier in or remove it and re-run: this outlier is an indication that as the cars get that old, the relationship between age and price stops being linear (starting around 10 years) and becomes asymptotic to zero. Car prices cannot go negative. For classic cars the value may even increase as the car gets older.
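One way to see the point about prices not going negative is to compare a straight line with an exponential-decay curve fitted on the log scale. This is only a sketch on invented data (the 30000 starting price, 15% yearly decay, and noise level are assumptions), and the log-linear model is one possible choice I am using for illustration, not something proposed in the thread. It does not capture classic cars gaining value.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data: prices decay exponentially with age, with multiplicative noise.
rng = random.Random(3)
ages = list(range(0, 16)) * 5
prices = [30000 * math.exp(-0.15 * a) * math.exp(rng.gauss(0, 0.1)) for a in ages]

# Straight line on the raw prices.
lin_b0, lin_b1 = fit_line(ages, prices)
# Straight line on log(price), i.e. price = exp(log_b0 + log_b1 * age).
log_b0, log_b1 = fit_line(ages, [math.log(p) for p in prices])

# Extrapolate both fits to a 25-year-old car.
linear_at_25 = lin_b0 + lin_b1 * 25
exp_at_25 = math.exp(log_b0 + log_b1 * 25)

print(f"linear fit at age 25:     {linear_at_25:.0f}")
print(f"log-linear fit at age 25: {exp_at_25:.0f}")
```

The straight line predicts a negative price for the old car, while the log-scale fit approaches zero without crossing it.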