When to give up on a regression model

Hey guys, new member so bear with me here =)

Currently I am undertaking a project for my studies. It requires I take some machine condition data as well as the age of the component and try to make a prediction on the remaining life based on these things. (Sorry I can't give exact descriptions due to the nature of the project).

The condition is based on a ranking from 1 to 5: 5 means urgent attention is required (but not a guaranteed failure) and 1 means normal operating conditions (though random failure is still possible). The age is in hours.

I have created a regression model with some known failure data. The continuous predictors are the condition score and the current age; the dependent (response) variable is time to failure.

Now! I have completed the regression in Minitab and have been tweaking and trying to improve the model for over a week. The best I can get is an R-sq of around 15% with an S of about 3000 hours (expected life is around 24000 hours). Basically, at this stage I am ready to give up and move on to something such as Weibull or Cox regression.
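
For anyone reading along who is unsure what those two numbers measure, here is a minimal sketch of how R-sq and S (the standard error of the regression) are usually computed from actual and fitted values. The function name and the numbers are made up for illustration; they are not the project data.

```python
import math

def fit_summary(actual, predicted, n_predictors):
    """Return (R-squared, S) for a set of fitted values.

    R-squared = 1 - SSE/SST, the share of variance the model explains.
    S = sqrt(SSE / (n - p - 1)), the typical size of a residual,
    in the same units as the response (hours here).
    """
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    r_sq = 1.0 - sse / sst
    s = math.sqrt(sse / (n - n_predictors - 1))
    return r_sq, s

# Hypothetical times to failure (hours) and hypothetical fitted values
actual = [21000.0, 25000.0, 19000.0, 27000.0, 23000.0, 26000.0]
predicted = [22500.0, 23800.0, 22900.0, 24100.0, 23200.0, 24500.0]

r_sq, s = fit_summary(actual, predicted, n_predictors=2)
print(f"R-sq = {r_sq:.1%}, S = {s:.0f} hours")
```

S is often the more useful of the two here, because it says directly how many hours off a typical prediction is.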

Attached is a 3D plot I did in MATLAB. The surface is the regression equation and the points are the actual values, so the vertical gaps between them are the residuals. As you can see, the model isn't going to be very accurate. Should I give up on regression? Any help is very much appreciated =) Thanks guys.

[Attachment: 3d residuals.png]


I take some machine condition data as well as the age of the component and try to make a prediction on the remaining life based on these things.
Not exactly my area, but I'm pretty sure this kind of problem is usually better addressed by something like a negative binomial regression or some other type of generalIZED linear model. Regular OLS regression doesn't usually cut it very well.
If the response is time to failure, it seems more natural to go to Weibull or Cox regression. Look at the Kaplan-Meier estimator too.
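
As a rough illustration of the Kaplan-Meier idea, here is a minimal pure-Python sketch. The function name and the failure/censoring times are hypothetical, and a real analysis would more likely use a statistics package than hand-rolled code.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier survival estimate.

    times    : time of failure or of censoring for each unit
    observed : True if the unit failed, False if it was censored
    Returns a list of (time, survival probability) at each failure time.
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Group all units that share this time
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                failures += 1
            removed += 1
            i += 1
        if failures > 0:
            surv *= 1.0 - failures / n_at_risk
            curve.append((t, surv))
        # Failed and censored units both leave the risk set
        n_at_risk -= removed
    return curve

# Hypothetical data: hours at failure (True) or at censoring (False)
hours = [18000, 21000, 21000, 24000, 26000, 30000]
failed = [True, True, False, True, False, True]

for t, s in kaplan_meier(hours, failed):
    print(f"{t:>6} h  S(t) = {s:.3f}")
```

Censored units (machines still running when observation stopped) lower the number at risk without counting as failures, which is the piece that OLS on time to failure cannot handle.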

But please note that R^2 is not a quality index of your work, and by itself it is not very important. Some parts of nature are noisy. That's how it is. Others are less noisy and more predictable.
Yeah, also not my area, but I think you need to move away from OLS and move into survival analysis. I don't think it would make sense to have a linear relation between current age and remaining expected time. I'm trying and failing to imagine what kind of hazard function would result in a linear relation between age and remaining expected time. Actually, human beings might make a nice case. Up until about 60, the "failure rate" is negligible, which means there's a straightforward linear relation between age and life expectancy: each year, you lose a year of life expectancy. Then, the failure rate goes up dramatically and the model breaks down for the tail cases. In fact, we will even get negative life expectancy with simple linear regression.

But if your machines break down more or less at a constant rate instead of all at the end, simple linear regression will fail more spectacularly.
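
To make the argument above concrete, here is a small sketch that computes the expected remaining life at a given age under a Weibull lifetime. The Weibull choice, the shape values and the 24000-hour scale are assumptions for illustration only. Shape 1 is a constant failure rate; a large shape means failures cluster late, as in the human example.

```python
import math

def mean_residual_life(age, shape, scale, steps=20000):
    """Expected remaining life given survival to `age`.

    Integrates the conditional survival S(u)/S(age) for u >= age with the
    trapezoid rule, where S(t) = exp(-(t/scale)**shape) is Weibull survival.
    The upper limit (age + 20*scale) is an assumption that holds for
    shape >= 1, where the tail beyond it is negligible.
    """
    base = (age / scale) ** shape
    h = 20.0 * scale / steps
    total = 0.0
    for k in range(steps + 1):
        u = age + k * h
        cond_surv = math.exp(base - (u / scale) ** shape)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cond_surv
    return total * h

scale = 24000.0  # hypothetical characteristic life in hours
for shape in (1.0, 5.0):
    print(f"shape = {shape}")
    for age in (0.0, 6000.0, 12000.0, 18000.0, 24000.0):
        mrl = mean_residual_life(age, shape, scale)
        print(f"  age {age:>7.0f} h -> expected remaining life {mrl:>7.0f} h")
```

With shape 1 the expected remaining life does not depend on age at all, so age carries no information for a linear fit. With shape 5 it falls almost hour for hour at young ages and then flattens out instead of going negative.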

So yeah, I think you need to look at survival analysis methods.
Thanks Junes, the only reason I haven't used survival analysis sooner is because I need to create a model which considers the current operating conditions of the machine as well as the historical failure data. It seems most survival functions/methods I have come across only consider the historical failures & censored data.
I need to look further into Cox regression but my current understanding is that the covariates are binary, either a condition is met or it is not. I will look further into this. Thanks again =)
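
For what it's worth, the Cox model as usually written is h(t | x) = h0(t) * exp(b1*x1 + b2*x2 + ...), and nothing in that form requires the covariates to be binary, so a 1 to 5 condition score or an age in hours can go in as-is. Below is a minimal sketch of the partial log-likelihood that a Cox fit maximises. The data and names are hypothetical, and it assumes no tied failure times.

```python
import math

def cox_partial_loglik(beta, times, observed, covariates):
    """Cox partial log-likelihood (assumes no tied failure times).

    beta       : list of coefficients, one per covariate
    times      : time of failure or censoring for each unit
    observed   : True if the unit failed, False if censored
    covariates : one list of covariate values per unit (any real numbers)
    """
    def linear_score(x):
        return sum(b * v for b, v in zip(beta, x))

    total = 0.0
    for i, (t_i, event) in enumerate(zip(times, observed)):
        if not event:
            continue  # censored units only appear in risk sets
        # Risk set: every unit still running just before t_i
        risk_sum = sum(
            math.exp(linear_score(covariates[j]))
            for j, t_j in enumerate(times)
            if t_j >= t_i
        )
        total += linear_score(covariates[i]) - math.log(risk_sum)
    return total

# Hypothetical units: [condition score 1-5, age in thousands of hours]
times = [4000.0, 9000.0, 12000.0, 15000.0, 20000.0]
observed = [True, True, False, True, True]
covariates = [[5, 20.0], [4, 14.0], [2, 6.0], [3, 10.0], [1, 2.0]]

for beta in ([0.0, 0.0], [0.5, 0.0], [0.5, 0.05]):
    ll = cox_partial_loglik(beta, times, observed, covariates)
    print(beta, round(ll, 4))
```

A fitting routine would search for the beta that makes this value as large as possible. A positive coefficient on the condition score would mean a higher score goes with a higher failure rate.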