Can "infinite" values be used in regression?


I am analyzing a series of plots along a recovery gradient, using "time since disturbance" as a continuous variable. I want to see how certain environmental variables change with time since disturbance and was thinking of using regression - however I also have plots that were never disturbed, so their time since disturbance is basically infinite. Is there a way to include these "infinite" values in the same regression? If not, is there a better way to compare undisturbed plots to plots experiencing a range of different recovery times?

I've been looking at stats textbooks without much luck - any suggestions appreciated!
Essentially, your question needs a little cudgeling to be well posed in a statistical framework. You have to think about what all this means, and then ultimately you come to the conclusion that something silly is afoot.

So you want one predictor variable, is that correct? You want to predict multiple environmental variables from the single time since disturbance variable? Outside of statistical significance, this ends up equivalent to regressing each of them separately on time since disturbance. Think carefully about this. You may instead want to predict time since disturbance from multiple environmental variables, which is a different problem.

Now suppose that you want to predict, as you say, the ENV (environmental vector) by TSD (time since disturbance). You are concerned with how to treat "never disturbed". Regression is perfectly fine with "categorical predictor variables". In this case you would code a 1 or 0 in that predictor variable (1 for never disturbed, 0 otherwise), and then code 0 for the time since disturbance if the plot has never been disturbed (the reason for the latter becomes clear next).

But note, there's a "thing" about categorical variables in linear regression that aren't interacting with anything else. They act like a shift in the intercept of your regression line when they are present. Essentially, the coefficient associated with the variable codes for the amount of the shift. This is easily interpreted as two different regression lines: one when "never been disturbed" is true and one when it is not. Same result.

But here is the interesting part. Your model is now
ENV = b_0 + b_1*TSD + b_2*NBD + error
where NBD (never been disturbed) is 1 if the plot has never been disturbed and 0 otherwise. But TSD is always 0 when NBD is 1, so for those plots the model reduces to ENV = b_0 + b_2 + error. And you realize that this all boils down to predicting the population mean for everything that has never been disturbed as a straight horizontal line: an average of everything fitting that case.
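As a rough illustration of the model above, here is a minimal Python sketch using NumPy least squares. The numbers and the variable names (tsd_disturbed, env_disturbed, env_undisturbed) are made up for illustration and are not from the original question. It only aims to show that, with TSD coded as 0 for the never-disturbed plots, the fitted value for those plots is just their average.

```python
import numpy as np

# Hypothetical data (assumption, not from the thread):
# time since disturbance in years and one environmental variable.
tsd_disturbed = np.array([1.0, 3.0, 5.0, 10.0, 20.0, 40.0])
env_disturbed = np.array([2.1, 2.9, 3.8, 5.2, 6.9, 8.1])
env_undisturbed = np.array([9.0, 9.4, 8.8, 9.2])

# Code TSD = 0 and NBD = 1 for the never-disturbed plots.
tsd = np.concatenate([tsd_disturbed, np.zeros(len(env_undisturbed))])
nbd = np.concatenate([np.zeros(len(env_disturbed)), np.ones(len(env_undisturbed))])
env = np.concatenate([env_disturbed, env_undisturbed])

# Design matrix for ENV = b_0 + b_1*TSD + b_2*NBD + error.
X = np.column_stack([np.ones_like(tsd), tsd, nbd])
coef, *_ = np.linalg.lstsq(X, env, rcond=None)
b0, b1, b2 = coef

# For comparison: a straight line fitted to the disturbed plots only.
slope_d, intercept_d = np.polyfit(tsd_disturbed, env_disturbed, 1)

# Fitted value for every never-disturbed plot (TSD = 0, NBD = 1).
undisturbed_prediction = b0 + b2

print("b0, b1, b2:", b0, b1, b2)
print("disturbed-only line:", intercept_d, slope_d)
print("undisturbed prediction vs. mean:", undisturbed_prediction, env_undisturbed.mean())
```

In this setup b_0 and b_1 come out the same as for a line fitted to the disturbed plots alone, and b_0 + b_2 equals the plain average of the never-disturbed plots, which is the "horizontal line" described above.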

And that makes sense, as there is nothing in what you said that would predict a change in environmental variables for plots that have never been disturbed, beyond the fact that they have never been disturbed, plus error.

So essentially this is all there is, unless you are going for a more elaborate statistical statement.

J. Rounds
Thank you for the explanation, I think I understand: it is possible to add a new categorical variable (NBD) into the regression, but it would basically separate the plots that have never been disturbed into a second, horizontal regression line. That doesn't sound like it would be useful, since my intention is to compare the plots that have not been disturbed to the disturbed ones, to try and answer the question of how much recovery time is necessary for environmental variables in the disturbed plots to reach the same levels as in the never disturbed plots. Sounds like it would be a better idea to split the disturbed plots into categories (short, medium and long time since disturbance) and use an ANOVA to compare them to the never been disturbed plots.

Thanks very much!
That is a method, but I am starting to think what you might be interested in posing is an ANCOVA.

The difference is that you don't toss out the information from the continuous variable. In ANOVA you are essentially averaging across your categories and asking if the means are different. If the truth is something different from that, then you have tossed out something valuable.
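To make the point about tossed-out information concrete, here is a small Python sketch with made-up numbers (the data, the bin edges at 4 and 7 years, and all variable names are assumptions for illustration). It compares an ANOVA-style fit, where each plot is predicted by the mean of its short/medium/long bin, with a fit that keeps time since disturbance as a continuous covariate, using the residual sum of squares of each.

```python
import numpy as np

# Hypothetical disturbed plots (assumption): TSD in years and one
# environmental variable that rises roughly linearly with recovery time.
tsd = np.arange(1.0, 10.0)  # 1, 2, ..., 9
noise = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.0])
env = 2.0 + 0.5 * tsd + noise

# ANOVA-style model: bin TSD into short / medium / long and
# predict each plot by the mean of its bin.
bins = np.digitize(tsd, [4.0, 7.0])  # 0: TSD < 4, 1: 4 <= TSD < 7, 2: TSD >= 7
bin_means = np.array([env[bins == k].mean() for k in range(3)])
rss_bins = np.sum((env - bin_means[bins]) ** 2)

# Continuous model: keep TSD as a covariate and fit a straight line.
slope, intercept = np.polyfit(tsd, env, 1)
rss_line = np.sum((env - (intercept + slope * tsd)) ** 2)

print("residual sum of squares, binned means:", rss_bins)
print("residual sum of squares, straight line:", rss_line)
```

With these particular numbers the straight line leaves less unexplained variation than the three bin means, even though it uses fewer parameters, because the trend within each bin is averaged away by the binning. Real data need not behave this way; if the relationship is not close to linear, the comparison can go the other way.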