# Proper Term for Related Variables in Regression

#### Peter Olesiuk

##### New Member
I am looking for the proper term for a regression/correlation that is artificially inflated due to the two variables being related by a common factor. I'm a biologist trying to determine if birth intervals increase with age. I initially fitted a regression between the birth interval and the age at the end of the birth interval, but I think that approach is biased and artificially introduces a positive relationship. Age at the end of the birth interval is equal to the age at the beginning of the interval plus the birth interval. The birth interval is thus a component of both the dependent and independent variables. Even if there is no relationship between age and birth interval, and birth intervals are random, I would get a positive relationship because animals tend to be older at the end of long birth intervals. If I regress the birth interval on age at the beginning of the birth interval, I can eliminate this bias. I recall there being a term for such a biased regression, but can't remember what it is.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
If both variables have a common cause or a partial common cause, it would be a confounded relation, also sometimes referred to as a spurious association. There are many other scenarios, but this might be the one you are looking for.

Welcome to the forum!

#### Peter Olesiuk

##### New Member
hlsmith, thank you for the prompt response. "Spurious" may be the proper term, but in my case the relationship isn't due to correlation with a third unidentified variable; it is more of a mathematical artefact. I initially found (and published) a small but significant positive correlation between birth interval and age at the end of the birth interval, indicating reproductive rates slow with age. But I'm now thinking the correlation was an artefact and have gone back and found no correlation between birth interval and age at the beginning of the interval, indicating reproductive rates don't slow with age.

If X=Age at Beginning of Birth Interval and Y=Birth Interval, my first approach was equivalent to regressing Y on X+Y, which introduces an artificial relationship without involving any third variable. My second approach is equivalent to regressing Y on X, and those two appear to be independent.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
What do you mean by birth interval? Please provide a small data sample, so I can follow along (data can be made up).

#### Peter Olesiuk

##### New Member
The birth intervals are calving intervals for killer whales. Killer whales typically give birth to their first calf at 10-20 years of age and subsequently give birth to calves at 2-10 year intervals over about a 25-year reproductive lifespan. I was interested in whether calving rates slow (intervals increase) with age of the mother, indicating reproductive senescence (e.g. reduced ovulation rates, increased abortion rates, etc.). I initially regressed calving intervals on the age of the mother when she completed the interval and found a positive correlation, suggesting reproductive senescence. However, I now recognize that approach to be biased/spurious - even if calving intervals are unrelated to age, a positive correlation occurs because mothers tend to be younger at the end of short calving intervals and older at the end of long ones.

I've attached simulated data and a screenshot of the regressions to illustrate the problem. I generated 100 random ages between 10 and 35 to simulate the age of the mother at the beginning of the calving interval. I then generated 100 random calving intervals between 2 and 10 years. Finally, I calculated the age of each mother at the end of the calving interval by adding the interval to her age at the beginning of the interval. There is a significant relationship between calving interval and age of the mother at the end of the interval, but no relationship between calving interval and age of the mother at the beginning of the calving interval.
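For anyone who wants to reproduce this without the attachment, here is a minimal sketch of the same simulation in Python. It assumes the ages and intervals are drawn independently from uniform distributions (the distribution is my assumption here; only the ranges are fixed above), and the names `pearson_r` and `simulate` are just illustrative helpers, not from any package.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=100, seed=1):
    """Return (r with age at start, r with age at end) for random intervals."""
    rng = random.Random(seed)
    # Assumption: uniform draws; intervals are generated independently of age.
    age_start = [rng.uniform(10, 35) for _ in range(n)]
    interval = [rng.uniform(2, 10) for _ in range(n)]
    # Age at the end of the interval contains the interval itself.
    age_end = [a + y for a, y in zip(age_start, interval)]
    return pearson_r(age_start, interval), pearson_r(age_end, interval)


r_start, r_end = simulate(n=100)
print(f"interval vs. age at start: r = {r_start:.3f}")
print(f"interval vs. age at end:   r = {r_end:.3f}")
```

With only 100 animals the numbers move around from seed to seed, but increasing `n` should show the correlation with age at the start settling near zero while the correlation with age at the end stays clearly positive, even though the intervals were generated with no dependence on age at all.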

As noted above, the "spurious" correlation has nothing to do with biology, but occurs because there is a mathematical relationship between calving interval and age of the mother at the end of the interval. If X=Age at Beginning of Calving Interval and Y=Calving Interval, my first approach was equivalent to regressing Y on X+Y, which introduces an artificial relationship without involving any third variable.
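The size of the artefact can be worked out directly. As a sketch, assume X and Y are independent with variances $\sigma_X^2$ and $\sigma_Y^2$ (these symbols are my notation, not from any reference). Then:

$$
\operatorname{Cov}(Y,\, X+Y) = \operatorname{Cov}(Y, X) + \operatorname{Var}(Y) = 0 + \sigma_Y^2 = \sigma_Y^2
$$

$$
\operatorname{Corr}(Y,\, X+Y) = \frac{\sigma_Y^2}{\sigma_Y \sqrt{\sigma_X^2 + \sigma_Y^2}} = \frac{\sigma_Y}{\sqrt{\sigma_X^2 + \sigma_Y^2}} > 0
$$

So the correlation is positive whenever the intervals vary at all, and it grows as the spread in intervals becomes large relative to the spread in starting ages. If the simulated ages were uniform on 10 to 35 and the intervals uniform on 2 to 10, then $\sigma_X^2 = 25^2/12 \approx 52.1$ and $\sigma_Y^2 = 8^2/12 \approx 5.33$, which gives an expected correlation of roughly 0.30 with no biological effect present.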

I believe the problem of spurious relationships due to regressing mathematically related variables arises in other situations. Let me give a non-biological example. Suppose one were interested in assessing whether the time it takes to learn how to drive a car, defined as the time between when one first gets behind the wheel and when one gets a driver's license, increases with age. One might inadvertently assess this by regressing the time it took to learn how to drive on the age at which one receives a driver's license. But this would result in a spurious relationship - even if learning time was completely random, those who take longer to learn will tend to be older when they get their license. Mathematically, if X=Age First Get Behind the Wheel and Y=Time it Takes to Learn How to Drive, it follows that Age One Receives License will be X+Y and thus spuriously correlated with Y.

As an undergrad, I took some excellent statistics courses and recall there being a specific term for spurious relationships attributable to regressing mathematically related variables. But I'm now retired, so it's been a while, and I can't recall what the term is.
