# Linear regression on bio-data.

#### Kettle

##### New Member
Hi all, I'm completely new to this forum so forgive me (and tell me) if I break
protocol in any way.
My setup is:

I have n animals.
I believe there is a linear relationship between x
(a covariate I can measure on each of the animals)
and the age of the animals.

Let's say I could measure both x and age with no measurement error at all.
I would still not expect the actual observations to lie on a perfect straight line
because 'life is messy', i.e. biological variation would still cause the individual
animal to deviate from the line.

Now, I am interested in predicting age. So my first instinct is to fit a linear regression
of age on x, if for no other reason than that this will give me a line that minimizes the
deviation of age from the line.

However, I'm having problems interpreting my model. I mean, biological variation
can hardly count as a deviation in a single variable/in a single direction?
I feel like the 'noise term' should be some kind of two-dimensional thing.

I hope this question makes sense.
Cheers
Kettle.

#### Mean Joe

##### TS Contributor
Are you saying that it doesn't seem possible to predict age from a single explanatory variable? You can add more explanatory variables; linear regression can be done with 10 explanatory variables.

With only X and Age, you can graph your line in 2 dimensions. With 10 explanatory variables and Age, the fit becomes a hyperplane in 11 dimensions, which is difficult to visualize, but the same idea carries over to more than 3 dimensions.

#### Kettle

##### New Member
Nono. I'm saying that we're dealing with a situation where you would never expect
the actual observations to conform perfectly to any fit whatsoever.
The linear relationship is definitely there.

But I mean, in a typical linear regression I can think of two reasons to prefer regressing
one way (y on x) over the other (x on y):

1) x is completely known, without 'error'
2) you are interested in predicting y.

But I'm uncomfortable with both of these reasons in my case.
I mean, isn't the assumption of error going only 'one way'
(e.g. vertically, i.e. on y) rather unnatural when the source of error
is something like biological variation, which cannot be said to
be an error on x or y but is simply a cause of deviation from the line?
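
To make the 'one way versus the other' point concrete, here is a minimal sketch in Python with NumPy. The data are simulated, and the names `age`, `x`, `ols_slope_intercept` and all numeric values are assumptions for illustration only, not measurements from any real animals. It fits age on x and x on age and shows that the two fitted lines differ unless the correlation is perfect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 'age' and covariate 'x' scatter around a common line.
n = 200
age = rng.uniform(1.0, 10.0, size=n)
x = 2.0 + 0.5 * age + rng.normal(0.0, 0.6, size=n)


def ols_slope_intercept(u, v):
    """Least-squares fit of v on u: minimises squared deviations in v only."""
    slope = np.cov(u, v, ddof=1)[0, 1] / np.var(u, ddof=1)
    intercept = v.mean() - slope * u.mean()
    return slope, intercept


# Regress age on x (deviations in age are minimised).
b_age_on_x, a_age_on_x = ols_slope_intercept(x, age)

# Regress x on age (deviations in x are minimised), then re-express
# that line in the form age = a + b * x so the two can be compared.
b_x_on_age, a_x_on_age = ols_slope_intercept(age, x)
b_inverted = 1.0 / b_x_on_age
a_inverted = -a_x_on_age / b_x_on_age

r = np.corrcoef(x, age)[0, 1]
print(f"age-on-x slope:           {b_age_on_x:.3f}")
print(f"x-on-age slope, inverted: {b_inverted:.3f}")
print(f"r^2 = {r**2:.3f}, product of raw slopes = {b_age_on_x * b_x_on_age:.3f}")
```

The product of the two raw slopes equals r², so the lines coincide only when |r| = 1. Which line is appropriate depends on the goal: for predicting age from x, the age-on-x fit is the one that minimises squared prediction error in age.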

I can tell this is not a very well-defined question, so I apologise in advance.

#### bugman

##### Super Moderator
It sounds like you want to plot x and y error bars on a scatter plot and fit a linear regression line through these data?

#### Mean Joe

##### TS Contributor
> I believe there is a linear relationship between x (a covariate I can measure on each of the animals) and the age of the animals.
> ...
> Now, I am interested in predicting age. So my first instinct is to make a linear regression of age on x.

How do you have a measure of the age of the animals--is it estimated? Why do you want to predict age?

I've got to say, I've always used AGE as an explanatory variable; never something that I wanted to predict from X. But I could imagine a scenario where you measure X and then want to infer the age of the animal--like you catch a huge tuna and want to guess at how old it must be.

#### BGM

##### TS Contributor
If you have a simple linear regression model,

then you are modelling the conditional mean of the response as a linear function
of the explanatory variable:

$$Y_i = \beta_0 + \beta_1x_i + \epsilon_i, \quad \text{so that} \quad E[Y|X = x_i] = \beta_0 + \beta_1x_i$$

where $$\epsilon_i \sim N(0, \sigma^2)$$ are i.i.d.

Of course, you may have your own model for $$X$$.

And you may consider
$$Var[Y] = E[Var[Y|X]] + Var[E[Y|X]] = E[\sigma^2] + Var[\beta_0 + \beta_1X] = \sigma^2 + \beta_1^2Var[X]$$

which also incorporates the variance of the explanatory variable.
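
As a rough numerical check of that decomposition, here is a small simulation sketch in Python with NumPy. The parameter values (`beta0`, `beta1`, `sigma`) and the normal model chosen for X are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 1.0, 2.0, 0.5
sd_x = 1.5
n = 200_000

# X gets its own model here (normal); the identity does not depend on this choice.
x = rng.normal(3.0, sd_x, size=n)
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)

empirical = y.var()                          # sample estimate of Var[Y]
theoretical = sigma**2 + beta1**2 * sd_x**2  # sigma^2 + beta_1^2 Var[X]
print(f"empirical Var[Y]   = {empirical:.3f}")
print(f"theoretical Var[Y] = {theoretical:.3f}")
```

The two numbers should agree up to sampling noise, with the first term reflecting scatter about the line and the second the spread of X carried through the slope.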

I know that the log-linear model for contingency tables usually does not treat
any particular variable as the response or the explanatory variable. But it is
somehow related to the Poisson GLM.

#### ohammer

##### Member
There are several methods for estimating linear relationships in bivariate data with "errors" both in x and y. This is sometimes called Model II regression. Examples are Major Axis (MA) and Reduced Major Axis (RMA, sometimes called Standard Major Axis).
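
For anyone who wants to compare these fits, below is a minimal sketch in Python with NumPy using the usual textbook slope formulas for ordinary least squares, RMA and MA. The function name `model2_slopes` and the simulated data are assumptions for illustration; dedicated packages handle confidence intervals and edge cases that this sketch ignores.

```python
import numpy as np


def model2_slopes(x, y):
    """Return OLS (y on x), RMA and MA slopes for bivariate data.

    Assumes the sample covariance is non-zero; the MA formula divides by it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]

    ols = sxy / sxx                          # minimises vertical deviations
    rma = np.sign(sxy) * np.sqrt(syy / sxx)  # ratio of standard deviations
    # Major axis: direction of the first principal component of (x, y),
    # i.e. the line minimising perpendicular deviations.
    ma = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4.0 * sxy**2)) / (2.0 * sxy)
    return {"ols": ols, "rma": rma, "ma": ma}


def intercept(slope, x, y):
    """All three lines pass through the centroid (mean x, mean y)."""
    return np.mean(y) - slope * np.mean(x)


# Hypothetical data with scatter in both variables around a latent line.
rng = np.random.default_rng(2)
latent = rng.uniform(0.0, 10.0, size=300)
x = latent + rng.normal(0.0, 1.0, size=300)
age = 1.0 + 2.0 * latent + rng.normal(0.0, 1.0, size=300)

slopes = model2_slopes(x, age)
for name, b in slopes.items():
    print(f"{name}: age = {intercept(b, x, age):.3f} + {b:.3f} * x")
```

Note that MA depends on the units of x and y, whereas RMA does not, so the choice between them matters when the two variables are on different scales.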