# Testing for normality on correlated data (covariance matrix known).

#### mity

##### New Member
Hi all,

I have an experimental data set $$[(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)]$$, where (now) the $$x_i$$'s are the predictor variables (independent, no uncertainty) and the $$y_i$$'s are the response variables (dependent). I know the $$y_i$$'s to have a covariance matrix $$\Sigma$$. In other words, I have a vector of variances, $$\sigma^2$$, associated with the $$y_i$$ measurements, and I believe that the $$y_i$$'s are correlated by a known correlation matrix, say $$\rho$$, due to my measurement technique. My goal is to fit a regression line through the data and make some statistical inferences from the line of best fit. The inferences require that the error terms, $$e_i$$, be distributed like $$N(0,\Sigma')$$ ($$\Sigma'$$ not necessarily equal to $$\Sigma$$).

As far as I understand, I cannot use something like Pearson's chi-squared test because that assumes that the $$x_i's$$ are independent. Mine are not.

So, I came to the conclusion that I must be testing against a multivariate distribution rather than a univariate distribution. However, if we are speaking of a multivariate distribution, my sample size is only 1. So, the way I understand it, that would be like asking if a single experimental measurement came from a univariate normal distribution. Am I right about this? How should this correlated data be analyzed?

If my data pass a univariate normality test, can I carry on with the regression analysis using multivariate techniques?

Thanks!
mity


#### Dragan

##### Super Moderator
> Hi all,
>
> .... I would like to do regression analysis on the data.....Thanks!

I'm guessing, and I'm sure other readers are as well, but if you're attempting to do a regression analysis then the normality assumption is imposed on the error terms associated with the regression model - not the X's.

#### mity

##### New Member
Dragan,

Thanks for the clarification. I agree that it is the errors that need to be shown as normal.

Perhaps you could clarify another point for me then. From my understanding, most of the tests for normality are invariant under linear transformations $$AX + b$$. See equation 2.1 from http://link.springer.com/article/10.1007/s00362-002-0119-6. So, if I am planning to do only a LINEAR regression, does that mean that showing the normality of the $$x_i$$'s and showing the normality of the errors are equivalent? Or am I missing something?

Also, after the regression analysis, the errors would still be correlated, right? So, my question still remains. How would I test the normality of a single set of correlated data?

Thanks,
mity
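
One possible way to frame the "single set of correlated data" question, assuming the covariance $$\Sigma$$ of the errors $$e$$ is known and positive definite. This is the standard whitening argument and is not something stated in the thread; the symbols $$L$$ and $$z$$ are introduced here for illustration.

$$\Sigma = LL^{T}, \qquad z = L^{-1}e, \qquad \operatorname{Cov}(z) = L^{-1}\,\Sigma\,L^{-T} = I$$

If $$e \sim N(0,\Sigma)$$, then $$z$$ is an affine image of $$e$$ (the map $$AX + b$$ with $$A = L^{-1}$$, $$b = 0$$), so $$z \sim N(0, I)$$ and its $$n$$ components are independent $$N(0,1)$$ draws. Under that assumption a univariate normality test can be applied to $$z_1, \ldots, z_n$$ as a sample of size $$n$$ rather than a multivariate sample of size 1. Regression residuals are only estimates of $$e$$, so for them this holds approximately, not exactly.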

#### Dason

Can you clarify - are all of your xs here predictors in your model or are you calling one of them the response?

Note that typically what we call "x" is the predictor and there is no assumption of normality on the x variable.

#### mity

##### New Member
Okay.

STARTING OVER TO AVOID CONFUSION. Let me formulate the problem this way.

I have an experimental data set $$[(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)]$$, where (now) the $$x_i$$'s are the predictor variables (independent, no uncertainty) and the $$y_i$$'s are the response variables (dependent). I know the $$y_i$$'s to have a covariance matrix $$\Sigma$$. In other words, I have a vector of variances, $$\sigma^2$$, associated with the $$y_i$$ measurements, and I believe that the $$y_i$$'s are correlated by a known correlation matrix, say $$\rho$$, due to my measurement technique. My goal is to fit a regression line through the data and make some statistical inferences from the line of best fit. The inferences require that the error terms, $$e_i$$, be distributed like $$N(0,\Sigma')$$ ($$\Sigma'$$ not necessarily equal to $$\Sigma$$).

Also updated in the original post.

Thanks,
mity
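
A minimal sketch of how the setup above could be explored numerically, under assumptions that go beyond the thread: the error covariance is taken to be exactly the known $$\Sigma$$ (that is, $$\Sigma' = \Sigma$$), the line is fitted by generalized least squares after decorrelating the data with a Cholesky factor of $$\Sigma$$, and the decorrelated residuals are checked with a Jarque-Bera statistic. The function names (`cholesky`, `forward_solve`, `gls_line`, `jarque_bera`) and all numbers in the demo are hypothetical, and the choice of Jarque-Bera is only for illustration because it is easy to write with the standard library. Its p-value is asymptotic and can be unreliable for small $$n$$.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b by forward substitution (L lower triangular)."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def gls_line(x, y, Sigma):
    """Fit y = a + b*x by generalized least squares with known Sigma.

    Returns (a, b, whitened_residuals).
    """
    L = cholesky(Sigma)
    # Whiten the response and both design columns (intercept and slope).
    yw = forward_solve(L, y)
    c0 = forward_solve(L, [1.0] * len(x))
    c1 = forward_solve(L, x)
    # Ordinary least squares on the whitened system via 2x2 normal equations.
    s00 = sum(u * u for u in c0)
    s01 = sum(u * v for u, v in zip(c0, c1))
    s11 = sum(v * v for v in c1)
    t0 = sum(u * w for u, w in zip(c0, yw))
    t1 = sum(v * w for v, w in zip(c1, yw))
    det = s00 * s11 - s01 * s01
    a = (t0 * s11 - t1 * s01) / det
    b = (s00 * t1 - s01 * t0) / det
    resid = [w - a * u - b * v for u, v, w in zip(c0, c1, yw)]
    return a, b, resid


def jarque_bera(z):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(z)
    m = sum(z) / n
    m2 = sum((v - m) ** 2 for v in z) / n
    m3 = sum((v - m) ** 3 for v in z) / n
    m4 = sum((v - m) ** 4 for v in z) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # survival function of chi-square(2)


# Synthetic demo: every number below is made up for illustration.
rng = random.Random(0)
n = 30
x = [float(i) for i in range(n)]
sd = [1.0 + 0.05 * i for i in range(n)]  # hypothetical standard deviations
Sigma = [[sd[i] * sd[j] * 0.6 ** abs(i - j) for j in range(n)] for i in range(n)]
L = cholesky(Sigma)
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Correlated errors e = L z have covariance Sigma.
e = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
y = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, e)]

a, b, resid = gls_line(x, y, Sigma)
jb, p = jarque_bera(resid)
print(f"intercept={a:.3f} slope={b:.3f} JB={jb:.3f} p={p:.3f}")
```

Because two parameters are estimated, the whitened residuals are only approximately independent, so the test is a rough check and not an exact one. In practice an established implementation such as `scipy.stats.shapiro` or `scipy.stats.jarque_bera` would likely be preferable to the hand-written statistic here.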
