# Normality assumption for PCA?

#### seb3343

##### New Member
I know that the classical Pearson correlation coefficient is only valid when
data are normally distributed. For this, I generally use the Shapiro–Wilk
normality test.

I was recently wondering if the data also need to have a normal distribution to use a PCA. I didn't find a clear answer to this in the literature, but I read
that PCA assumes a multivariate normality of the data. I was wondering (1) if
you agree with this, (2) what this actually means, and (3) if there is a test
to check this.

Thank you very much!

#### lumhearts

##### New Member
That is not a very strict requirement. If you have multivariate normality, then great, but if you don't, results can still be interpreted. PCA is not a p-value driven technique.

Checking that assumption is difficult. I would just check normality for each variable separately along with skewness and kurtosis stats.
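As a rough sketch of that per-variable check (assuming NumPy and SciPy are available; the helper name `univariate_normality_report` and the simulated data are mine, purely for illustration), something like this would do. Keep in mind that normal marginals do not by themselves guarantee multivariate normality, so this is a screening step rather than a proof.

```python
import numpy as np
from scipy import stats


def univariate_normality_report(X, names=None):
    """Per-column Shapiro-Wilk test plus skewness and excess kurtosis."""
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]
    names = names or [f"var{j}" for j in range(n_vars)]
    report = {}
    for j, name in enumerate(names):
        col = X[:, j]
        w_stat, p_value = stats.shapiro(col)
        report[name] = {
            "shapiro_W": float(w_stat),
            "shapiro_p": float(p_value),
            "skewness": float(stats.skew(col)),
            # Fisher definition: 0 for a normal distribution
            "excess_kurtosis": float(stats.kurtosis(col)),
        }
    return report


# Illustrative data only: one Gaussian column and one strongly skewed column.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), rng.exponential(size=200)])
report = univariate_normality_report(X, names=["gaussian", "skewed"])
for name, row in report.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

A small Shapiro-Wilk p-value together with skewness or excess kurtosis far from 0 flags a variable as clearly non-normal.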

#### ohammer

##### New Member
> I know that the classical Pearson correlation coefficient is only valid when data are normally distributed.

That is not quite true. You are free to compute Pearson's correlation for data with any distribution. It may not always be a smart thing to do, but there is no law against it. It's when you start to compute p-values that things become more strict.
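To make the distinction concrete, here is a minimal sketch (NumPy only; the function names and the simulated skewed data are my own assumptions). The coefficient itself is just arithmetic on the data. For the p-value I have swapped in a permutation test, which is one way of avoiding the normality assumption behind the usual t-based p-value, not the only way.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: a plain computation, valid for any data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling y; no normality assumption."""
    rng = np.random.default_rng(seed)
    observed = abs(pearson_r(x, y))
    count = 0
    for _ in range(n_perm):
        if abs(pearson_r(x, rng.permutation(y))) >= observed:
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)


# Illustrative, clearly non-normal data.
rng = np.random.default_rng(1)
x = rng.exponential(size=100)
y = 0.5 * x + rng.exponential(size=100)

r = pearson_r(x, y)
p = permutation_p_value(x, y)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```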

Similar thing with PCA. You are free to PCA any data you wish, but it may work better for multivariate normal data.
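A minimal sketch of what "PCA on any data" looks like in practice, assuming NumPy and using an SVD of the centred data matrix (the function name `pca_svd` and the simulated exponential data are illustrative assumptions). Nothing in the computation refers to a distribution; the scores come out uncorrelated whatever the input looks like.

```python
import numpy as np


def pca_svd(X, n_components=None):
    """PCA via SVD of the column-centred data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    k = n_components or Vt.shape[0]
    scores = Xc @ Vt[:k].T  # projections of the samples onto the components
    return scores, Vt[:k], ratio[:k]


# Illustrative data: three skewed variables driven by one skewed latent factor.
rng = np.random.default_rng(0)
latent = rng.exponential(size=(300, 1))
X = np.hstack([latent + 0.1 * rng.exponential(size=(300, 1)) for _ in range(3)])

scores, components, ratio = pca_svd(X)
score_cov = np.cov(scores, rowvar=False)
print("explained variance ratio:", np.round(ratio, 4))
```

Uncorrelated is not the same as independent, though. For non-normal data the component scores can still be statistically dependent, which is one sense in which PCA "works better" for multivariate normal data.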

#### bugman

##### Super Moderator
Like ohammer said, but just an additional note:

If you are using PCA for modelling purposes (either subsequent gradient analyses or regression), then normality would be ideal. If it's for data reduction or exploratory purposes, then normality (as previous posters have mentioned) is not a strict requirement.

#### seb3343

##### New Member

I asked the same question to several statisticians in parallel and I got quite different answers. In the end, I guess it all depends on what I want to do with the data (as bugman says, if it's for data reduction or exploratory purposes, then normality is not a strict requirement).

Here are the other answers that I got:

(1) PCA is a purely geometrical technique - there is no need for a statistical hypothesis

(2) Multivariate normality is an assumption of PCA, but not a critical assumption. You can test for multivariate normality with a version of Shapiro-Wilk for multivariate normality.

(3) For PCA, there are assumptions about the data (that it is continuous and normally distributed), but this can be overlooked if the purpose of the test is to generate further hypotheses.
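For anyone who wants to try a multivariate normality check, here is a hedged sketch. Note that it is not the multivariate Shapiro-Wilk version mentioned in answer (2). I have swapped in Mardia's multivariate skewness and kurtosis tests because they are easy to write out with NumPy and SciPy. The function name `mardia_test` and the simulated data are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def mardia_test(X):
    """Mardia's multivariate skewness and kurtosis tests (asymptotic)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                      # maximum-likelihood covariance
    D = Xc @ np.linalg.solve(S, Xc.T)      # n x n matrix of Mahalanobis products
    b1 = (D**3).sum() / n**2               # multivariate skewness
    b2 = (np.diag(D) ** 2).mean()          # multivariate kurtosis
    skew_stat = n * b1 / 6.0
    skew_df = p * (p + 1) * (p + 2) / 6.0
    kurt_z = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    return {
        "b1": float(b1),
        "b2": float(b2),
        "skew_p": float(stats.chi2.sf(skew_stat, skew_df)),
        "kurt_p": float(2.0 * stats.norm.sf(abs(kurt_z))),
    }


# Illustrative data: one multivariate normal sample, one clearly skewed sample.
rng = np.random.default_rng(0)
normal_result = mardia_test(rng.normal(size=(500, 3)))
skewed_result = mardia_test(rng.exponential(size=(300, 3)))
print("normal:", normal_result)
print("skewed:", skewed_result)
```

Small p-values suggest a departure from multivariate normality. The p-values are asymptotic, so they should be read with caution for small samples.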

Thanks!

Sebastien

#### ohammer

##### New Member
> I asked the same question to several statisticians in parallel and I got quite different answers.

Haha, I guess they are correct on average!
