Confirmatory factor analysis: How to deal with non-normal data


I have 12 manifest variables, none of which is normally distributed. As far as I know, this makes my data unsuitable for a CFA with maximum likelihood estimation, though I am not certain. I am also not sure which alternative estimation method should be used instead. Apart from practical advice, I would be grateful if you could point me to any papers discussing this.

Thanks in advance!

(I am not sure that it helps much in this case, but just to give you some background: my data is from a mood measure. Based on theory as well as on a previous EFA done on a different sample, I expect four latent factors, with three of the manifest variables loading on each factor.)


Fortran must die
It will make the data unsuitable for ML (at least the ML that SEM uses). A key issue is why your data are non-normal. For example, data can be non-normal because they are categorical in nature, which is a special issue for SEM beyond non-normality: the standard errors are biased, and the chi-square test of model fit does not work correctly, nor do the fit indices used in SEM that rely on it. That is a separate issue from, say, skew or kurtosis.

This might help
Thank you for your reply. The data was collected with interval scales, so it is not categorical. In addition, the existence of estimation methods suitable for non-normally distributed data was mentioned in this document:

This document includes examples using maximum likelihood estimation (MLE), including Full Information Maximum Likelihood (FIML) for situations in which there are missing values in the raw data file. However, MLE assumes multivariate normality among the observed variables, and preliminary diagnostics of sample data show strong deviations from normality for several of the variables. Alternative estimators exist for cases of non-normal data but for the most part lie outside the limited scope of this document. This document will also describe a weighted least squares (WLS) approach suitable for situations in which the x variables are categorical.

This is why I was looking for alternative estimation methods. Unfortunately, that search has been unsuccessful so far...


Phineas Packard
You could use robust maximum likelihood (see Peter Bentler's home page for relevant references). CFA is pretty robust to non-normality anyway, and I think Steve West (with co-authors Curran and someone) provides some fairly liberal criteria for skew and kurtosis in relation to CFA.
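
To get a feel for how far each indicator departs from normality before picking an estimator, something like the following Python sketch could be used to screen skew and excess kurtosis per variable. The variable names and data are made up for illustration, and the cutoffs (absolute skew above 2, absolute excess kurtosis above 7) are only assumed placeholders of the kind often quoted as rules of thumb; check the references mentioned above for the actual criteria.

```python
def _central_moment(xs, order):
    """Population central moment of the given order."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** order for x in xs) / n


def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2)."""
    m2 = _central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("variable has zero variance")
    return _central_moment(xs, 3) / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)."""
    m2 = _central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("variable has zero variance")
    return _central_moment(xs, 4) / m2 ** 2 - 3.0


def screen(columns, max_abs_skew=2.0, max_abs_kurtosis=7.0):
    """Flag variables whose skew or kurtosis exceeds the (assumed) cutoffs."""
    report = {}
    for name, xs in columns.items():
        s = skewness(xs)
        k = excess_kurtosis(xs)
        flagged = abs(s) > max_abs_skew or abs(k) > max_abs_kurtosis
        report[name] = {"skew": s, "kurtosis": k, "flag": flagged}
    return report


# Hypothetical indicators: one symmetric, one heavily right-skewed.
data = {
    "item_symmetric": [1, 2, 3, 4, 5],
    "item_skewed": [1] * 19 + [20],
}

for name, stats in screen(data).items():
    print(name, stats)
```

Variables that get flagged here would be the ones to worry about if you stay with plain ML; if most are flagged, a robust ML variant is probably the safer choice. Note that this only looks at univariate shape, not multivariate normality.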