Cluster Analysis, Discriminant Analysis and Multinomial Logistic Regression - Help!

mlb

New Member
#1
Hi Everyone,

I am having some serious problems with my analyses and was hoping someone might be able to help me out!

I have a sample of 95 participants, who have been classified into three groups using a cluster analysis. I used hierarchical cluster analysis for this, and had 26 variables on which I clustered the participants. I am now wanting to run analyses to find out which of these variables are the best predictors for category membership.

First I looked at discriminant analysis. However, with my sample size I am violating its assumptions: my data are not normally distributed, and I do not have at least 5 cases per IV in each group (IVs = 26; group sizes: group 1 = 32, group 2 = 36, group 3 = 27). The output from the discriminant analysis tells me the fit is good. However, how much of a problem is the violation of the cases-per-IV guideline?
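For reference, the cases-per-IV guideline can be checked with simple arithmetic. This is a minimal Python sketch using only the numbers given in the post; the variable names are illustrative, and the factor of 5 is the rule of thumb quoted above, not a hard requirement.

```python
# Group sizes and number of predictors (IVs) as reported in the post.
n_predictors = 26
group_sizes = {"group1": 32, "group2": 36, "group3": 27}

# Rule of thumb quoted in the post: at least 5 cases per predictor in each group.
cases_per_predictor = 5
required_per_group = cases_per_predictor * n_predictors

# How many cases each group falls short of the guideline.
shortfalls = {name: required_per_group - n for name, n in group_sizes.items()}

# Largest number of predictors the smallest group could support under the same rule.
max_predictors = min(group_sizes.values()) // cases_per_predictor

print("Required cases per group:", required_per_group)
print("Shortfall per group:", shortfalls)
print("Predictors supported by the smallest group:", max_predictors)
```

Under this guideline each group would need 130 cases, and the smallest group (27 cases) would support about 5 predictors rather than 26.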

As I had violated the assumptions of discriminant analysis, I turned to multinomial logistic regression. Here again I am violating the assumptions: I have incomplete information from the predictors (e.g. 64% of cells with zero frequencies), I have complete separation of the data, and it looks like I also have underdispersion. I tried to correct for some of these by selecting the deviance dispersion parameter. I also ran a principal components analysis on the IVs to reduce the number of variables, and used the components as predictors instead. However, I still have complete separation of the data.
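To illustrate what complete separation looks like in the simplest case, here is a rough Python sketch that checks whether a single variable splits the groups with no overlap in their value ranges. The function name and the data are made up for illustration; they are not from the study, and real separation can also arise from a combination of predictors, which this check does not detect.

```python
def separates_completely(values_by_group):
    """Return True if no two groups overlap in their value range on this variable."""
    # Sort the (min, max) range of each group by its minimum.
    ranges = sorted((min(v), max(v)) for v in values_by_group.values())
    # Separated if every range ends strictly before the next one begins.
    return all(prev_hi < lo for (_, prev_hi), (lo, _) in zip(ranges, ranges[1:]))


# Hypothetical scores on one variable for three clusters.
separated = {
    "cluster1": [1.0, 1.5, 2.0],
    "cluster2": [3.0, 3.2, 4.1],
    "cluster3": [5.0, 6.3, 7.0],
}
overlapping = {
    "cluster1": [1.0, 2.5, 3.1],
    "cluster2": [3.0, 3.2, 4.1],
    "cluster3": [4.0, 6.3, 7.0],
}

print(separates_completely(separated))    # True
print(separates_completely(overlapping))  # False
```

When a predictor behaves like the first example, the group can be read off the predictor without error, and the logistic regression coefficients have no finite maximum likelihood estimate. This may be more likely when the groups were formed from the same variables that are then used as predictors.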

Can someone tell me if there is anything else I can do? Can I use the discriminant analysis as it is? Is there another analysis I can run to assess what I am looking for?

Any suggestions greatly appreciated!

Mlb :)
 

Karabiner

TS Contributor
#2

Maybe some background information would help.
I have a sample of 95 participants, who have been classified into three groups using a cluster analysis. I used hierarchical cluster analysis for this, and had 26 variables on which I clustered the participants.
Why so many? That seems like too many.
I am now wanting to run analyses to find out which of these variables are the best predictors for category membership.
Why, and what for? And why do you intend to use models with multiple predictors instead of analysing the bivariate relationships? The clustering wasn't done using multiple-predictor models.
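As a sketch of what analysing the bivariate relationships could look like, the Python snippet below computes a one-way ANOVA F statistic for each variable separately and ranks the variables by it. The variable names and values are invented for illustration, and F is only one possible measure of how strongly a single variable differs across clusters.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of cases
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data: each variable holds the scores of three clusters.
variables = {
    "var_a": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    "var_b": [[1.0, 5.0, 9.0], [2.0, 5.0, 8.0], [3.0, 5.0, 7.0]],
}

f_values = {name: one_way_f(groups) for name, groups in variables.items()}

# Rank variables from strongest to weakest difference between clusters.
ranking = sorted(f_values, key=f_values.get, reverse=True)

print(f_values)
print(ranking)
```

Because the clusters were built from these same variables, such F values are best read descriptively, as a way of ordering the variables, rather than as significance tests.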

With kind regards

K.