Too many genes

#1
Hello all

I have a data set with a discrete dependent variable that has 3 categories.
I also have a set of independent variables, some continuous and some discrete, usually with more than 2 categories. The main variable of interest has 5 categories.

Now, one of the variables I need to look at represents the presence of genes. It is discrete, and I have a list of around 100 genes (100 categories). Each person has only 1 gene, apart from a few with 2; whoever gave me the data marked those cases as "A+B", while the others are "A", "B", "C" and so on... bad, huh?
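To make the coding issue concrete, here is a minimal Python sketch of how the combined "A+B" labels could be split into one 0/1 indicator per gene. The labels are made up, the function name `gene_indicators` is hypothetical, and it assumes "+" only ever appears as a separator between gene names.

```python
def gene_indicators(labels):
    """Turn labels like "A" or "A+B" into one 0/1 column per gene."""
    # Each person's label becomes the set of genes they carry.
    carried = [set(label.split("+")) for label in labels]
    # Column order: every distinct gene, sorted alphabetically.
    genes = sorted(set().union(*carried))
    rows = [[1 if g in person else 0 for g in genes] for person in carried]
    return genes, rows


# Hypothetical labels, not real data.
genes, rows = gene_indicators(["A", "B", "A+B", "C"])
for row in rows:
    print(dict(zip(genes, row)))
```

With this coding a person marked "A+B" simply gets a 1 in both the A and B columns, so no separate "A+B" category is needed.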

How should I approach this problem? Obviously, if I run a multinomial regression, I won't be able to include such a variable. Is there a method or model that can handle this? How would you deal with it?

Thanks!
 
#3
I am trying to see which factors affect the dependent variable (with 3 categories).

One of the variables I want to check is discrete with ~100 categories. Some of them appear 20 times in a sample of ~600 patients, and many appear only once.
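One simple option for a variable like this, shown only as a sketch and not as a recommendation from this thread, is to lump categories that occur fewer than some minimum number of times into a single "Other" level before modelling. The threshold of 10, the function name `lump_rare` and the sample labels below are all assumptions made for illustration.

```python
from collections import Counter


def lump_rare(labels, min_count=10, other="Other"):
    """Replace every label seen fewer than min_count times with `other`."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other for lab in labels]


# Hypothetical sample: two common genes and three that appear only once.
sample = ["A"] * 20 + ["B"] * 12 + ["C", "D", "E"]
lumped = lump_rare(sample, min_count=10)
print(Counter(lumped))
```

The trade-off is that rare genes can no longer be distinguished from one another, so the choice of `min_count` should be reported along with the results.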