At first, we wanted to see how each factor affects solubility, i.e. how much solubility increases or decreases when the factor goes up (the difference in solubility between the high and low levels). So we used DoE, which tells us the significance of each factor within a single drug. Most applications of DoE stop here, because they are concerned with only one response (such as productivity) related to cost or time. But we have a series of drugs, and we want information not only within each drug but also between drugs. For example, pH is a significant factor for drug A but not for drug B, and we want to find out why, e.g. from differences in drug properties such as pKa. That is where we start to have problems.
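To make the per-drug DoE step concrete, here is a minimal sketch of how a main effect can be computed in a two-level design: the mean response at the high level minus the mean response at the low level. The 2^3 design, the factor order and the solubility numbers are all made up for illustration (our real design has 7 factors), and `main_effects` is just a hypothetical helper name.

```python
import itertools
import numpy as np

def main_effects(design, response):
    """Main effect of each factor: mean(response at +1) - mean(response at -1)."""
    design = np.asarray(design, dtype=float)
    response = np.asarray(response, dtype=float)
    effects = []
    for j in range(design.shape[1]):
        high = response[design[:, j] > 0].mean()
        low = response[design[:, j] < 0].mean()
        effects.append(high - low)
    return np.array(effects)

# Hypothetical 2^3 full factorial in coded units (-1 = low, +1 = high).
design = np.array(list(itertools.product([-1, 1], repeat=3)))

# Made-up solubility for one drug: factor 0 matters a lot,
# factor 1 not at all, factor 2 slightly (and negatively).
solubility = 10.0 + 2.0 * design[:, 0] - 0.5 * design[:, 2]

effects = main_effects(design, solubility)
print(effects)
```

Doing this once per drug gives one vector of effects per drug, and the between-drug question is then why those vectors differ.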

The drug properties (pKa, log P, molecular weight, ...) are more or less independent variables, although some of them may carry partially overlapping information. Currently we use DoE analysis for the individual drugs and a GLM to pool all the information together. For the GLM we use solubility as the response, with the 7 factors plus the drug properties as covariates, to build one overall model. But we still find this is not the best way to do the statistics, as the larger values of certain drugs may dominate the analysis and we lose the information in the outliers.
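As a rough illustration of the pooled idea, the sketch below fits one linear model across several drugs with a factor-by-property interaction (pH x pKa), so that the pH effect is allowed to depend on pKa. This is only a sketch under assumptions: the three drugs, their pKa values and the response are invented, only one factor and one property are used, and it is plain least squares with NumPy rather than the GLM routine we actually run.

```python
import numpy as np

# Made-up pKa values for three hypothetical drugs.
pka_values = np.array([3.0, 5.0, 8.0])

# Each drug is run at coded pH = -1 (low) and +1 (high).
rows = [(ph, pka) for pka in pka_values for ph in (-1.0, 1.0)]
ph = np.array([r[0] for r in rows])
pka = np.array([r[1] for r in rows])

# Invented response: the pH slope is (2.0 - 0.25 * pKa),
# so pH matters for the pKa = 3 drug and vanishes for the pKa = 8 drug.
y = 1.0 + 0.2 * pka + (2.0 - 0.25 * pka) * ph

# Model matrix: intercept, pH, pKa, and the pH x pKa interaction.
X = np.column_stack([np.ones_like(ph), ph, pka, ph * pka])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept, b_ph, b_pka, b_interaction = coef
print(coef)
```

In this toy case a non-zero interaction coefficient is what says "the pH effect changes with pKa", which is the kind of between-drug information we are after.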

So I just want to ask: can anybody suggest other possible analysis methods for these multivariate correlations?

Any thoughts, suggestions or questions are welcome.