Combining Regression and PCA scores

#1
I work in market research, and we're often asked to run regression to determine the impact of various components (like product attributes) on overall satisfaction/likelihood to buy, etc. Since many attributes usually correlate highly, we first run a PCA to extract latent variables, and use those variables as IVs in the regression.

One particular client, however, wanted to look at the impact of the specific attributes, not the factors. But the attributes were correlating too highly to put them all in the regression model. So I had the idea of running the PCA and using the factors in the model as usual, but then multiplying each attribute's correlation with each factor by that factor's standardized beta, and summing the products (I converted any negative values, whether regression betas or factor correlations, to positive before doing this). I presented the attributes on a chart, rank-ordered by impact from highest to lowest.
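In case it helps to see the procedure concretely, here is a rough numerical sketch of my reading of it, on made-up synthetic data (the attribute structure, sample size, and seed are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey data: two highly correlated
# attributes plus one independent attribute, all driving an
# overall-satisfaction score y.
n = 500
base = rng.normal(size=(n, 2))
X = np.column_stack([
    base[:, 0] + 0.1 * rng.normal(size=n),   # attribute 1
    base[:, 0] + 0.1 * rng.normal(size=n),   # attribute 2 (collinear with 1)
    base[:, 1] + 0.1 * rng.normal(size=n),   # attribute 3 (independent)
])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

# 1. PCA on the standardized attributes.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
evals, evecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
evecs = evecs[:, np.argsort(evals)[::-1]]    # components, largest first
scores = Z @ evecs                           # component scores

# 2. Regress standardized y on the component scores.
yz = (y - y.mean()) / y.std()
betas, *_ = np.linalg.lstsq(scores, yz, rcond=None)

# 3. Attribute-by-component correlations (loadings), then the ad hoc
#    importance: sum over components of |loading| * |standardized beta|.
p = Z.shape[1]
loadings = np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1]
                      for k in range(p)] for j in range(p)])
importance = (np.abs(loadings) * np.abs(betas)).sum(axis=1)

ranking = np.argsort(importance)[::-1]       # attributes, highest impact first
```

Note what happens in this sketch: the two collinear attributes come out with nearly identical importance scores, which previews the answerer's objection below.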

The results that came out made sense, and I have tried it in other instances with results that also made sense... but I wasn't entirely sure that what I'd done was legitimate. Is anything I've described above sacrilegious?

Thanks
 
#2
There is no such thing as attributes "correlating too highly to put them in the regression model". It's not as if your regression is going to crash or lie to you if some of the factors you measure are highly correlated. What it will do is give you very large error bars on the coefficients of the highly correlated factors. But that's just telling you the truth: it really can't tell which of the highly correlated factors is "really" contributing to the result, because the data really don't distinguish.
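You can see those error bars inflate directly. The sketch below (synthetic data, invented magnitudes) fits the same two-predictor regression once with a nearly collinear pair and once with an independent pair; the coefficient standard errors in the collinear case come out roughly an order of magnitude wider:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

def ols_se(X, y):
    """OLS fit plus classical coefficient standard errors (no intercept, for brevity)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                # coefficient covariance
    return beta, np.sqrt(np.diag(cov))

x1 = rng.normal(size=n)
x2_near_copy = x1 + 0.05 * rng.normal(size=n)   # correlation with x1 ~ 0.999
x2_indep = rng.normal(size=n)

noise = rng.normal(size=n)
_, se_collinear = ols_se(np.column_stack([x1, x2_near_copy]),
                         x1 + x2_near_copy + noise)
_, se_indep = ols_se(np.column_stack([x1, x2_indep]),
                     x1 + x2_indep + noise)

# se_collinear is far larger than se_indep: the fit runs fine, it just
# honestly reports that it cannot apportion credit between x1 and x2.
```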

Doing the PCA first doesn't eliminate this problem. It may well be that the PCA constructs principal components isolating each set of highly correlated factors, and that the regression coefficients for these components have small error bars. But suppose you take a component that you have thus identified as a strong contributor to the result, and notice that it is strong in a couple of your factors. Your customer will say to you: "Great, so which of these factors is the influential one?" and the only truthful answer is: "I don't know; the data don't distinguish."
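A tiny illustration of that point, again on invented synthetic data: with two nearly collinear attributes, the dominant principal component weights them (essentially) equally, so however strong its regression coefficient is, the component itself carries no information about which attribute matters.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
base = rng.normal(size=n)                     # shared latent driver
x1 = base + 0.1 * rng.normal(size=n)          # attribute A
x2 = base + 0.1 * rng.normal(size=n)          # attribute B, nearly a copy of A

Z = np.column_stack([x1, x2])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)      # standardize

evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))
pc1 = evecs[:, np.argmax(evals)]              # dominant principal component

# pc1 is (up to sign) ~ (0.707, 0.707): the component loads the two
# collinear attributes equally and cannot apportion credit between them.
```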

Presumably your customer has asked you to regress on the factors precisely because they, and not some abstract principal components, are what he really has control over, and he wants to know which to change. Your ad hoc methodology is producing a ranked list, and that list will, by construction, behave as you expect with respect to the individual correlations. But there is no deeper justification for the methodology, and it isn't revealing some truth that simple multiple linear regression couldn't see. In fact, it's hiding a truth that the simple multiple linear regression was telling you: the data really don't distinguish which of the highly correlated factors is most influential for the result.