ANOVA vs. principal component analysis for nonlinear analysis

First, let me say that I am not a trained statistician by any means. Here's what I'm working on: I have a dataset with 8 factors (sample attached). Each factor is a measured variable that may or may not affect the response. I need to determine which factors are important and then fit a curve to the data. I have reason to believe that several of these factors affect the response nonlinearly. My current method is to do a least-squares curve fit in MATLAB and then compare R-squared across different combinations of factors. The more I read, the more convinced I am that this is not the best approach (plus it's pretty messy). People have suggested running ANOVA or PCA on this dataset, but I can't figure out which is appropriate, and I'm having trouble finding resources that explain either one in an approachable way for someone who has only ever taken a basic stats class. Any suggestions? Good resources?

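One quick and easy way to look for nonlinearities is to generate a correlation plot matrix (x vs. y plots for all pairs of variables). In MATLAB, corrplot() does this (as far as I know it ships with the Econometrics Toolbox; plotmatrix() is the base-MATLAB alternative). If there are any obvious nonlinearities, look into transforming the variables to linearize the relationship, for example taking the log of a response that grows exponentially. The NIST Handbook is a good, easy-to-read resource.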
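To make the "transform to linearize" idea concrete, here is a minimal sketch in Python (used purely for illustration; the same idea carries over to MATLAB). The data are synthetic: the exponential form, the constants 2.0 and 0.4, and the helper name pearson are all assumptions for the example, not anything taken from the poster's dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic factor and a hypothetical exponential response y = 2 * exp(0.4 * x).
x = [0.5 * i for i in range(1, 21)]
y = [2.0 * math.exp(0.4 * xi) for xi in x]

r_raw = pearson(x, y)                         # curved relationship
r_log = pearson(x, [math.log(v) for v in y])  # log(y) is linear in x

print(f"correlation of x with y:      {r_raw:.3f}")
print(f"correlation of x with log(y): {r_log:.3f}")
```

The linear correlation is noticeably weaker on the raw scale than after the log transform, which is the kind of pattern a scatter plot would show as a bend. With real data you would pick the transform from the shape you see in the plots rather than knowing it in advance.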

If you have only one response variable and many more observations than variables, then you could use multiple linear regression to identify important factors. Start with a full model that includes all variables (and interaction effects if you like). Look at the confidence intervals for the coefficients: if an interval includes zero (at whatever confidence level you are comfortable with; 95% is typical), that coefficient is not statistically distinguishable from zero, and the term is a candidate for removal in the next iteration. Refit after each removal, since the intervals change when the model changes. R-squared never decreases when you add terms to a model, so it can be pushed toward 1 just by adding more terms; it's not a good metric to look at on its own.
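The procedure above (fit the full model, inspect the coefficient confidence intervals, and treat R-squared with caution) can be sketched in dependency-free Python as follows. Everything here is an illustrative assumption: the data are synthetic, x2 is unrelated to the response by construction, the function names are made up for this sketch, and the t critical value is hard-coded at about 2.05 (roughly the 95% two-sided value for 27 or 28 residual degrees of freedom) instead of being looked up.

```python
import math

def solve_normal_equations(X, y):
    """Return (coefficients, inverse of X'X) for ordinary least squares."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Augment X'X with the identity and run Gauss-Jordan elimination.
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    inv = [row[p:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    return beta, inv

def fit(X, y, t_crit):
    """Fit OLS; return coefficients, confidence intervals, and R-squared."""
    beta, inv = solve_normal_equations(X, y)
    n, p = len(X), len(X[0])
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_mean = sum(y) / n
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    sigma2 = ss_res / (n - p)  # residual variance estimate
    intervals = []
    for j in range(p):
        se = math.sqrt(sigma2 * inv[j][j])  # standard error of coefficient j
        intervals.append((beta[j] - t_crit * se, beta[j] + t_crit * se))
    return beta, intervals, 1.0 - ss_res / ss_tot

# Synthetic data (assumption): y depends on x1 only; x2 is an unrelated factor.
n = 30
x1 = [float(i) for i in range(n)]
x2 = [float((7 * i) % 13) for i in range(n)]
noise = [((4 * i) % 11 - 5) / 5.0 for i in range(n)]
y = [2.0 + 3.0 * a + e for a, e in zip(x1, noise)]

T_CRIT = 2.05  # hard-coded approximation, see the note above
X_reduced = [[1.0, a] for a in x1]
X_full = [[1.0, a, b] for a, b in zip(x1, x2)]

beta_r, ci_r, r2_reduced = fit(X_reduced, y, T_CRIT)
beta_f, ci_f, r2_full = fit(X_full, y, T_CRIT)

for name, b, (lo, hi) in zip(["intercept", "x1", "x2"], beta_f, ci_f):
    verdict = "includes zero" if lo <= 0.0 <= hi else "excludes zero"
    print(f"{name:9s} {b:8.3f}  95% CI [{lo:.3f}, {hi:.3f}]  {verdict}")
print(f"R-squared: reduced {r2_reduced:.5f}, full {r2_full:.5f}")
```

The R-squared of the full model can never be lower than that of the reduced model, even though x2 carries no real information, which is why comparing raw R-squared across factor combinations tends to favor bigger models. In practice a stats package would compute the intervals for you; the hand-rolled solver is only there to keep the sketch self-contained.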

If your variables are highly correlated with one another, you should look into partial least squares (PLS) regression, which gives more robust results in that situation.
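For a rough feel of what PLS does with correlated predictors, here is a simplified single-component sketch in Python. It is not a full PLS implementation (real ones extract several components and choose how many by cross-validation), and the data are synthetic assumptions: x2 is almost a copy of x1, and the response is built as x1 + x2 plus a small disturbance.

```python
import math

def center(values):
    """Subtract the mean from every element."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Synthetic, nearly collinear predictors (assumption for illustration).
n = 30
x1 = [float(i) for i in range(n)]
x2 = [a + ((4 * i) % 11 - 5) / 50.0 for i, a in enumerate(x1)]
y = [a + b + ((7 * i) % 13 - 6) / 10.0
     for i, (a, b) in enumerate(zip(x1, x2))]

cols = [center(x1), center(x2)]
yc = center(y)

# Weight vector: the unit direction in predictor space whose scores
# have the largest covariance with the response.
w = [sum(c * v for c, v in zip(col, yc)) for col in cols]
norm = math.sqrt(sum(v * v for v in w))
w = [v / norm for v in w]

# Scores: each observation projected onto that direction.
t = [sum(wj * col[i] for wj, col in zip(w, cols)) for i in range(n)]

# Regress the centered response on the single score vector.
q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

# Coefficients expressed on the original (centered) predictors.
beta_pls = [wj * q for wj in w]
print("PLS weights:", [round(v, 3) for v in w])
print("PLS coefficients:", [round(v, 3) for v in beta_pls])
```

Because the two predictors carry almost the same information, the single component shares the effect between them roughly equally instead of letting the individual coefficients swing wildly, which is the robustness the reply refers to.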


TS Contributor
Graph your data first. This will help you identify potential relationships as well as any nonlinear relationships.