# ANOVA vs. principal component analysis for nonlinear analysis

#### huelskbc

##### New Member
First, let me say that I am not a trained statistician by any means. Here's what I'm working on. I have a dataset with 8 factors (sample attached). Each factor is a measured variable that may or may not affect the response. I need to determine which factors are important and then find a curve fit. I have reason to believe that several of these factors affect the response in a nonlinear fashion. My current method has been doing a least squares curve fit in MATLAB and then basically comparing R-squared with different combinations of factors. The more I read, the more I'm pretty sure this is not the best way to do this (plus it's pretty messy). I've had people suggest doing ANOVA or PCA on this dataset, but I can't really figure out which is appropriate, and I'm having problems finding good resources that explain each in an approachable way as someone who has only ever taken a pretty basic stats class. Any suggestions? Good resources?

Thanks!

#### Aorus

##### New Member
One quick and easy way to look for nonlinearities is to generate a scatter-plot matrix (x vs. y plots for all pairs of variables). In MATLAB, corrplot() does this (it ships with the Econometrics Toolbox; plotmatrix() in base MATLAB draws a similar grid). If there are any obvious nonlinearities, look into transforming the variables to linearize the relationship, for example by taking logs. This NIST Handbook is a good, easy-to-read resource.

If you have only one response variable and many more observations than variables, then you could use multiple linear regression to identify important factors. Start with a full model that includes all variables (and interaction effects if you like). Look at the confidence intervals for the coefficients: if an interval includes zero (at whatever confidence level you are comfortable with; 95% is typical), that coefficient is not statistically distinguishable from zero, and the term is a candidate to drop in the next iteration. R-squared never decreases when you add terms to a model, so it's not a good metric to look at on its own; adjusted R-squared, which penalizes extra terms, is a fairer basis for comparing models of different size.
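As a rough illustration of that workflow, here is a minimal Python/NumPy sketch. Everything about the data is hypothetical (three made-up factors, of which only the first two drive the response), and `fit_ols` is a helper written for this example, not a library function. It uses a normal quantile as a large-sample stand-in for the t quantile, which is an approximation.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: three candidate factors, only x1 and x2 matter.
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

def fit_ols(X, y, level=0.95):
    """Least-squares fit with an intercept. Returns coefficients,
    approximate confidence intervals, R-squared and adjusted R-squared."""
    A = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    n_obs, n_par = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n_obs - n_par                         # residual degrees of freedom
    sigma2 = resid @ resid / dof                # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    # Normal quantile: a large-sample approximation to the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = np.column_stack([beta - z * se, beta + z * se])
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n_obs - 1) / dof
    return beta, ci, r2, adj_r2

beta, ci, r2, adj_r2 = fit_ols(X, y)
for name, b, (lo, hi) in zip(["intercept", "x1", "x2", "x3"], beta, ci):
    note = "interval includes 0" if lo <= 0.0 <= hi else "interval excludes 0"
    print(f"{name:9s} {b:+.3f}  [{lo:+.3f}, {hi:+.3f}]  {note}")

# Adding five pure-noise columns cannot lower R-squared.
X_more = np.column_stack([X, rng.normal(size=(n, 5))])
_, _, r2_more, adj_r2_more = fit_ols(X_more, y)
print(f"R2: {r2:.4f} -> {r2_more:.4f}   adjusted R2: {adj_r2:.4f} -> {adj_r2_more:.4f}")
```

For small samples the t quantile should replace the normal one, and a dedicated regression routine will report these intervals directly.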

If your variables are highly correlated with one another, look into partial least squares (PLS) regression to get more robust results.
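For a feel of what PLS does, below is a small Python/NumPy sketch of single-response PLS (often called PLS1) using the usual deflation scheme. The `pls1` function and the synthetic data are assumptions made for illustration: two of the three predictors are deliberately almost identical. In practice the number of components is a tuning choice, usually made by cross-validation, and a tested library implementation is preferable to a hand-rolled one.

```python
import numpy as np

def pls1(X, y, n_components):
    """Single-response PLS regression via successive deflation.
    Returns (intercept, coefficients) on the original X scale."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean             # work on centered copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                           # direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                              # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt                       # X loadings
        c = yk @ t / tt                         # y loading
        Xk = Xk - np.outer(t, p)                # deflate X
        yk = yk - c * t                         # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)      # map back to original predictors
    return y_mean - x_mean @ coef, coef

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)             # nearly a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.5 * x3 + 0.3 * rng.normal(size=n)

intercept, coef = pls1(X, y, n_components=1)
pred = intercept + X @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Ordinary least squares on the same data, for comparison.
A = np.column_stack([np.ones(n), X])
ols, *_ = np.linalg.lstsq(A, y, rcond=None)

print("PLS (1 component) coefficients:", np.round(coef, 3), " R2 =", round(r2, 3))
print("OLS coefficients:              ", np.round(ols[1:], 3))
```

The one-component PLS fit treats the two near-duplicate predictors almost identically, whereas the least-squares estimates for them individually are poorly determined and can shift a lot from sample to sample. Using as many components as there are predictors reproduces the least-squares fit.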

#### Miner

##### TS Contributor
Graph your data first. This will help you spot potential relationships between the factors and the response, including nonlinear ones.