How do I select features for my linear regression model?

I am predicting international viewership.

I have 7 predictor variables I am considering putting into the model, with correlations with the response ranging from -0.2 to 0.65 (I have heard it is okay to include features that have a low correlation with the response variable).

Now that I have these 7 predictor variables, how do I know which ones to actually include? I am cross-validating, so would I just choose the combination that minimizes the prediction error on the test set?


Not a robit
First, use your content expertise to select and organize the potential predictors. Cross-validation is an excellent idea: you then tune the model based on your knowledge and on the error estimates from the holdout set.
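
For what it's worth, here is a minimal sketch of comparing predictor combinations by cross-validated error. Everything in it is an assumption made for illustration: the data are synthetic (7 random predictors, of which only columns 0 and 3 actually drive the response), the helper names `cv_mse` and `best_subset` are made up, and plain NumPy least squares stands in for whatever regression library you actually use. With 7 predictors there are only 127 non-empty subsets, so trying all of them is feasible.

```python
import itertools
import numpy as np

def cv_mse(X, y, cols, k=5, seed=0):
    """Mean squared error of OLS on the given columns, averaged over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # every row not in the held-out fold
        # Prepend an intercept column to the selected predictors.
        A_train = np.column_stack([np.ones(len(train)), X[train][:, cols]])
        A_test = np.column_stack([np.ones(len(fold)), X[fold][:, cols]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        errors.append(np.mean((y[fold] - A_test @ beta) ** 2))
    return float(np.mean(errors))

def best_subset(X, y, k=5):
    """Try every non-empty subset of columns; return (lowest CV MSE, columns)."""
    p = X.shape[1]
    results = []
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            results.append((cv_mse(X, y, list(cols), k), cols))
    return min(results)

# Synthetic stand-in data (an assumption, not real viewership numbers):
# only predictors 0 and 3 influence the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 7))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=200)

best_err, best_cols = best_subset(X, y)
print("selected columns:", best_cols, "CV MSE:", round(best_err, 3))
```

One caveat: the winning subset's cross-validated error is an optimistic estimate, because the same folds were used to pick it. If you want an honest number for final performance, keep a separate test set that plays no part in the selection.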


Fortran must die
There are different opinions on this topic. Are you trying to explain a relationship or to predict the dependent variable (DV) as accurately as possible?

I think you should build your model based on theory and then still report the variables that turn out not to be significant. Others would argue you should choose the model with the lowest AIC (regardless, I think, of what the initial theory says).
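
To make the AIC route concrete, here is a rough sketch of scoring a few candidate models by AIC. It assumes Gaussian errors, uses synthetic data, and the helper `aic_ols` and the candidate sets are hypothetical names chosen for illustration. For a linear model with $n$ observations, residual sum of squares $\mathrm{RSS}$, and $k$ estimated parameters, the AIC is $n\,(\ln(2\pi\,\mathrm{RSS}/n) + 1) + 2k$, and lower is better.

```python
import numpy as np

def aic_ols(X, y, cols):
    """AIC of an OLS fit on the given columns, assuming Gaussian errors."""
    n = len(y)
    A = np.column_stack([np.ones(n), X[:, cols]])  # intercept + predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1  # coefficients (incl. intercept) plus error variance
    return float(n * (np.log(2 * np.pi * rss / n) + 1) + 2 * k)

# Synthetic stand-in data (an assumption): only columns 0 and 3 matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 7))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=200)

# Hypothetical candidates: a theory-driven set, a smaller set, a weak one.
candidates = {"theory": [0, 3, 5], "small": [0, 3], "weak": [1]}
aic = {name: aic_ols(X, y, cols) for name, cols in candidates.items()}

for name, value in sorted(aic.items(), key=lambda item: item[1]):
    print(f"{name:>6}: AIC = {value:.1f}")
```

Only differences in AIC between models fitted to the same data are meaningful; the absolute values are not. Whether to trust the lowest-AIC model over the theory-driven one comes back to the explain-versus-predict question above.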