Multiple Linear Regression (Actual Vs Predicted)

#1
Hi forum,
I am new to statistics and R, so please bear with me if I am not clear enough. I am generating a three-level full factorial design with four variables (P, P1, P4, INJ) and three responses (qo, qw, qg), which I am using to run a multiple linear regression in R. I have attached the Excel file.

Code:
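# Read the file containing the variables and responses
# (read.csv() expects comma-separated text, so the sheet must be saved as CSV even if the name ends in .xls)
a <- read.csv("C:/Users/B/Desktop/Three Level FF with Responses.xls")
# Run a linear regression, passing the data frame directly instead of using attach()
model <- lm(qo ~ P + P1 + P4 + INJ, data = a)
# Summary of the linear regression results
summary(model)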
My question is: how do I plot my predicted values against my actual values? I would like to do this to see how well my regression fits.
 
#3
Not to be meddlesome, but I noticed that your data is not orthogonally coded, which would be required for the lm() function to provide independent estimates of the effects (unless this is just an example).
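
A minimal sketch of what "orthogonally coded" usually means for a three-level design, assuming the common convention of rescaling each factor to -1, 0, +1. The raw levels below are invented for illustration (they are not the poster's data), and `code` is a hypothetical helper name.

```r
# Hypothetical 3-level full factorial in four factors (3^4 = 81 runs).
# The raw levels are made up for illustration only.
design <- expand.grid(
  P   = c(100, 200, 300),
  P1  = c(10, 20, 30),
  P4  = c(1, 2, 3),
  INJ = c(50, 75, 100)
)

# Code each factor to -1, 0, +1: subtract the mid-level, divide by half the range.
code <- function(x) (x - mean(range(x))) / (diff(range(x)) / 2)
coded <- as.data.frame(lapply(design, code))

# Model matrix for the main-effects model: intercept plus the coded factors.
X <- cbind(Intercept = 1, as.matrix(coded))

# X'X: the off-diagonal entries are zero when the columns are mutually orthogonal.
xtx <- crossprod(X)
print(xtx)
```

With the coded columns, every off-diagonal entry of X'X should be zero, which is what lets each coefficient be estimated independently of the others.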

Also, try "model$fitted.values" if you're specifically interested in the predicted values.
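
To get the actual-vs-predicted plot asked about in the first post, something along these lines should work. This is only a sketch: since the attached file is not available here, the data frame `a` is simulated with invented values, and only the variable names (qo, P, P1, P4, INJ) come from the original post.

```r
# Simulated stand-in for the poster's data (all values are invented).
set.seed(1)
a <- expand.grid(P = 1:3, P1 = 1:3, P4 = 1:3, INJ = 1:3)
a$qo <- 5 + 2 * a$P - 1.5 * a$P1 + 0.5 * a$P4 + 3 * a$INJ + rnorm(nrow(a))

model <- lm(qo ~ P + P1 + P4 + INJ, data = a)

# fitted(model) returns the same values as model$fitted.values.
predicted <- fitted(model)

# Actual vs. predicted, with a 45-degree reference line.
plot(predicted, a$qo,
     xlab = "Predicted qo", ylab = "Actual qo",
     main = "Actual vs. predicted")
abline(0, 1, lty = 2)  # points on this line are predicted exactly
```

The closer the points sit to the dashed line, the better the fit.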
 
#5
Sorry, I don't understand what "orthogonally coded" means. Running "model$fitted.values" generated a table of values, which I presume are the predicted values. How does R generate these values? What calculation does it use?
 

hlsmith

#6
R, like any statistics program, computes each predicted value as a linear combination of your independent variables, weighted by the coefficients estimated by the model.
 
#8
The model you're specifying in the lm() function is

\(qo = \beta_0 + \beta_1P + \beta_2P1 + \beta_3P4 + \beta_4INJ\)

What lm() does is estimate the \(\beta\)s so that you can predict your outcomes using the model equation. The fitted values provided by R are obtained by plugging the values of P, P1, P4 and INJ from each row of your original dataset into the estimated equation and evaluating it for qo.
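
As a rough illustration of that calculation, the fitted values can be reproduced by hand from the estimated coefficients. The data here are simulated with invented values (the original file is not available); only the variable names follow the thread.

```r
# Simulated stand-in for the poster's data (all values are invented).
set.seed(1)
a <- expand.grid(P = 1:3, P1 = 1:3, P4 = 1:3, INJ = 1:3)
a$qo <- 5 + 2 * a$P - 1.5 * a$P1 + 0.5 * a$P4 + 3 * a$INJ + rnorm(nrow(a))

model <- lm(qo ~ P + P1 + P4 + INJ, data = a)
b <- coef(model)  # estimated beta_0 ... beta_4

# Evaluate the model equation row by row.
manual <- b["(Intercept)"] + b["P"] * a$P + b["P1"] * a$P1 +
  b["P4"] * a$P4 + b["INJ"] * a$INJ

# Same calculation in matrix form: model matrix times coefficient vector.
manual_matrix <- drop(model.matrix(model) %*% b)

# Compare both versions with what R reports as fitted values.
same_by_hand   <- all.equal(unname(manual), unname(fitted(model)))
same_by_matrix <- all.equal(unname(manual_matrix), unname(fitted(model)))
print(c(same_by_hand, same_by_matrix))
```

Both comparisons should agree with fitted(model) up to floating-point tolerance.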

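Model fit is usually evaluated by plotting the residuals (the actual values minus the predicted values) against the fitted values and against each of the independent variables. plot(model) provides the standard diagnostic plots: residuals vs. fitted values, a normal Q-Q plot, a scale-location plot, and residuals vs. leverage. In the residual plots, a random scatter with no pattern, centered around y = 0, indicates an adequate fit.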
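
A sketch of those residual plots, using simulated data with invented values in place of the original file. Note that R's residuals() returns the actual value minus the fitted value.

```r
# Simulated stand-in for the poster's data (all values are invented).
set.seed(1)
a <- expand.grid(P = 1:3, P1 = 1:3, P4 = 1:3, INJ = 1:3)
a$qo <- 5 + 2 * a$P - 1.5 * a$P1 + 0.5 * a$P4 + 3 * a$INJ + rnorm(nrow(a))

model <- lm(qo ~ P + P1 + P4 + INJ, data = a)
res <- residuals(model)  # actual minus fitted

# Residuals vs. fitted values.
plot(fitted(model), res, xlab = "Fitted qo", ylab = "Residual")
abline(h = 0, lty = 2)

# Residuals vs. each independent variable, in a 2 x 2 grid.
op <- par(mfrow = c(2, 2))
for (v in c("P", "P1", "P4", "INJ")) {
  plot(a[[v]], res, xlab = v, ylab = "Residual")
  abline(h = 0, lty = 2)
}
par(op)  # restore the previous plotting layout
```

Curvature or a funnel shape in any of these panels would suggest that the straight-line model is missing something, such as a quadratic term or an interaction.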

You can also plot the predicted values vs. each of the independent variables, though this is more for visualizing the association between that independent variable and the response. To isolate that association, compute the predictions with all other variables held at fixed values.
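
One way to do this is with predict() and a new data frame in which only one variable changes. This is a sketch on simulated data with invented values; holding the other variables at their middle level (2) is an arbitrary choice made for illustration.

```r
# Simulated stand-in for the poster's data (all values are invented).
set.seed(1)
a <- expand.grid(P = 1:3, P1 = 1:3, P4 = 1:3, INJ = 1:3)
a$qo <- 5 + 2 * a$P - 1.5 * a$P1 + 0.5 * a$P4 + 3 * a$INJ + rnorm(nrow(a))

model <- lm(qo ~ P + P1 + P4 + INJ, data = a)

# Vary P across its range while holding P1, P4 and INJ at their middle level.
newdata <- data.frame(P = seq(1, 3, length.out = 50), P1 = 2, P4 = 2, INJ = 2)
qo_hat <- unname(predict(model, newdata = newdata))

plot(newdata$P, qo_hat, type = "l",
     xlab = "P", ylab = "Predicted qo",
     main = "Predicted qo vs. P (P1, P4, INJ fixed)")
```

Because the model is linear in P, the result is a straight line whose slope is the estimated coefficient for P.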