Difference between simulating the dependent variable and simulating the error terms and adding them to the fitted values, assuming normality?

KNR

New Member
What's the statistical difference between simulating the dependent variable directly and simulating the error terms and adding them to the fitted values, assuming normality (gaussian GLM)?

Say I'm doing a simple multiple regression on the following data (R):
Code:
n <- 40
x1 <- rnorm(n, mean=3, sd=1)
x2 <- rnorm(n, mean=4, sd=1.25)
y <- 2*x1 + 3*x2 + rnorm(n, mean=2, sd=1)
mydata <- data.frame(x1, x2, y)
mod <- lm(y ~ x1 + x2, data=mydata)
I don't get the statistical difference between:
• tmp <- predict(mod) + rnorm(length(predict(mod)), 0, summary(mod)$sigma) (this is what the R function simulate does);
• tmp <- rnorm(length(predict(mod)), mean(y), sd(y));
What is the proper way to resample the dependent variable assuming a gaussian GLM?

hlsmith

Less is more. Stay pure. Stay poor.
Almost, but not completely following. Errors represent epistemological uncertainty, right? Without them it would be a deterministic model. You could simulate a truckload of data and sample without replacement. What is the purpose of this endeavor?

KNR

New Member
The purpose is to bootstrap the dependent variable in generalized linear models.

Buckeye

Active Member
Just to be clear, you can also bootstrap the entire vector of independent variables together with the dependent variable. I'm assuming you have data that you want to fit a glm to (not that you are trying to simulate the data yourself). There are probably more elegant ways to do this in R.
Code:
library(dplyr)

data("mtcars")

# add row number
mtcars <- mtcars %>% mutate(row_nbr = row_number())

# sample rows with replacement
resampled_rows <- sample(x = mtcars$row_nbr, size = nrow(mtcars), replace = TRUE)

# get bootstrapped data
bootstrapped_data <- mtcars[resampled_rows, ]
If you plan to simulate the data, maybe this link will help: https://stats.stackexchange.com/questions/59062/multiple-linear-regression-simulation
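For the stated purpose of bootstrapping only the dependent variable in a gaussian GLM, the usual alternatives to resampling whole rows are the parametric bootstrap (re-draw the errors from a normal with the residual standard deviation) and the residual bootstrap (resample the observed residuals). A minimal sketch, assuming the mod and mydata objects from the first post are still in the workspace:
Code:
# parametric bootstrap: keep x1 and x2 fixed, re-draw y from the fitted model
B <- 1000
boot_coefs <- replicate(B, {
  y_star <- fitted(mod) + rnorm(nrow(mydata), mean = 0, sd = summary(mod)$sigma)
  coef(lm(y_star ~ x1 + x2, data = mydata))
})
apply(boot_coefs, 1, sd)  # bootstrap standard errors for the coefficients

# the same kind of draw in one call with the built-in simulate()
y_star2 <- simulate(mod, nsim = 1)$sim_1

# nonparametric variant: resample the observed residuals instead of assuming normality
y_star3 <- fitted(mod) + sample(resid(mod), size = nrow(mydata), replace = TRUE)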

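As for the original question, the two options are not equivalent: the first simulates y conditional on x1 and x2 (errors added to the fitted values), while the second draws from the marginal distribution of y and throws away the relationship with the predictors. A quick check, again assuming the objects from the first post are in the workspace, is to refit the model to each kind of simulated response:
Code:
set.seed(1)
# option 1: fitted values plus fresh normal errors (conditional on x1 and x2)
y_cond <- predict(mod) + rnorm(n, 0, summary(mod)$sigma)
# option 2: draws from the marginal distribution of y (ignores x1 and x2)
y_marg <- rnorm(n, mean(y), sd(y))

coef(lm(y_cond ~ x1 + x2, data = mydata))  # slopes close to the original fit
coef(lm(y_marg ~ x1 + x2, data = mydata))  # slopes scattered around 0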