Can multivariate normality prevent overfitting?

#1
Hi,
Multivariate normality is the third assumption of linear regression. Linear regression analysis requires all variables to be multivariate normal, meaning the data should be normally distributed.
So,

can multivariate normality prevent overfitting?
My reasoning is that if the features are normally distributed, then the estimated coefficients should work perfectly on the entire unseen population of all the independent features.
Thanks,
param
 

Dason

Ambassador to the humans
#2
The only assumption of normality is that the error term is normally distributed. You do not need the predictors to be normally distributed.
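To illustrate that the predictors need not be normal, here is a small simulation sketch in Python (standard library only). Everything in it is an assumption chosen for illustration: the helper names `ols_fit` and `skewness`, the seed, the sample size, and the true model y = 2 + 3x + e. The predictor is drawn from a heavily skewed exponential distribution while only the error term is normal, and ordinary least squares still recovers the intercept and slope, with residuals that look normal.

```python
import random
import statistics

def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def skewness(v):
    """Sample skewness: about 0 for normal data, about 2 for exponential data."""
    m, s = statistics.fmean(v), statistics.pstdev(v)
    return sum((vi - m) ** 3 for vi in v) / (len(v) * s ** 3)

rng = random.Random(1)
n = 5000
# Predictor drawn from an exponential distribution: strongly skewed, clearly NOT normal.
x = [rng.expovariate(1.0) for _ in range(n)]
# Only the error term is normal: y = 2 + 3x + e with e ~ N(0, 1).
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

b0, b1 = ols_fit(x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
print(f"skewness of x = {skewness(x):.2f}, skewness of residuals = {skewness(residuals):.2f}")
print(f"intercept = {b0:.3f} (true 2), slope = {b1:.3f} (true 3)")
```

With a sample this large, both estimates should land within a few hundredths of the true values even though the skewness of x is far from zero.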
 

spunky

Can't make spagetti
#3
param wrote: "Multivariate Normality is the third assumption in assumptions of linear regression. The linear regression analysis requires all variables to be multivariate normal. Means data should be normally distributed."
Would you mind sharing with us where you read this? Or where it comes from?
 

noetsi

No cake for spunky
#4
Overfitting usually comes from having too many variables in the model; it has nothing to do with normality. Methods such as the LASSO are used to address it.
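As a rough illustration of how the LASSO addresses this, the sketch below implements it by coordinate descent in plain Python. The function names `lasso_cd` and `soft_threshold`, the penalty value `lam=0.5`, and the simulated data (10 standard normal features, of which only the first two affect y) are hypothetical choices for the example, not a reference implementation. The features are already on a common scale, which the LASSO normally requires; with real data you would standardize them first. With `lam=0` the procedure reduces to ordinary least squares and every noise feature gets a small nonzero coefficient, while the penalized fit sets the noise coefficients to exactly zero and shrinks the two real ones.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward 0 by gamma; return exactly 0 if |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # current residuals y - Xb (b starts at 0)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(row[j] * (ri + row[j] * b[j]) for row, ri in zip(X, r)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep residuals in sync with the updated coefficient.
                for i, row in enumerate(X):
                    r[i] -= row[j] * delta
                b[j] = new_bj
    return b

rng = random.Random(0)
n, p = 200, 10
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
# Hypothetical truth: only features 0 and 1 matter; features 2..9 are pure noise.
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 1.0) for row in X]

ols_like = lasso_cd(X, y, lam=0.0)  # lam = 0 reduces to ordinary least squares
lasso = lasso_cd(X, y, lam=0.5)
print("lam=0.0:", [round(c, 2) for c in ols_like])
print("lam=0.5:", [round(c, 2) for c in lasso])
```

Note that all features here are normal, so the variable selection is doing the work, not the distribution of the data.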
 

Dason

Ambassador to the humans
#5
They can be used. Simply reducing the number of variables in the model also helps. But honestly, it sounds like the OP thinks normality is required for certain things when it isn't.
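The following sketch shows why cutting variables helps, and also why normality does not protect against overfitting: every feature in it is exactly standard normal, yet adding noise features keeps lowering the training error while the test error grows. The helper names (`solve`, `ols`, `mse`, `make_data`), the seed, the sample sizes (30 training and 1000 test observations) and the true model y = 1.5*x0 + e are assumptions made for the illustration. Least squares is solved through the normal equations with a small hand-written Gaussian elimination so the example needs only the standard library.

```python
import random

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def ols(X, y):
    """Least squares via the normal equations (X^T X) b = X^T y, no intercept."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def mse(X, y, b):
    """Mean squared prediction error of coefficients b on (X, y)."""
    return sum((yi - sum(xi * bi for xi, bi in zip(row, b))) ** 2
               for row, yi in zip(X, y)) / len(y)

rng = random.Random(7)
P_MAX = 25

def make_data(n):
    # Every feature is N(0, 1), so the data are as "normal" as they can be.
    X = [[rng.gauss(0.0, 1.0) for _ in range(P_MAX)] for _ in range(n)]
    # Hypothetical truth: only feature 0 matters; the other 24 are noise.
    y = [1.5 * row[0] + rng.gauss(0.0, 1.0) for row in X]
    return X, y

X_train, y_train = make_data(30)
X_test, y_test = make_data(1000)

results = {}
for k in (1, 5, 25):  # fit using the first k features
    Xtr = [row[:k] for row in X_train]
    Xte = [row[:k] for row in X_test]
    b = ols(Xtr, y_train)
    results[k] = (mse(Xtr, y_train, b), mse(Xte, y_test, b))
    print(f"k={k:2d}  train MSE={results[k][0]:.2f}  test MSE={results[k][1]:.2f}")
```

With 25 coefficients estimated from 30 observations the fit nearly memorizes the training sample, so its test MSE is typically much larger than that of the one-feature model.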
 

noetsi

No cake for spunky
#6
Yeah, but LASSO is a scientific way to reduce variables :)

Multivariate normality is pretty much exclusively an issue for the standard errors, and with a moderately sized sample it really does not matter much, despite all the stress it gets in regression courses. Or that is the sense I get.
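One way to check this claim is a small Monte Carlo sketch: simulate a simple regression whose errors are strongly skewed (a shifted exponential, so clearly not normal), build the usual t-based 95% confidence interval for the slope in each replicate, and count how often it covers the true slope. The sample size n = 100, the 2000 replications, the true slope 0.5, the helper name `slope_ci`, and the hard-coded critical value 1.984 (an approximation to the 97.5% quantile of Student's t with 98 degrees of freedom) are all assumptions chosen for the example.

```python
import random
import statistics

def slope_ci(x, y, t_crit):
    """OLS slope and its t-based confidence interval for simple regression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (len(x) - 2)) ** 0.5 / sxx ** 0.5  # usual OLS standard error
    return b1, b1 - t_crit * se, b1 + t_crit * se

rng = random.Random(42)
N, REPS, TRUE_SLOPE = 100, 2000, 0.5
T_CRIT = 1.984  # approx. 97.5% quantile of Student's t with N - 2 = 98 df
hits, slopes = 0, []
for _ in range(REPS):
    x = [rng.uniform(0.0, 10.0) for _ in range(N)]
    # Errors: exponential(1) shifted to mean 0, strongly right-skewed, not normal.
    y = [1.0 + TRUE_SLOPE * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    b1, lo, hi = slope_ci(x, y, T_CRIT)
    slopes.append(b1)
    hits += lo <= TRUE_SLOPE <= hi

coverage = hits / REPS
print(f"mean slope = {statistics.fmean(slopes):.3f} (true {TRUE_SLOPE})")
print(f"95% CI coverage with skewed errors = {coverage:.3f}")
```

If the claim holds, the average slope should sit very close to 0.5 (no bias) and the coverage should come out near the nominal 0.95 despite the non-normal errors. Rerunning with a much smaller N (and T_CRIT adjusted to the new degrees of freedom) is a quick way to see where the approximation starts to degrade.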
 

Dason

Ambassador to the humans
#7
I still don't think you're getting what I meant: it sounds like the OP assumes that all variables, including the predictors, need to be multivariate normal. Only the error term carries a normality assumption, and that only matters if you're doing tests.
 

noetsi

No cake for spunky
#8
Actually, I meant to point out that multivariate normality, which here means normality of the residuals rather than of the predictors or the dependent variable, is formally a requirement for the standard errors (it does not bias the slopes), but that with a moderate or larger sample even that does not really matter. The results, including the standard errors, will still be essentially correct even if the residuals are not multivariate normal.

But obviously I did not say that well. Normality is probably the most overstressed topic in undergraduate classes and texts. And that ignores the fact that p-values, which are what normality is really used for, are themselves increasingly seen as less important.
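The point about slopes versus standard errors can be made concrete with a one-predictor model (simple regression is assumed here only to keep the algebra short; the matrix version for several predictors is analogous). Writing the least-squares slope as the true slope plus a weighted sum of the errors shows which assumption is used where: unbiasedness needs only mean-zero errors, and the usual variance formula needs only uncorrelated errors with constant variance, so normality is used for neither. Normality only makes the slope's sampling distribution exactly normal (and the t statistic exactly t); with independent errors and a moderately large n, the weighted sum is approximately normal anyway by a central limit theorem, as long as no single weight dominates.

```latex
% Simple regression: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, i = 1, \dots, n
% Second equality uses \sum_i (x_i - \bar x) = 0; third substitutes y_i and uses
% \sum_i (x_i - \bar x) x_i = \sum_i (x_i - \bar x)^2.
\begin{align*}
\hat\beta_1
  &= \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}
   = \frac{\sum_i (x_i - \bar x)\, y_i}{\sum_i (x_i - \bar x)^2}
   = \beta_1 + \sum_i w_i \varepsilon_i,
  \qquad w_i = \frac{x_i - \bar x}{\sum_j (x_j - \bar x)^2} \\
\mathbb{E}[\hat\beta_1 \mid x] &= \beta_1
  && \text{if } \mathbb{E}[\varepsilon_i \mid x] = 0 \\
\operatorname{Var}(\hat\beta_1 \mid x) &= \sigma^2 \sum_i w_i^2
  = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}
  && \text{if the } \varepsilon_i \text{ are uncorrelated with variance } \sigma^2
\end{align*}
```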