Need your expert ideas: WHAT IS THE RIGHT WAY TO RUN AN OUTLIER ANALYSIS, given this scenario?

#1
Suppose the student has a simple dataset composed of twelve variables labelled V01 to V12. A linear combination of variables V01 to V07 defines a variate called Meaning in Life. A linear combination of variables V08 to V12 defines a variate called Life Satisfaction.

The student intends to run a simple linear regression where the variate Life Satisfaction will be regressed on Meaning in Life.

WHAT IS THE RIGHT WAY TO RUN OUTLIER ANALYSIS, given this scenario?

Should the student run an outlier detection procedure on each variate separately? If so, how should she consolidate the two sets of results, given that a participant might be flagged as an outlier on one variate but not on the other?

Or should she run a single outlier detection on all twelve variables combined? If so, what is the logic behind that approach?
 

katxt

Active Member
#2
She could start by combining the Vs into the two variates and running the regression. Then look at the residual plots: a normal probability plot and a residuals vs. predicted values plot. If both look reasonable, no further outlier analysis is needed.
Often when variables are combined, the result tends to be better behaved than the individual items, so don't worry about outliers until you need to. (In my opinion.)
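
For anyone who wants to try this workflow, here is a rough sketch in plain Python. It is only an illustration under several assumptions that are not in the original question: each variate is scored as the unweighted mean of its items (the thread only says "linear combination"), the data are simulated with one deliberately planted odd case, and the cutoff of 3 on the studentized residuals is a common rule of thumb rather than a fixed standard. The function names are made up for this example. It produces the numeric counterpart of the residual plots: fitted values and internally studentized residuals, which can then be plotted against each other or against normal quantiles with any plotting tool.

```python
import random
import statistics

def composite(rows, cols):
    """Score a variate as the unweighted mean of its items (an assumed scoring rule)."""
    return [statistics.fmean(row[c] for c in cols) for row in rows]

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y regressed on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def studentized_residuals(x, y, intercept, slope):
    """Return fitted values and internally studentized residuals."""
    n = len(x)
    mx = statistics.fmean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s = (sum(r * r for r in resid) / (n - 2)) ** 0.5   # residual standard error
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    z = [r / (s * (1 - h) ** 0.5) for r, h in zip(resid, leverage)]
    return fitted, z

def flag_outliers(z, cutoff=3.0):
    """Indices of cases whose studentized residual exceeds the (assumed) cutoff."""
    return [i for i, zi in enumerate(z) if abs(zi) > cutoff]

# --- Simulated data, purely for illustration ---
random.seed(1)
names = [f"V{i:02d}" for i in range(1, 13)]
mil_items, ls_items = names[:7], names[7:]

rows = []
for _ in range(100):
    latent = random.gauss(0, 1)
    row = {v: latent + random.gauss(0, 1) for v in mil_items}
    row.update({v: 0.5 * latent + random.gauss(0, 1) for v in ls_items})
    rows.append(row)

# Plant one unusual participant: extreme Life Satisfaction item scores.
for v in ls_items:
    rows[0][v] = 8.0

meaning = composite(rows, mil_items)        # predictor variate
satisfaction = composite(rows, ls_items)    # outcome variate

intercept, slope = fit_simple_ols(meaning, satisfaction)
fitted, z = studentized_residuals(meaning, satisfaction, intercept, slope)
flagged = flag_outliers(z)

print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
print("flagged participants (row index):", flagged)
```

The point of doing it this way is that an outlier is judged relative to the fitted model, so there is only one set of results to interpret instead of two separate per-variate screens that would need consolidating.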