Regression analysis for massive data sets

vinux

Dark Knight
Massive data may sometimes not add value to a regression analysis. When you have more than 1 million records, many variables come out significant in the model (depending on the context), with p-values < 0.001. The usual p < 0.05 rule may not make sense here.
The reason is that the standard error estimates of most regression coefficients become very small (they shrink roughly in proportion to 1/sqrt(n)), and small standard errors lead to small p-values.
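To illustrate the point about standard errors, here is a minimal Python sketch. It assumes a simple linear regression with one predictor, a hypothetical true slope of 0.01 (a practically negligible effect), unit residual standard deviation and unit-variance x. It uses the approximation SE ≈ sigma / (sd_x * sqrt(n)) and a normal approximation to the t distribution, which is reasonable for large n; `approx_p_value` is a hypothetical helper name.

```python
import math


def approx_p_value(beta: float, sigma: float, sd_x: float, n: int) -> float:
    """Two-sided p-value for a simple-regression slope (illustrative sketch).

    Assumes SE(slope) ~= sigma / (sd_x * sqrt(n)) and uses the standard
    normal in place of the t distribution (fine for large n).
    """
    se = sigma / (sd_x * math.sqrt(n))  # shrinks like 1/sqrt(n)
    t = beta / se                       # grows like sqrt(n) for a fixed beta
    # P(|Z| > |t|) for a standard normal Z
    return math.erfc(abs(t) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical tiny effect: slope 0.01, sigma = 1, sd_x = 1.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        se = 1.0 / math.sqrt(n)
        p = approx_p_value(0.01, 1.0, 1.0, n)
        print(f"n={n:>9,}  se={se:.5f}  p={p:.3g}")
```

Under these assumptions the p-value falls from about 0.75 at n = 1,000 to about 0.32 at n = 10,000 and far below 0.001 at n = 1,000,000, even though the effect itself never changes.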
My suggestion is to perform the regression on a sample of reasonable size (the appropriate size varies with the situation).
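The suggestion can be sketched in Python using only the standard library. The data are simulated, and the effect size of 0.01, the one-million-row data set and the sample size of 5,000 are illustrative assumptions, not values from the post; `ols_slope` and `simulate` are hypothetical helper names, and the p-value again uses a normal approximation.

```python
import math
import random


def ols_slope(x, y):
    """Simple linear regression of y on x.

    Returns (slope, standard error of slope, two-sided p-value).
    The p-value uses a normal approximation (fine for n in the thousands).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(slope / se) / math.sqrt(2))
    return slope, se, p


def simulate(n, beta=0.01, seed=0):
    """Hypothetical data: y = beta * x + noise, with x and noise ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


x, y = simulate(1_000_000)
full = ols_slope(x, y)

# Fit on a simple random sample of (hypothetical) size 5,000.
rng = random.Random(1)
idx = rng.sample(range(len(x)), 5_000)
sample = ols_slope([x[i] for i in idx], [y[i] for i in idx])

for label, (b, se, p) in (("full", full), ("sample", sample)):
    print(f"{label:>6}: slope={b:.4f}  se={se:.4f}  p={p:.3g}")
```

Drawing the sample uniformly at random keeps it representative of the full data. Moving from the full data to the sample increases the standard error of the slope by roughly sqrt(1,000,000 / 5,000) ≈ 14, which is what pushes the p-value back up for a tiny effect.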