Why are so many variables in my regression significant?

#1
Dear everyone,

For my research I am trying to determine the impact of certain factors on a company's goodwill impairment.
For my regression analysis I deflated the following variables by lagged total assets (a rough construction sketch follows below):

- gdwlia (dependent variable)
- ROA
- BM
- difference in turnover between year t-1 and year t
- difference in cash flow between year t-1 and year t

Furthermore, I added a variable SIZE, which is the natural logarithm of total assets in year t. However, when I run my regression, all of these variables come out significant:

[Attached screenshot: regression output table]

Does anyone know why these variables are all significant and what step I may have missed?
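For completeness, this is roughly how I built the deflated variables and SIZE. It is only a minimal sketch in Python/pandas; apart from gdwlia, the column names (firm_id, year, at, sale, oancf) are placeholders for whatever the dataset actually uses, and ROA and BM would follow the same deflation pattern:

```python
# Minimal sketch of the variable construction (column names other than gdwlia
# are placeholders; adapt to your own data).
import numpy as np
import pandas as pd

df = pd.read_csv("panel.csv")                     # one row per firm-year
df = df.sort_values(["firm_id", "year"])

# Lagged total assets within each firm
df["at_lag"] = df.groupby("firm_id")["at"].shift(1)

# Dependent variable and year-over-year differences, all deflated by lagged assets
df["gdwlia_defl"] = df["gdwlia"] / df["at_lag"]
df["d_turnover"] = df.groupby("firm_id")["sale"].diff() / df["at_lag"]
df["d_cashflow"] = df.groupby("firm_id")["oancf"].diff() / df["at_lag"]

# SIZE: natural log of total assets in year t
df["SIZE"] = np.log(df["at"])
```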
Thank you in advance!
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Having no idea of your context, I can point out that the estimated effect sizes are very small and perhaps insignificant in a business sense. One reason could be that you have a very large sample size, so the model is picking up trivial effects.
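To illustrate with simulated data (just a sketch, nothing to do with your dataset): at n = 7000, even a coefficient that explains well under 1% of the variance will typically come out highly significant.

```python
# Sketch: a trivially small effect becomes highly "significant" at n = 7000.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 7000
x = rng.normal(size=n)                           # standardized regressor
y = 0.01 * x + rng.normal(scale=0.2, size=n)     # tiny true effect (beta = 0.01)

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.params[1], res.pvalues[1], res.rsquared)
# slope near 0.01; the p-value is typically far below 0.05
# while R-squared stays well under 1%
```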
 
#3
Thank you very much for your reply.

My sample consists of 7000 observations.
Furthermore, is there any other context about my research that I can provide?
Thank you very much for your help!
 
#5
I am sorry,
I am not that experienced with this yet; what kind of information would you like to receive?
My data consists of 7000 observations over a period of 12 years.
The variables Bath & Smooth are calculated as the difference in net income between year t-1 and year t, deflated by lagged total assets. Bath is represented by all values lower than the median of the non-zero negative values, while Smooth is represented by all values higher than the median of the non-zero positive values.
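In code terms the construction looks roughly like this (a sketch that continues the firm-year panel df from my first post; "ni" is a placeholder for the net income column):

```python
# Sketch of the Bath / Smooth indicators (continues the earlier panel df;
# "ni" is a placeholder for net income).
# Change in net income from t-1 to t, deflated by lagged total assets
df["d_ni"] = df.groupby("firm_id")["ni"].diff() / df["at_lag"]

neg = df.loc[df["d_ni"] < 0, "d_ni"]             # non-zero negative changes
pos = df.loc[df["d_ni"] > 0, "d_ni"]             # non-zero positive changes

df["BATH"] = (df["d_ni"] < neg.median()).astype(int)      # below median of negatives
df["SMOOTH"] = (df["d_ni"] > pos.median()).astype(int)    # above median of positives
```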
 

noetsi

Fortran must die
#6
I don't know how this impacts significance exactly, but spurious regression is very common in time series, when you are tracking relationships over time. If two variables have a trend built into them, they can show high correlation even though they are not related. One thing to look for is whether you have very high R-squared values.

I would start by just eyeballing your variables over time. Does it look like there are trends in them? (You can run tests for non-stationarity as well, but those have their limits.)
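A quick simulated illustration of both points (a sketch only, not your data): two independent random walks will often regress on each other with a high R-squared, and an augmented Dickey-Fuller test is one common check for non-stationarity.

```python
# Sketch: spurious regression between two independent random walks,
# plus an augmented Dickey-Fuller (ADF) test for non-stationarity.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
n = 200
x = np.cumsum(rng.normal(size=n))    # random walk 1
y = np.cumsum(rng.normal(size=n))    # random walk 2, generated independently of x

res = sm.OLS(y, sm.add_constant(x)).fit()
print("R-squared:", res.rsquared)    # often high despite no real relationship

adf_stat, pvalue = adfuller(x)[:2]
print("ADF p-value for x:", pvalue)  # typically large: cannot reject a unit root
```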