Did I design this wrong? Using dummy variables for Y

dbuck

New Member
#1
Hi all,

Thanks for taking a look at this.

Regression Statistics
  • Multiple R 0.908734206
  • R Square 0.825797856
  • Adjusted R Square 0.824730441
  • Standard Error 0.111438927
  • Observations 822

I'm looking at a group of employees who either resigned or remained active with our company. My Y is just a dummy variable (0 for active, 1 for resigned). Is this an incorrect way to set up my regression?

To give you an idea of my X variables:

Coefficients
  • Intercept = 0.438122425
  • Rate to Mkt = 0.786883997
  • Last Merit = -0.048438063
  • Age = -0.002198365
  • Grade Norm = -0.001754669
  • Service = -0.004530844
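If it helps to see the setup concretely, it's roughly equivalent to the Python sketch below (I actually ran the regression in Excel; the file and column names here are just placeholders, not my real data):

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical data: one row per employee; 0 = active, 1 = resigned
    df = pd.read_csv("employees.csv")

    X = sm.add_constant(df[["rate_to_mkt", "last_merit", "age",
                            "grade_norm", "service"]])
    y = df["resigned"]

    # Ordinary least squares with a 0/1 outcome (a "linear probability model")
    ols = sm.OLS(y, X).fit()
    print(ols.summary())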

It just seems like there shouldn't be such a stark difference between my results for active and resigned employees. The separation looks a little too clean, right?

Thanks again,

David
 

Karabiner

TS Contributor
#2
Multiple R 0.908734206
I'd guess that just two or three decimal places would improve legibility.
Is this an incorrect way to set up my regression?
I am not sure whether 0/1 data should be used as the dependent variable
in a linear regression. Why didn't you consider logistic regression?
And how many subjects resigned?
Rate to Mkt = 0.786883997
Seems as if this variable predicts the DV very well.
What does it represent?


With kind regards

K.
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
Linear models can do a decent job of predicting categorical data, but as noted, a logistic regression model would be more appropriate given your binary outcome.
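For instance, here is a minimal sketch of such a logit fit in Python with statsmodels; it reuses the same placeholder file and column names as the OLS sketch in post #1, swapping OLS for Logit:

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical data: one row per employee; 0 = active, 1 = resigned
    df = pd.read_csv("employees.csv")

    X = sm.add_constant(df[["rate_to_mkt", "last_merit", "age",
                            "grade_norm", "service"]])
    y = df["resigned"]

    # Logistic regression: coefficients are on the log-odds scale
    logit = sm.Logit(y, X).fit()
    print(logit.summary())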

Also, you may want to check for collinearity among your independent variables (e.g., using VIF or tolerance).
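A quick sketch of the VIF check, under the same placeholder names as above:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    df = pd.read_csv("employees.csv")  # hypothetical file/column names
    X = sm.add_constant(df[["rate_to_mkt", "last_merit", "age",
                            "grade_norm", "service"]])

    # Rule of thumb: VIF above roughly 5-10 suggests problematic collinearity
    for i, name in enumerate(X.columns):
        if name == "const":  # skip the intercept; its VIF isn't meaningful
            continue
        print(name, variance_inflation_factor(X.values, i))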
 

jpkelley

TS Contributor
#4
Why didn't you consider logistic regression?
I agree; logistic regression will be more appropriate. Given the way you've coded your 0/1 data, the model will predict the probability of resigning: a fitted value near 0 means an employee is very likely to stay active, and a value near 1 means he/she is very likely to resign.
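To make that scale concrete, here is a toy calculation with made-up coefficients (not David's actual estimates) showing how a fitted logit turns the linear predictor into a resignation probability:

    import numpy as np

    # Made-up intercept and single-predictor coefficient, for illustration only
    b0, b1 = -2.0, 3.0
    rate_to_mkt = 0.9

    log_odds = b0 + b1 * rate_to_mkt
    prob_resign = 1.0 / (1.0 + np.exp(-log_odds))  # inverse logit
    print(round(prob_resign, 3))  # ~0.668: predicted probability of resigning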

As hlsmith says, checking for collinearity is good, but be cautious about eliminating everything that shows a high VIF. There are mixed views about which predictors to eliminate. If two predictors are strongly positively related, it's probably a good idea to drop one of them. If they are negatively related, it is probably smarter to keep both regardless of VIF, since a third variable might be in play.