# Did I design this wrong? Using dummy variables for Y

#### dbuck

##### New Member
Hi all,

Thanks for taking a look at this.

Regression Statistics
• Multiple R 0.908734206
• R Square 0.825797856
• Standard Error 0.111438927
• Observations 822

I'm looking at a group of employees that either resigned or remained active with our company. My Y is just a dummy variable (0 for active, 1 for resigned). Is this an incorrect way to set up my regression?

To get an idea of my X variables:

Coefficients
• Intercept = 0.438122425
• Rate to Mkt = 0.786883997
• Last Merit = -0.048438063
• Age = -0.002198365
• Service = -0.004530844

It just seems that there shouldn't be such a stark difference between my results for active and resigned employees. It seems a little too clear, right?

Thanks again,

David

#### Karabiner

##### TS Contributor
> Multiple R 0.908734206

I'd guess that just two or three decimal places would improve legibility.
> Is this an incorrect way to set up my regression?

I am not sure whether 0/1 data may be used as the dependent variable
in a linear regression. Why didn't you consider logistic regression?
And how many subjects did resign?
> Rate to Mkt = 0.786883997

Seems as if this variable predicts the DV very well.
What does this variable mean?

With kind regards

K.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Linear models can do a reasonable job predicting categorical data, but as noted, a logistic regression model would be more appropriate given the binary outcome.

Also you may want to examine for collinearity between your independent variables (e.g., using VIF or Tolerance).
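A sketch of that VIF check with statsmodels, on synthetic columns (the names come from the thread; the strong Age–Service correlation is deliberately built in here as an illustration, since tenure plausibly tracks age):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 822
age = rng.normal(40, 10, n)
service = 0.6 * age + rng.normal(0, 3, n)   # tenure tracks age -> collinear
last_merit = rng.normal(0.03, 0.01, n)      # independent of the others

# Constant column first; VIF is computed for one predictor column at a time
# by regressing it on all the remaining columns.
X = np.column_stack([np.ones(n), age, service, last_merit])
names = ["Age", "Service", "Last Merit"]
vifs = {name: variance_inflation_factor(X, i + 1)
        for i, name in enumerate(names)}
```

A common rule of thumb flags VIF above 5 or 10; here Age and Service come out well above the Last Merit column, reflecting the built-in correlation.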

#### jpkelley

##### TS Contributor
> Why didn't you consider logistic regression?
I agree. Logistic regression will be more appropriate. Given the way you've coded your 0/1 data, the fitted values will be on a "probability of resigning" scale, with 0 meaning no chance that an employee resigns (i.e. he/she stays active) and 1 meaning certain resignation.

As hlsmith says, checking for collinearity is a good idea, but be cautious about eliminating every predictor that shows a high VIF. There are mixed views on which predictors to drop. If two predictors are strongly positively correlated, it's probably a good idea to eliminate one of them; if they are negatively correlated, it is probably smart to keep both regardless of VIF, since a third variable might be in play.