# Regression and imputing zeros

#### domwebber

##### New Member
Hi all!

I have what is likely a simple regression problem.

I am imputing Variable X from survey Y to survey Z. I am using a simple linear regression model and the predict command in Stata.

So, I run a regression on Variable X in survey Y, using a host of independent dummy variables. Then I use the predict command to impute variable X onto survey Z using the regression parameters on the same dummy variables in Z. So far, so good.
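For readers who don't use Stata, the fit-then-predict procedure described above can be sketched generically in Python with NumPy. This is only a minimal sketch: the data are simulated, and the names `fit_ols`, `predict_ols`, `dummies_y`, `dummies_z` and the coefficient values are assumptions made up for illustration, not anything from the actual surveys.

```python
import numpy as np

def fit_ols(dummies, x):
    """Least-squares coefficients (intercept first) for x regressed on the dummies."""
    design = np.column_stack([np.ones(len(x)), dummies])
    coefs, *_ = np.linalg.lstsq(design, x, rcond=None)
    return coefs

def predict_ols(coefs, dummies):
    """Linear prediction: intercept plus the fitted dummy effects."""
    design = np.column_stack([np.ones(len(dummies)), dummies])
    return design @ coefs

rng = np.random.default_rng(0)

# Hypothetical survey Y: two dummy covariates and an observed Variable X.
dummies_y = rng.integers(0, 2, size=(500, 2))
x_y = 10 + 5 * dummies_y[:, 0] - 3 * dummies_y[:, 1] + rng.normal(0, 1, 500)

# Hypothetical survey Z: the same dummies, but Variable X is not observed.
dummies_z = rng.integers(0, 2, size=(300, 2))

coefs = fit_ols(dummies_y, x_y)                # step 1: regression in survey Y
x_z_imputed = predict_ols(coefs, dummies_z)    # step 2: prediction in survey Z
```

Every row of survey Z with the same dummy pattern receives the same imputed value, because the prediction is just the fitted linear combination.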

However, the original variable in survey Y has 60% of observations equal to zero. But when I impute, none of the observations are zero. This is important as I need a similar proportion of my imputed variable equal to zero.
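The missing zeros are what one would expect from a linear prediction, since fitted values are conditional means rather than draws from the data. A small simulated illustration in Python, assuming a single hypothetical dummy (`group`) and a made-up outcome with roughly 60% zeros:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

group = rng.integers(0, 2, size=n)       # one hypothetical dummy variable
is_zero = rng.random(n) < 0.6            # roughly 60% of values are zero
x = np.where(is_zero, 0.0, rng.gamma(2.0, 50.0, size=n))

# Regress x on an intercept and the dummy, then compute fitted values.
design = np.column_stack([np.ones(n), group])
coefs, *_ = np.linalg.lstsq(design, x, rcond=None)
fitted = design @ coefs

share_zero_observed = np.mean(x == 0)        # close to 0.6
share_zero_fitted = np.mean(fitted == 0)     # no fitted value is zero
```

With one dummy the fitted value for each observation is simply the mean of `x` in its group, and a group mean is zero only if every value in the group is zero.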

Does anyone have any idea how I can constrain my imputation to contain a similar proportion of zeros without arbitrarily adding zeros here or there? I've got a feeling that hotdecking might be my answer but would appreciate some further advice.
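To make the hot-deck idea concrete, here is a minimal sketch of a random hot deck within cells, written in Python. It assumes the dummy variables have already been combined into a single cell label per observation; the function name `hot_deck` and all the data are hypothetical, and this is one simple variant rather than a recommended procedure.

```python
import numpy as np

def hot_deck(donor_cells, donor_values, recipient_cells, rng):
    """Random hot deck: each recipient copies the value of a randomly
    chosen donor that falls in the same cell."""
    donor_cells = np.asarray(donor_cells)
    donor_values = np.asarray(donor_values, dtype=float)
    recipient_cells = np.asarray(recipient_cells)
    imputed = np.full(len(recipient_cells), np.nan)
    for cell in np.unique(recipient_cells):
        pool = donor_values[donor_cells == cell]
        take = recipient_cells == cell
        if pool.size:  # cells with no donors are left as NaN
            imputed[take] = rng.choice(pool, size=take.sum(), replace=True)
    return imputed

rng = np.random.default_rng(2)

# Hypothetical donor survey Y: cell labels and a variable with about 60% zeros.
cells_y = rng.integers(0, 4, size=2000)
x_y = np.where(rng.random(2000) < 0.6, 0.0, rng.gamma(2.0, 50.0, size=2000))

# Hypothetical recipient survey Z: cell labels only.
cells_z = rng.integers(0, 4, size=1500)

x_z = hot_deck(cells_y, x_y, cells_z, rng)
```

Because every imputed value is an actual donor value, zeros are carried over in roughly the proportion in which they occur within each cell of survey Y.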

I'm off on holiday for a few weeks so looking forward to seeing some responses when I get back!

Dom

#### noetsi

##### Loves R
I don't know Stata, so I don't understand what you are doing with the predict command. You would be better off explaining what you are doing in generic regression terms rather than in terms of a software command most people here are probably not familiar with. What does the Stata predict command do substantively?

Do you mean you are predicting X in two different samples and comparing the results? Or that you are generating parameters in one sample and testing if they work in another different sample? This is not clear to me...

I don't really understand what you are doing when you say this (the only time I have seen the term imputation used in regression is for dealing with missing values, and I don't think that is what you are doing):

> However, the original variable in survey Y has 60% of observations equal to zero. But when I impute, none of the observations are zero. This is important as I need a similar proportion of my imputed variable equal to zero.

Why are you imputing, whatever that means, this way? I have not seen this done unless you are, as I mentioned above, trying to test the validity of parameters from another sample, and this does not appear to be what you are doing. Why would you expect the observations in one sample to be similar to those in another? When you do multiple imputation, for example, they commonly are not the same.

> Does anyone have any idea how I can constrain my imputation to contain a similar proportion of zeros without arbitrarily adding zeros here or there?

Again, I am not sure what you mean by imputation; please explain what this is and why you are doing it. Why do you want one sample to have the same proportion of values as another (why do you think this makes sense for two random samples)?

In general, you need to explain in far more detail what you are trying to do and why, and bear in mind that few people here use Stata, so they won't be familiar with its functions.