Problematic variables in regression model


I'm using data from a publicly available Europe-wide survey of employees, which was conducted in a series of waves from 1991 to 2010. The survey changed slightly with each wave.

Unfortunately, one question that is really important to my analysis, about childcare, changed considerably between waves. Initially it was "how many children under 15 live in your house"; later it became "how often are you involved in childcare". These measure two very different things. Basically, I just want to turn them into a single binary variable indicating whether or not the respondent has childcare responsibilities.

However I do it, the result will be imperfect. Some of those who gave an answer greater than zero to the first question won't actually have had childcare responsibilities (e.g. an older sibling living in the same house), while for the second question it's difficult to know how to classify those who are involved in childcare only once or twice a week.
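For concreteness, here is a minimal sketch in Python of the kind of recoding I have in mind. The function name, the response labels, and the default cutoff are all made up for illustration; the survey's actual response categories will differ.

```python
# Hypothetical ordered response labels for the later-wave question
# ("how often are you involved in childcare"); not the survey's real codes.
FREQUENCY_ORDER = [
    "never",
    "once or twice a year",
    "once or twice a month",
    "once or twice a week",
    "every day",
]


def has_childcare(children_under_15=None, childcare_frequency=None,
                  min_frequency="once or twice a week"):
    """Return 1/0 for childcare responsibilities, or None if both answers are missing.

    Early waves: any child under 15 in the household counts as 1.
    Later waves: a frequency at or above `min_frequency` counts as 1.
    """
    if children_under_15 is not None:
        return int(children_under_15 > 0)
    if childcare_frequency is not None:
        rank = FREQUENCY_ORDER.index(childcare_frequency)
        cutoff = FREQUENCY_ORDER.index(min_frequency)
        return int(rank >= cutoff)
    return None


# The borderline group flips depending on where the cutoff is placed.
weekly_lenient = has_childcare(childcare_frequency="once or twice a week")
weekly_strict = has_childcare(childcare_frequency="once or twice a week",
                              min_frequency="every day")
```

Changing `min_frequency` shows how much the coding of the "once or twice a week" group depends on where the cutoff is drawn.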

So my question is: should I make the best of what I have and include an imperfect variable because I think it's important, or is it better to leave it out of the model?