Regression with different populations


No cake for spunky
I am analyzing the impact of services (there is a series of them). I could have simply compared those who got a service with those who did not, but that comparison did not make sense to me. It made more sense to compare those who were eligible for a service (a professional judgement by counselors) and got it against those who were eligible and did not get it. (As a separate issue, there are many statistical controls built into the model.) For example, I tested the impact of a transportation service using only customers whom counselors deemed to need it; only about forty percent of those eligible for that service got it. This is done in linear regression, with income as the DV.
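A minimal sketch of that comparison, using made-up toy data rather than the real records. The names `eligible`, `received`, `age`, and `income`, and the built-in service effect of 200, are assumptions for illustration; `age` simply stands in for the statistical controls. The same regression is fitted twice, once on everyone and once on eligible customers only.

```python
import numpy as np

# Toy data, invented for illustration (not the real customer records).
eligible = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
received = np.array([1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
age = np.array([30, 40, 30, 40, 50, 50, 35, 45, 55, 25])

# Income is built with a known service effect of 200. Customers who were
# not eligible earn 650 more for reasons unrelated to the service.
income = 1000 + 10 * age + 200 * received + 650 * (1 - eligible)


def service_coefficient(rows):
    """OLS of income on an intercept, received, and age for the chosen rows."""
    X = np.column_stack([np.ones(rows.sum()), received[rows], age[rows]])
    coefs, *_ = np.linalg.lstsq(X, income[rows], rcond=None)
    return coefs[1]  # coefficient on `received`


everyone = np.ones(len(income), dtype=bool)
naive_effect = service_coefficient(everyone)
eligible_only_effect = service_coefficient(eligible == 1)

print(f"all customers:      {naive_effect:.1f}")
print(f"eligible customers: {eligible_only_effect:.1f}")
```

In this toy setup the eligible-only fit recovers the built-in 200, while the all-customer fit is pulled well away from it because ineligible customers differ in income for unrelated reasons. Real data will not be this clean.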

To me that approach makes sense, but I wondered what others think of it. I also have a second question. Many services are offered, and I would guess they affect customers at the same time. But I can't figure out how to include the different services in one model, because I am testing only those who are eligible for a service, so the population varies with every service.

Any suggestions on how I can address this issue? Or should I ignore eligibility entirely and just test whether getting a service matters or not?
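To make the question concrete, here is a rough sketch of one possible setup, not something confirmed to be appropriate for this data. It assumes two hypothetical services, A and B, with invented indicators `elig_a`, `got_a`, `elig_b`, `got_b` and made-up effects. It first shows that the eligible populations differ by service, then fits a single model on everyone that includes both an eligibility indicator and a receipt indicator for each service.

```python
import numpy as np

# Toy data for two hypothetical services, A and B (12 invented customers).
elig_a = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
got_a = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
elig_b = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0])
got_b = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0])

# The eligible populations differ by service, which is the problem above.
n_elig_a = int(elig_a.sum())
n_elig_b = int(elig_b.sum())
n_elig_both = int((elig_a & elig_b).sum())
print(n_elig_a, n_elig_b, n_elig_both)

# Income built with known effects: 200 for getting A, 80 for getting B.
income = 1000 - 100 * elig_a + 200 * got_a - 50 * elig_b + 80 * got_b

# One model on everyone. Because got_x can only be 1 when elig_x is 1,
# the got_x coefficient contrasts recipients with eligible non-recipients,
# while the elig_x coefficient absorbs eligible vs. not eligible.
X = np.column_stack([np.ones(len(income)), elig_a, got_a, elig_b, got_b])
coefs, *_ = np.linalg.lstsq(X, income, rcond=None)
effect_a, effect_b = coefs[2], coefs[4]
print(f"service A: {effect_a:.1f}, service B: {effect_b:.1f}")
```

The toy income has no noise, so the fit returns the built-in effects exactly. Whether eligibility is recorded for every customer and every service is an assumption this sketch depends on.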


Less is more. Stay pure. Stay poor.
Well, if I was eligible but didn't redeem or initiate a service, I am obviously not like those who did. Perhaps I didn't actually need it, or I was so poorly off that I never had the resources to follow up on getting it. So the groups may not be exchangeable. So if you knew additional reasons related to this decision, you can balance the groups via weights or multiple regression and get 'assumed' conditional exchangeability.
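As a rough illustration of balancing via weights, here is a minimal inverse-probability-weighting sketch on invented data. It assumes a single measured reason, a hypothetical binary `low_resources` flag, that affects both uptake and income. Conditional exchangeability is the assumption that, within each level of that flag, those who got the service and those who did not are comparable.

```python
import numpy as np

# Toy data: 20 invented eligible customers in two strata of a hypothetical
# measured reason (`low_resources`) that affects both uptake and income.
low_resources = np.repeat([0, 1], 10)
received = np.array([1] * 8 + [0] * 2 + [1] * 2 + [0] * 8)

# Income built with a known service effect of 200.
income = 1000 + 200 * received - 500 * low_resources

got = received == 1
naive_effect = income[got].mean() - income[~got].mean()

# Propensity: share of each stratum that received the service.
propensity = np.empty(len(received), dtype=float)
for level in np.unique(low_resources):
    in_stratum = low_resources == level
    propensity[in_stratum] = received[in_stratum].mean()

# Weight recipients by 1/p and non-recipients by 1/(1-p) so both groups
# have the same mix of low_resources after weighting.
weights = np.where(got, 1 / propensity, 1 / (1 - propensity))

ipw_effect = (np.average(income[got], weights=weights[got])
              - np.average(income[~got], weights=weights[~got]))

print(f"naive: {naive_effect:.1f}, weighted: {ipw_effect:.1f}")
```

The unweighted difference mixes the service effect with the resource gap; the weighted one recovers the built-in 200. That only works here because the one reason driving uptake is measured, which is exactly the 'assumed' part.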

Have you explored the idea of a regression discontinuity design, comparing those directly above and below the eligibility cut-off for the service?
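For reference, a bare-bones sketch of what a sharp regression discontinuity fit could look like. It assumes a numeric eligibility `score` with a known `cutoff`, which may not exist in your data; the score, cutoff, `bandwidth`, and the built-in jump of 150 are all invented. Note that it estimates the jump in income at the eligibility threshold, not the effect of actually receiving the service.

```python
import numpy as np

# Hypothetical running variable: an eligibility score with a cut-off at 0.
score = np.arange(-10, 11, dtype=float)
cutoff = 0.0
above = (score >= cutoff).astype(float)

# Toy income with a smooth trend in the score plus a jump of 150 at the cut-off.
income = 1000 + 5 * score + 150 * above

# Keep only observations close to the cut-off.
bandwidth = 5.0
near = np.abs(score - cutoff) <= bandwidth
centered = score[near] - cutoff

# Local linear fit with separate slopes on each side of the cut-off.
X = np.column_stack([
    np.ones(near.sum()),
    above[near],             # jump at the cut-off
    centered,                # slope below the cut-off
    above[near] * centered,  # change in slope above the cut-off
])
coefs, *_ = np.linalg.lstsq(X, income[near], rcond=None)
jump = coefs[1]

print(f"estimated jump at the cut-off: {jump:.1f}")
```

The bandwidth is a judgement call in practice; here it is arbitrary because the toy data has no noise.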


No cake for spunky
I am not sure how I would apply RDD to our populations, although I can try. I don't think there is a numeric cut-off value for eligibility. It is a judgement call by counselors on whether you meet a set of federal requirements. I doubt the reasoning is documented in any way I can access or use (it is probably long, involved comments rather than a number explaining why someone was or was not determined eligible for a service). My real interest is not eligibility itself; it is whether getting or not getting a service you were eligible for made a difference.

You could be the same whether or not you got the service you were eligible for. Your specific counselor may not have followed through, you could get sick or change your mind, and so on. The point is that something outside the customer could drive whether someone did or did not get services. My concern is that I could show one service as having an impact when in fact another service not in the model is driving it (customers get many services).
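That last concern can be shown with a toy sketch (invented data, two hypothetical services `got_a` and `got_b`). Only service B is given an effect on income, but customers who get A also tend to get B, so a model with A alone credits A with an impact.

```python
import numpy as np

# Toy data: 8 invented customers. Receipt of A and B overlaps heavily.
got_a = np.array([1, 1, 1, 1, 0, 0, 0, 0])
got_b = np.array([1, 1, 1, 0, 1, 0, 0, 0])

# Only service B affects income (by 300); service A has no effect at all.
income = 1000 + 300 * got_b


def ols(*columns):
    """OLS of income on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(income)), *columns])
    coefs, *_ = np.linalg.lstsq(X, income, rcond=None)
    return coefs


a_alone = ols(got_a)[1]            # model that leaves service B out
_, a_joint, b_joint = ols(got_a, got_b)  # model with both services

print(f"A alone: {a_alone:.1f}")
print(f"A with B in the model: {a_joint:.1f}, B: {b_joint:.1f}")
```

With B left out, A appears to raise income; with both in the model, A's coefficient drops to zero and B carries the effect. This only fixes the problem for services that are actually in the model.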

I am not sure how I would deal with this.

"So if you knew additional reasons related to this decision, you can balance the groups via weights or multiple regression and get 'assumed' conditional exchangeability." I don't know what 'assumed' conditional exchangeability is. :)