Randomized Controlled Trial Design Question

Imagine I am designing an RCT to measure the effectiveness of a voluntary job skills training program for new immigrants, delivered BEFORE they arrive in their new country. I want to know whether the training leads to a higher employment rate in the new country compared with those who have not taken the program.

My challenge is selection bias. If you simply compare employment outcomes for those who take the program and those who don't, the comparison is contaminated by the motivation that led some people to enrol in the program while others did not. If I randomly send out invitations to some (encouragement), I will still have the selection bias problem.

Aside from making the course mandatory for the treatment group, how do you avoid selection bias?

Appreciate any thoughts!


Less is more. Stay pure. Stay poor.
What is the sample size going to be like? Also, depending on their final destinations, there is a risk that participants compete for the same jobs, in which case there is interference between units (one person's treatment affects another person's outcome).
The sample size is about 1000. These are people immigrating to a variety of cities across the country, so it is unlikely they are competing for the same jobs. I think the question is this: given that an invitation to participate in this kind of skills workshop will lead to the most motivated, career-minded individuals enrolling, how can you ensure the study reflects the effect on the overall population? I.e., if the government rolled this out to all new immigrants, what would the effect be?


That is why you would want to control for known covariate imbalances that could act as confounders, either directly in the outcome model or through propensity scores. For propensity scores, you model participation in the program (yes/no) as the dependent variable, which is a classification problem. You will need access to baseline data and general domain knowledge to choose the covariates. You could then use a doubly robust method, which combines the g-formula (a substitution estimator built on an outcome model) with a propensity score model.
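
To make the doubly robust idea concrete, here is a minimal sketch of an augmented inverse probability weighted (AIPW) estimator, which is one common way to combine an outcome model with a propensity model. Everything in it is an assumption for illustration: the simulated data, the single observed covariate `x` (think of a baseline motivation or language score), the true effect of 0.15, and the function names. It only removes bias from confounders you actually measure, so it does not help with unobserved motivation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson. X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def aipw_ate(covariates, treated, outcome):
    """AIPW estimate of the average treatment effect (hypothetical helper)."""
    n = len(outcome)
    X = np.column_stack([np.ones(n), covariates])

    # Propensity model: participation (yes/no) as the dependent variable.
    ps = np.clip(sigmoid(X @ fit_logistic(X, treated)), 0.01, 0.99)

    # Outcome models (g-formula part): one linear probability model per arm.
    is_treated = treated == 1
    beta1 = np.linalg.lstsq(X[is_treated], outcome[is_treated], rcond=None)[0]
    beta0 = np.linalg.lstsq(X[~is_treated], outcome[~is_treated], rcond=None)[0]
    m1, m0 = X @ beta1, X @ beta0

    # Substitution estimate plus inverse-probability-weighted residual corrections.
    scores = (m1 - m0
              + treated * (outcome - m1) / ps
              - (1.0 - treated) * (outcome - m0) / (1.0 - ps))
    return scores.mean()


def simulate_observational(n=20000, seed=0):
    """Simulated data (assumption): higher x means more likely to enrol AND to be employed."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    treated = (rng.random(n) < sigmoid(1.5 * x)).astype(float)
    p_employed = 0.40 + 0.15 * x + 0.15 * treated  # true effect of training is 0.15
    outcome = (rng.random(n) < p_employed).astype(float)
    return x, treated, outcome


x, treated, outcome = simulate_observational()
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print("naive difference:", round(naive, 3))
print("AIPW estimate:   ", round(aipw_ate(x, treated, outcome), 3))
```

In this simulation the naive participant vs. non-participant difference overstates the effect because enrolment depends on `x`, while the AIPW estimate should land close to the simulated 0.15. In practice you would likely use an established package rather than hand-rolled models.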

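So you randomize, but they can refuse? If so, you may also look at local average treatment effect (LATE) models, sometimes also called the complier average causal effect (CACE). This is not the same thing as CATE, which is the conditional average treatment effect.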
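
For the encouragement setup, a rough sketch of how a LATE/CACE estimate works is the Wald (instrumental variable) ratio: the effect of the invitation on employment divided by the effect of the invitation on enrolment. The simulation below is purely illustrative and rests on assumptions that are not from the thread: only invited people can enrol, the invitation affects employment only through enrolment, motivation is unobserved, and the true training effect is 0.15. All function names are hypothetical.

```python
import random


def simulate_encouragement_trial(n=1000, seed=0):
    """Return a list of (invited, enrolled, employed) tuples from a simulated trial."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        motivated = rng.random() < 0.4   # unobserved by the analyst
        invited = rng.random() < 0.5     # randomized encouragement
        # Assumption: enrolment is only possible with an invitation,
        # and motivated people accept far more often.
        enrolled = invited and (rng.random() < (0.8 if motivated else 0.3))
        p_employed = 0.35 + 0.20 * motivated + 0.15 * enrolled
        employed = rng.random() < p_employed
        rows.append((invited, enrolled, employed))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_late(rows):
    """Return (ITT on employment, ITT on enrolment, LATE = ratio of the two)."""
    invited = [r for r in rows if r[0]]
    control = [r for r in rows if not r[0]]
    itt_outcome = mean(r[2] for r in invited) - mean(r[2] for r in control)
    itt_uptake = mean(r[1] for r in invited) - mean(r[1] for r in control)
    return itt_outcome, itt_uptake, itt_outcome / itt_uptake


rows = simulate_encouragement_trial(n=1000)
itt_outcome, itt_uptake, late = wald_late(rows)
print("effect of invitation on employment:", round(itt_outcome, 3))
print("effect of invitation on enrolment: ", round(itt_uptake, 3))
print("LATE / CACE estimate:              ", round(late, 3))
```

Because the invitation is randomized, comparing invited vs. not invited is free of selection bias even though enrolment is not. The catch is that the ratio estimates the effect among compliers (people who enrol when invited), which is not necessarily the effect you would see if the program were rolled out to everyone. With a sample of about 1000, the estimate will also be fairly noisy.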