We offer five training programs (A, B, C, D and E), and students often take more than one. For students who take multiple programs, how are the programs related? For example, 75% of students who took E also took A and C, or no student took D without also taking C.

I would like to do this in R. I guess it is not regression, but maybe some form of probability.

What I am trying to determine is this: if we want to increase sales of course C, which other courses help drive that? (I understand there are other factors, but this model is based solely on related courses.)


Are you looking for correlation coefficients for binary variables ("A yes/no vs. E yes/no")?
In that case, there are some options (Phi coefficient, Cramér's V, contingency coefficient C,
Goodman and Kruskal's Lambda).
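
For what it's worth, here is a rough sketch of the Phi coefficient in base R. The `enroll` data frame is made up for illustration (one row per student, 1 = took the program); the real data would need to be reshaped into this form first. For two 0/1 variables, Phi is the same number as the ordinary Pearson correlation, so `cor()` is enough.

```r
# Hypothetical enrollment data: one row per student, 1 = took the program.
enroll <- data.frame(
  A = c(1, 1, 1, 0, 1, 0, 1, 1),
  C = c(1, 1, 0, 0, 1, 0, 1, 0),
  E = c(1, 0, 0, 0, 1, 0, 1, 0)
)

# For two binary variables, Phi equals the Pearson correlation.
phi_AE <- cor(enroll$A, enroll$E)

# The same value from the 2x2 contingency table:
# (n00 * n11 - n01 * n10) / sqrt(product of the row and column totals)
tab <- table(A = enroll$A, E = enroll$E)
phi_from_table <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# Pairwise Phi for every pair of programs at once.
phi_matrix <- cor(enroll)
print(round(phi_matrix, 2))
```

Note that Phi is symmetric: it says A and E go together, but not whether E students tend to take A or the other way round.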

@Karabiner - I am guessing they are not interested in correlation coefficients; that is just what they thought they needed. This sounds like a market basket problem, given you don't care about the ordering of the classes. Ideally, I would want some confidence intervals on these metrics, but I am not familiar enough with association rules to know how to add them.
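
To make the market basket idea concrete, here is a minimal base R sketch of the usual association rule metrics (support, confidence, lift) for rules of the form "took X, therefore took C". The `enroll` data and the `rule_metrics` helper are hypothetical, not from any package. The interval is only one possible answer to the confidence interval question: the confidence of a rule is a plain proportion among the students who took X, so an exact binomial interval from `binom.test()` applies, assuming students are independent of each other. Lift gets no interval here.

```r
# Hypothetical enrollment data: one row per student, 1 = took the program.
enroll <- data.frame(
  A = c(1, 1, 1, 0, 1, 0, 1, 1, 1, 0),
  B = c(0, 1, 0, 1, 0, 0, 1, 0, 0, 1),
  C = c(1, 1, 0, 0, 1, 0, 1, 0, 1, 0),
  D = c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0),
  E = c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0)
)

# Hypothetical helper: metrics for the rule {lhs} -> {rhs}.
# lhs may name several programs; rhs names exactly one.
rule_metrics <- function(data, lhs, rhs) {
  has_lhs <- rowSums(data[, lhs, drop = FALSE]) == length(lhs)
  has_rhs <- data[[rhs]] == 1
  n_lhs   <- sum(has_lhs)
  n_both  <- sum(has_lhs & has_rhs)
  stopifnot(n_lhs > 0)                   # rule is undefined if nobody took lhs

  support    <- n_both / nrow(data)      # share of all students with lhs and rhs
  confidence <- n_both / n_lhs           # share of lhs students who also took rhs
  lift       <- confidence / mean(has_rhs)  # confidence relative to the base rate of rhs

  # Exact (Clopper-Pearson) 95% interval for the confidence proportion.
  ci <- binom.test(n_both, n_lhs)$conf.int
  c(support = support, confidence = confidence, lift = lift,
    ci_low = ci[1], ci_high = ci[2])
}

# Which single programs "drive" C? One row per candidate program.
drivers <- setdiff(names(enroll), "C")
results <- t(sapply(drivers, function(p) rule_metrics(enroll, p, "C")))
print(round(results, 2))

# A rule with two programs on the left-hand side: {A, E} -> {C}.
print(round(rule_metrics(enroll, c("A", "E"), "C"), 2))
```

With only a handful of students behind a rule the intervals will be very wide, which is probably the point of asking for them. For more than a few programs, enumerating rules by hand like this gets tedious and a dedicated association rules package would be the more practical route.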