I have one dependent variable (Y) which is binary. I have this data for 100 subjects.

I want to generate binary data for 3 independent variables (X1, X2, X3), so effectively I want to generate 3 columns of data with 100 values in each column.

I want to generate this data according to a specified sensitivity and specificity.

I also want to be able to specify how much variation in Y is explained by each X variable.

Any ideas?

What I tried, which isn’t correct, is:

Looking only at the Y = 1 values for now and concentrating on sensitivity:

Sensitivity = 0.6

Sensitivity for X1 = 17% * 0.6 = 0.102, Sensitivity for X2 = 33% * 0.6 = 0.198, Sensitivity for X3 = 50% * 0.6 = 0.300.

I then generated data from the Bernoulli distribution for X1 using p=0.102

I then generated data from the Bernoulli distribution for X2 using p=0.198

I then generated data from the Bernoulli distribution for X3 using p=0.300

I then created an X4 variable such that if (X1=1 or X2 = 1 or X3 = 1), X4=1, else X4=0.

If I then do count(X4=1)/100 I get a sensitivity for X4 of 0.5, not my original sensitivity of 0.6.

Increasing the number of subjects doesn’t make a difference.
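
The steps above can be sketched roughly as follows. This is only an illustration: the choice of Python, the seed, the sample size, and the variable names are assumptions, not what was originally run. It assumes the three Bernoulli draws are independent, and it compares the simulated proportion of X4 = 1 with the value that independence implies, 1 - (1 - p1)(1 - p2)(1 - p3).

```python
import random

random.seed(0)  # arbitrary seed, only so the run is repeatable

n = 100_000  # hypothetical number of Y = 1 subjects, large to reduce sampling noise
p = {"X1": 0.102, "X2": 0.198, "X3": 0.300}  # per-variable probabilities from the post

# Independent Bernoulli draws for each X variable
draws = {
    name: [1 if random.random() < prob else 0 for _ in range(n)]
    for name, prob in p.items()
}

# X4 = 1 if any of X1, X2, X3 equals 1, else 0
x4 = [
    1 if (a or b or c) else 0
    for a, b, c in zip(draws["X1"], draws["X2"], draws["X3"])
]

simulated = sum(x4) / n

# Under independence: P(X4 = 1) = 1 - (1 - p1)(1 - p2)(1 - p3)
prob_all_zero = 1.0
for prob in p.values():
    prob_all_zero *= 1 - prob
expected = 1 - prob_all_zero

print(f"simulated: {simulated:.3f}, expected under independence: {expected:.3f}")
```

Under that independence assumption the expected value works out to about 0.496 rather than 0.102 + 0.198 + 0.300 = 0.6, which would be consistent with the 0.5 I observed and with the result not changing as the number of subjects grows.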

Any ideas what I’m misunderstanding, or what I can do as a solution?

Thanks

Barry