# distribution of sum of dependent binomial variables

#### pepsico007

##### New Member
I have N dependent Bernoulli variables, X1, X2, ..., XN, with Xi ~ B(1, pi), where each pi is already known. Let Y = X1 + X2 + ... + XN; I need the distribution of Y. If the Xi were independent, I could use simulation: generate X1, ..., XN from their Bernoulli distributions and compute Y, then repeat this 10000 times to get 10000 samples of Y, from which I can estimate the distribution of Y.
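As a point of reference, the independent-case simulation described above might look like the following minimal Python sketch. The function name `simulate_independent_sum`, the probabilities in `p`, and the seed are illustrative assumptions, not values from the question.

```python
import random
from collections import Counter

def simulate_independent_sum(p, n_rep=10000, seed=0):
    """Estimate the distribution of Y = X1 + ... + XN for independent
    Bernoulli variables with success probabilities p[0], ..., p[N-1]."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_rep):
        # Xi = 1 with probability p_i; Y counts how many Xi equal 1.
        y = sum(1 for p_i in p if rng.random() < p_i)
        counts[y] += 1
    # Convert counts to relative frequencies, ordered by the value of Y.
    return {y: counts[y] / n_rep for y in sorted(counts)}

p = [0.1, 0.3, 0.5, 0.7]  # hypothetical p_i values, for illustration only
dist = simulate_independent_sum(p)
for y, prob in dist.items():
    print(y, prob)
```

The empirical mean of Y should land close to the sum of the pi, which is a quick sanity check before adding dependence.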

But now the Xi are dependent, so I also need to take the correlation into account. Assume corr(Xi, Xj) = 0.2 for i != j. How can I build the correlation into the simulation, or derive the distribution mathematically instead?


#### Dason

If $$X_1 \sim B(1, p_1)$$ and $$X_2 \sim B(1, p_2)$$ and you want $$Cor(X_1, X_2) = .2$$ you can set up a table of joint probabilities

$$\begin{array}{|l|cc|} \hline & X_1=0 & X_1=1 \\ \hline X_2=0 & a & b \\ X_2=1 & c & d \\ \hline \end{array}$$

Then we want $$a+b+c+d = 1$$, with $$b+d = p_1$$ and $$c+d = p_2$$ to get us the Bernoulli parameters. Since $$E[X_1X_2] = d$$, to get the correlation we want $$\frac{d - p_1p_2}{\sqrt{p_1(1-p_1)p_2(1-p_2)}} = 0.2$$

Once you solve for the values of a,b,c,d you'll have a 2x2 table and you can just sample those 4 spots with the corresponding probabilities and assign $$X_1$$ and $$X_2$$ the appropriate values based on which spot you sampled.
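Solving the equations above is direct: the correlation condition gives d = p1·p2 + 0.2·sqrt(p1(1-p1)p2(1-p2)), then b = p1 - d, c = p2 - d, and a = 1 - b - c - d. A minimal Python sketch of that step follows. The function name `solve_joint_table` and the example inputs are assumptions for illustration. Not every correlation is attainable for a given p1 and p2, so the sketch rejects tables with a negative cell.

```python
import math

def solve_joint_table(p1, p2, rho):
    """Return (a, b, c, d) for the 2x2 table with P(X1=1) = p1,
    P(X2=1) = p2 and Cor(X1, X2) = rho."""
    sd = math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    d = p1 * p2 + rho * sd   # P(X1=1, X2=1), from the correlation equation
    b = p1 - d               # P(X1=1, X2=0), from b + d = p1
    c = p2 - d               # P(X1=0, X2=1), from c + d = p2
    a = 1.0 - b - c - d      # P(X1=0, X2=0), probabilities sum to 1
    if min(a, b, c, d) < -1e-12:
        raise ValueError("this correlation is not attainable for these p1, p2")
    return a, b, c, d

# Illustrative inputs (not from the thread): p1 = p2 = 0.5, rho = 0.2
a, b, c, d = solve_joint_table(0.5, 0.5, 0.2)
print(a, b, c, d)
```

With these illustrative inputs the table is approximately a = 0.3, b = 0.2, c = 0.2, d = 0.3.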

#### pepsico007

##### New Member
Thanks for your advice. It seems I can solve the equations and get the values of a, b, c, d. But I don't clearly understand how to sample next and get the values of X1 and X2.

#### Dason

Let U be a sample from a uniform distribution on [0, 1]. Then if 0 <= U <= a let X1=0 and X2=0. If a < U <= a+b let X1=1 and X2=0. If a+b < U <= a+b+c let X1=0 and X2=1. Else let X1=1 and X2=1.

Basically you treat the four options as the outcomes of a multinomial experiment: sample from that multinomial experiment and then convert the outcome to your binary outcomes.
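The sampling rule above can be sketched in Python as follows. The table values a = 0.3, b = 0.2, c = 0.2, d = 0.3 are an illustrative assumption (they correspond to p1 = p2 = 0.5 with correlation 0.2), and `sample_pair` is a hypothetical helper name.

```python
import math
import random
from collections import Counter

def sample_pair(a, b, c, rng):
    """Draw (X1, X2) from the 2x2 table using one uniform draw U."""
    u = rng.random()
    if u <= a:
        return 0, 0          # cell a: X1=0, X2=0
    if u <= a + b:
        return 1, 0          # cell b: X1=1, X2=0
    if u <= a + b + c:
        return 0, 1          # cell c: X1=0, X2=1
    return 1, 1              # cell d: X1=1, X2=1

a, b, c, d = 0.3, 0.2, 0.2, 0.3   # assumed example table
rng = random.Random(0)
pairs = [sample_pair(a, b, c, rng) for _ in range(100000)]

# Check the marginals and the correlation empirically.
n = len(pairs)
mean1 = sum(x1 for x1, _ in pairs) / n
mean2 = sum(x2 for _, x2 in pairs) / n
cov = sum(x1 * x2 for x1, x2 in pairs) / n - mean1 * mean2
corr = cov / math.sqrt(mean1 * (1 - mean1) * mean2 * (1 - mean2))

# Empirical distribution of Y = X1 + X2.
y_counts = Counter(x1 + x2 for x1, x2 in pairs)
y_dist = {y: y_counts[y] / n for y in sorted(y_counts)}
print(mean1, mean2, corr, y_dist)
```

For this table the exact distribution of Y = X1 + X2 is P(Y=0) = a, P(Y=1) = b + c, P(Y=2) = d, so the simulated frequencies can be compared against it directly.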

What language are you programming this in?

#### pepsico007

##### New Member
I'm doing it in R. I think I understand this method now. But when the number of variables becomes 3 or more, the problem seems more complex; I'll need to think it through carefully to extend it to multiple variables.

Thanks Dason.