distribution of sum of dependent binomial variables

#1
I have N dependent Bernoulli variables X1, X2, ..., XN with Xi ~ B(1, pi), where each pi is already known. Let Y = X1 + X2 + ... + XN; I need the distribution of Y. If the Xi were independent, I could simulate: generate X1, ..., XN from their Bernoulli distributions and compute Y, then repeat 10000 times to get 10000 samples of Y, which gives an estimate of the distribution of Y.
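The independent-case simulation described above could look roughly like this. It is only a sketch in Python: the probabilities in `p` are made-up illustration values, and `simulate_independent_sum` is a hypothetical helper name, not something from a library.

```python
import random
from collections import Counter

def simulate_independent_sum(p, n_rep=10000, seed=0):
    """Estimate the distribution of Y = X1 + ... + XN for independent
    Bernoulli variables with success probabilities p[0], ..., p[N-1]."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_rep):
        # Draw each Xi independently and add them up to get one sample of Y.
        y = sum(1 for pi in p if rng.random() < pi)
        counts[y] += 1
    # Relative frequency of each possible value 0..N of Y.
    return {y: counts[y] / n_rep for y in range(len(p) + 1)}

# Hypothetical probabilities, chosen only for illustration.
p = [0.1, 0.3, 0.5, 0.7]
dist = simulate_independent_sum(p)
for y, freq in dist.items():
    print(y, freq)
```

The estimated mean of Y should land close to the sum of the pi, which is a quick sanity check on the simulation.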

But the Xi are dependent, so I also need to take the correlation into account; assume corr(Xi, Xj) = 0.2 for i != j. How can I build the correlation into the simulation, or alternatively derive the distribution of Y mathematically?
 

Dason

Ambassador to the humans
#2
If \(X_1 \sim B(1, p_1)\) and \(X_2 \sim B(1, p_2)\) and you want \(Cor(X_1, X_2) = .2\) you can set up a table of joint probabilities

\(
\begin{array}{|l|cc|}
\hline
 & X_1=0 & X_1=1 \\
\hline
X_2=0 & a & b \\
X_2=1 & c & d \\
\hline
\end{array}
\)


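Then we want \(b+d = p_1\) and \(c+d = p_2\) to get the right marginal (Bernoulli) parameters, and \(a+b+c+d = 1\) so the table is a valid distribution. Since \(d = P(X_1=1, X_2=1) = E[X_1X_2]\), to get the correlation we want \(\frac{d - p_1p_2}{\sqrt{p_1(1-p_1)p_2(1-p_2)}} = 0.2\)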

Once you solve for the values of a,b,c,d you'll have a 2x2 table and you can just sample those 4 spots with the corresponding probabilities and assign \(X_1\) and \(X_2\) the appropriate values based on which spot you sampled.
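Solving those equations is mostly bookkeeping: the correlation equation gives d directly, and the marginal and sum-to-one constraints give the rest. Here is a minimal sketch in Python, assuming example values p1 = 0.3 and p2 = 0.5 (not from the thread); `joint_table` is a hypothetical helper name. Not every combination of p1, p2 and correlation is achievable, so the sketch rejects tables with a negative cell.

```python
import math

def joint_table(p1, p2, rho):
    """Return (a, b, c, d) for the 2x2 table with
    b + d = p1, c + d = p2, a + b + c + d = 1 and Cor(X1, X2) = rho."""
    # From (d - p1*p2) / sqrt(p1(1-p1)p2(1-p2)) = rho:
    d = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    b = p1 - d            # P(X1 = 1, X2 = 0)
    c = p2 - d            # P(X1 = 0, X2 = 1)
    a = 1.0 - b - c - d   # P(X1 = 0, X2 = 0)
    if min(a, b, c, d) < 0:
        raise ValueError("this correlation is not attainable for these p1, p2")
    return a, b, c, d

# Example values, assumed for illustration only.
a, b, c, d = joint_table(0.3, 0.5, 0.2)
print(a, b, c, d)
```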
 
#3
Thanks for your advice. It seems I can solve the equations to get the values of a, b, c, d, but I don't clearly understand how to sample from the table next and get the values of X1 and X2.
 

Dason

Ambassador to the humans
#4
Let U be a sample from a uniform distribution on [0, 1]. Then if 0 <= U <= a let X1=0 and X2=0. If a < U <= a+b let X1=1 and X2=0. If a+b < U <= a+b+c let X1=0 and X2=1. Else let X1=1 and X2=1.

Basically you treat the four cells as the outcomes of a multinomial experiment with probabilities a, b, c, d - sample from that multinomial experiment and then convert the outcome to your binary outcomes.
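The rule with U can be sketched like this in Python. The cell probabilities a = 0.4, b = 0.1, c = 0.3 (so d = 0.2) are assumed example values, and `sample_pair` and `simulate_pairs` are hypothetical helper names.

```python
import random

def sample_pair(a, b, c, rng):
    """Draw (X1, X2) from the 2x2 table using one uniform sample U."""
    u = rng.random()
    if u <= a:
        return 0, 0          # cell a: X1 = 0, X2 = 0
    if u <= a + b:
        return 1, 0          # cell b: X1 = 1, X2 = 0
    if u <= a + b + c:
        return 0, 1          # cell c: X1 = 0, X2 = 1
    return 1, 1              # cell d: X1 = 1, X2 = 1

def simulate_pairs(a, b, c, n_rep=20000, seed=0):
    """Repeat the draw n_rep times and return the list of (X1, X2) pairs."""
    rng = random.Random(seed)
    return [sample_pair(a, b, c, rng) for _ in range(n_rep)]

# Assumed example table: a = 0.4, b = 0.1, c = 0.3, hence d = 0.2.
pairs = simulate_pairs(0.4, 0.1, 0.3)
y = [x1 + x2 for x1, x2 in pairs]
# Estimated distribution of Y = X1 + X2.
y_dist = {k: y.count(k) / len(y) for k in (0, 1, 2)}
print(y_dist)
```

With this table the estimated frequencies of Y = 0, 1, 2 should come out near a, b + c and d respectively.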

What language are you programming this in?
 
#5
I'm doing it in R. I think I understand this method now. But when the number of variables becomes 3 or more, the problem seems more complex; I need to think it through carefully to extend it to multiple variables.

Thanks Dason.