Generate overdispersed Poisson data

#1
Hi there,

Can anyone tell me how to produce overdispersed Poisson data? Is there an algorithm or something similar? I find it difficult.

Thanks in advance for any help!
 

TheEcologist

Global Moderator
#2
Generate it from a Negative binomial.

http://en.wikipedia.org/wiki/Negative_binomial_distribution

The negative binomial distribution with size = n and prob = p has density

p(x) = Gamma(x+n)/(Gamma(n) x!) p^n (1-p)^x

It is also often parameterized by the mean mu and a 'dispersion parameter' (size), where prob (p) = size/(size+mu). Here mu is the expected value, the equivalent of lambda in a Poisson distribution, and the variance is mu + mu^2/size.

If you keep the n (size) parameter low, the data will be far more dispersed than a Poisson with the same mean. As n grows large with mu held fixed, p(x) approaches a Poisson.
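A quick way to see this in R, as a rough sketch only: rnbinom() accepts the mean directly through its mu argument, together with size. The mean of 10 and the two size values below are arbitrary picks for illustration, not anything taken from the question.

```r
set.seed(42)          # arbitrary seed, only for reproducibility
n_obs <- 1e5
mu    <- 10           # assumed mean, chosen for illustration

y_pois  <- rpois(n_obs, lambda = mu)
y_nb    <- rnbinom(n_obs, mu = mu, size = 2)     # low size: strong overdispersion
y_nb_hi <- rnbinom(n_obs, mu = mu, size = 1e4)   # large size: close to Poisson

# Variance-to-mean ratio: about 1 for a Poisson, about 1 + mu/size for the NB
ratio <- c(poisson = var(y_pois) / mean(y_pois),
           nb_low  = var(y_nb) / mean(y_nb),
           nb_high = var(y_nb_hi) / mean(y_nb_hi))
print(round(ratio, 2))
```

With size = 2 the ratio should land near 1 + 10/2 = 6, while the large-size sample should be hard to tell apart from the Poisson one.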
 
#3
Thanks very much for the answer!

I am afraid I don't completely get it though.

My first goal was something simple: to produce some values from a Poisson model (no overdispersion) and then estimate the parameters to see if I recover them. So I did:

Step 1: Set b0=5, b1=-0.5.
Step 2: Produce 1000 random numbers from an Exponential with mean 3 (X).
Step 3: For i=1:1000, produce y(i)=Poisson(exp(b0+b1*x(i))) (so we have 1000 y's).
Step 4: With the data X,Y, apply Poisson regression and check that the estimates are good (near 5 and -0.5).
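For reference, the four steps above might look like this in R. This is a minimal sketch: the seed and the variable names are illustrative choices, and the software actually used is not stated in the thread.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n  <- 1000
b0 <- 5
b1 <- -0.5

x <- rexp(n, rate = 1 / 3)         # Exponential with mean 3 (rate = 1/mean)
y <- rpois(n, lambda = exp(b0 + b1 * x))   # Poisson counts with a log-linear mean

# Poisson regression with the default log link
fit_pois <- glm(y ~ x, family = poisson)
print(coef(fit_pois))              # intercept and slope estimates
```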

Everything is fine above.
Now, if I wanted to produce overdispersed data, should I:

Step 1: Set b0=5, b1=-0.5, and let the dispersion parameter be φ=2.
Step 2: Produce 1000 random numbers from an Exponential with mean 3 (X).
Step 3: For i=1:1000, produce y(i)=?? (so we have 1000 y's).

With what parameters should y(i) be produced from the negative binomial in order to get, let's say, overdispersed data with dispersion parameter 2? I want to create the data and then see if I can estimate that parameter.


Thanks again for any help.
 
#4
Oh you wanted a specific dispersion parameter? How very scientific of you.

I just read the original post and thought "the easiest way to produce over-dispersed data is to double up on the random process". Intuitively, having a random process followed by another random process on the result (but not modeling that) inserts more variation than you should expect. It is worth knowing because, while I have limited experience with the topic, I suspect it's a common source of over-dispersion in real life.

I wrote the code to illustrate:
Code:
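nsims <- 5000
n     <- 100
# Residual deviance of a plain Poisson GLM fitted to each response
X1 <- numeric(nsims)
X2 <- numeric(nsims)
X3 <- numeric(nsims)
for (i in seq_len(nsims)) {
    x  <- 5 * runif(n) + 1    # covariate, uniform on [1, 6]
    y1 <- rpois(n, x)         # proper Poisson with mean x (rpois is vectorized)
    y2 <- rpois(n, y1)        # Poisson whose mean is itself a Poisson draw
    y3 <- rpois(n, y2)        # one more layer on top of y2
    X1[i] <- glm(y1 ~ x, family = poisson)$deviance
    X2[i] <- glm(y2 ~ x, family = poisson)$deviance
    X3[i] <- glm(y3 ~ x, family = poisson)$deviance
}

# Divide by the residual df (n - 2 = 98) to get a dispersion estimate
mean(X1) #109 for me
mean(X2) #208 for me
mean(X3) #298 for me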
As you can see, when the model is properly Poisson, the mean residual deviance is about 109 on 98 residual degrees of freedom, so the dispersion estimate (deviance / df) is about 1. If you make it hierarchical once, with a Poisson on the Poisson, but model it as a straight Poisson, you get a dispersion estimate of about 2. And if you go another layer, doing a Poisson on the result of that second Poisson but still modeling it as a straight Poisson, you get a dispersion estimate of about 3.
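The 1, 2, 3 pattern follows from the law of total variance: if y2 given y1 is Poisson(y1), then Var(y2) = E[y1] + Var(y1) = 2 * mean, and one more layer gives 3 * mean. Here is a rough numerical check without any regression, using a fixed mean of 4 that is chosen purely for illustration:

```r
set.seed(3)            # arbitrary seed, only for reproducibility
n_obs  <- 1e5
lambda <- 4            # assumed constant mean, for illustration only

y1 <- rpois(n_obs, lambda)   # plain Poisson
y2 <- rpois(n_obs, y1)       # Poisson on a Poisson
y3 <- rpois(n_obs, y2)       # one more layer

# All three keep mean lambda, but the variance grows by lambda per layer
vm <- c(y1 = var(y1) / mean(y1),
        y2 = var(y2) / mean(y2),
        y3 = var(y3) / mean(y3))
print(round(vm, 2))
```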
 
#5
Though I suppose the question might be how to generate data that is properly modeled with an over-dispersed Poisson. That I didn't answer! I just answered the question of how to generate data that appears over-dispersed but is not necessarily modeled correctly by an over-dispersed model.
 

TheEcologist

Global Moderator
#6
Well, that of course depends on how you calculate your dispersion parameter, which I don't know.
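One common convention is the quasi-Poisson one, Var(y) = φ * E(y). That is only an assumption here, since the thread never pins it down. Under it, the negative binomial variance mu + mu^2/size equals φ * mu when size = mu/(φ - 1), so size has to change with each observation's mean. A sketch in R using the b0, b1 and φ from the question (the seed and variable names are illustrative):

```r
set.seed(2)                         # arbitrary seed, only for reproducibility
n   <- 1000
b0  <- 5
b1  <- -0.5
phi <- 2                            # target dispersion, assuming Var(y) = phi * E(y)

x  <- rexp(n, rate = 1 / 3)         # Exponential with mean 3
mu <- exp(b0 + b1 * x)

# mu + mu^2/size = phi * mu  =>  size = mu / (phi - 1), one value per observation
y <- rnbinom(n, mu = mu, size = mu / (phi - 1))

# Quasi-Poisson fit: same coefficients as a Poisson GLM, plus a Pearson-based
# estimate of the dispersion
fit_qp  <- glm(y ~ x, family = quasipoisson)
phi_hat <- summary(fit_qp)$dispersion
print(coef(fit_qp))
print(phi_hat)
```

If size were instead held constant across observations, the variance would be mu + mu^2/size, which is quadratic in the mean. That is the form MASS::glm.nb fits, and the estimated quantity would then be size rather than φ.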