# MLE of multinomial distribution with missing values

#### bit

##### New Member
Hi,

I am trying to solve a problem and the results I get seem counter-intuitive.

We randomly throw $$n$$ balls into an area partitioned into 3 bins $$b_1,b_2,b_3$$. The size of each bin is proportional to the probability that a ball falls into it. Let's call these probabilities $$p_1,p_2,p_3$$. This can be described by a multinomial distribution.

Now let's say I throw 12 balls, and I know how many landed in each bin ($$x_1=3,x_2=6,x_3=3$$).

I would like to estimate the sizes of the bins from the observations. For this I use Maximum Likelihood. It can be shown that the MLE is $$p_1=3/12,p_2=6/12,p_3=3/12$$. This is pretty intuitive.

It turns out that the actual likelihood at this point is:

$$L(p_1=0.25,p_2=0.5,p_3=0.25|x_1=3,x_2=6,x_3=3)=$$

$$=\frac{12!}{3!6!3!}0.25^3\,0.5^6\,0.25^3=0.07050$$

Now, let's assume I knew in advance that $$p_1=p_3$$. How would that change my result? It would not - I would still get the same parameter values $$p_1=0.25,p_2=0.5,p_3=0.25$$.

The twist comes now: let's assume I cannot observe balls that landed in $$b_3$$. If I know that 12 balls were thrown I am fine, since I can calculate $$x_3=n-x_1-x_2=12-3-6=3$$. But what happens if I don't know $$n$$?

I figure that in this case, I would need to estimate $$x_3$$ (or equivalently $$n$$) as well. However, if I use MLE, the results start looking weird. Intuitively, I would expect that if I observe $$x_1=3,x_2=6$$ and I know that $$p_1=p_3$$, then the MLE will probably be $$p_1=0.25,p_2=0.5,p_3=0.25,x_3=3$$. However, it is clearly not the maximum, since for example:

$$L(p_1=0.24,p_2=0.52,p_3=0.24|x_1=3,x_2=6,x_3=2)=$$

$$=\frac{11!}{3!6!2!}0.24^30.52^60.24^2=0.07273$$

So from this it seems that $$x_1=3,x_2=6,x_3=2$$ is more likely than $$x_1=3,x_2=6,x_3=3$$ even if I know that $$p_1=p_3$$, which seems very counter-intuitive.
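The two likelihood values above are easy to check numerically. Here is a minimal sketch in Python using only the standard library (the hand-rolled `multinomial_pmf` helper is just for illustration):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """n! / (x1! ... xk!) * p1^x1 * ... * pk^xk."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)  # stays exact: each partial quotient is an integer
    return coef * prod(p ** x for p, x in zip(probs, counts))

# L(p1=0.25, p2=0.5, p3=0.25 | x1=3, x2=6, x3=3), i.e. n = 12
l12 = multinomial_pmf([3, 6, 3], [0.25, 0.5, 0.25])

# L(p1=0.24, p2=0.52, p3=0.24 | x1=3, x2=6, x3=2), i.e. n = 11
l11 = multinomial_pmf([3, 6, 2], [0.24, 0.52, 0.24])

print(round(l12, 5))  # 0.0705
print(round(l11, 5))  # 0.07273
print(l11 > l12)      # True
```

So the numbers in the post check out: the $$n=11$$ configuration really does have the higher likelihood.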

My questions are whether my logic is sound, whether my intuition is misleading me, and whether this is the correct way to estimate the parameters and the missing data.

Thanks!

#### BGM

##### TS Contributor

Your model is

$$\text{Multinomial}(n;\ p_1,\ 1 - 2p_1,\ p_1)$$

with the observation $$(X_1, X_2)$$,

where $$n$$ and $$p_1$$ are unknown parameters.

Actually, this problem is similar to estimating the binomial parameters when both $$n$$ and $$p$$ are unknown. Theoretically this can always be done, but one main drawback is that such estimators are known to be unstable; see e.g. Olkin, Petkau, and Zidek (1981):

http://www.stat.ncsu.edu/information/library/mimeo.archive/ISMS__1539.pdf

Anyway, it is possible to do it, and the example you give is not the maximum yet.

#### bit

##### New Member
Thanks for the reference. It still seems weird to me that [3,6,2] would be more likely than [3,6,3] when we know that $$p_1=p_3$$. Is there an intuitive explanation for that?

#### Dason

Well, in one case you have $$n=11$$ and in the other you have $$n=12$$. Also note that, in general, the likelihood is going to decrease with respect to $$n$$, since we're taking the product of $$n$$ probabilities each less than 1.

So even if we keep the relative ratios in the bins the same, the likelihood will decrease if we increase $$n$$. For example...
```r
> # P(X1=3, X2=6, X3=3 | p1=.25, p2=.5, p3=.25)
> dmultinom(c(3,6,3), 12, c(.25, .5, .25))
[1] 0.07049561
> # P(X1=6, X2=12, X3=6 | p1=.25, p2=.5, p3=.25)
> dmultinom(c(6,12,6), 24, c(.25, .5, .25))
[1] 0.03636
```
It doesn't necessarily make sense to compare the likelihood values if you change the actual outcomes...

#### BGM

##### TS Contributor
It is not hard to show that, under your model, for a fixed $$n \geq x_1 + x_2$$, the MLE is

$$\hat{p}_1 = \frac {n - X_2} {2n}$$
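As a quick sketch of why: with $$p_3 = p_1$$ and $$x_3 = n - x_1 - x_2$$ fixed, the log-likelihood depends on $$p_1$$ only through

$$\ell(p_1) = (x_1 + x_3)\log p_1 + x_2 \log(1 - 2p_1)$$

Setting $$\ell'(p_1) = 0$$ gives

$$\frac{x_1 + x_3}{p_1} = \frac{2 x_2}{1 - 2p_1} \quad\Rightarrow\quad \hat{p}_1 = \frac{x_1 + x_3}{2n} = \frac{n - X_2}{2n}$$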

Therefore for $$n = 12, x_2 = 6$$,

$$\hat{p}_1 = \frac {12 - 6} {2 \times 12} = \frac {1} {4}$$

and you obtain $$\left(\frac {1} {4}, \frac {1} {2}, \frac {1} {4} \right)$$ as your MLE.

Whereas for $$n = 11, x_2 = 6$$,

$$\hat{p}_1 = \frac {11 - 6} {2 \times 11} = \frac {5} {22}$$

and you obtain $$\left(\frac {5} {22}, \frac {6} {11}, \frac {5} {22} \right)$$ as your MLE.

And you compare the likelihood function for these two cases and see which one is larger. You repeat this process and search for the $$n$$ that maximizes the likelihood.
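This search is easy to automate. A minimal sketch in Python (stdlib only; the cutoff of $$n \le 50$$ is an arbitrary choice for illustration):

```python
from math import factorial

x1, x2 = 3, 6  # observed counts; x3 = n - x1 - x2 is unobserved

def likelihood(n):
    """Likelihood at the profile MLE p1_hat = (n - x2) / (2n) for this n."""
    x3 = n - x1 - x2
    p1 = (n - x2) / (2 * n)
    p2 = 1 - 2 * p1
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1 ** (x1 + x3) * p2 ** x2

# Search from the smallest feasible n (x3 = 0) up to an arbitrary cutoff.
best_n = max(range(x1 + x2, 51), key=likelihood)
print(best_n, round(likelihood(best_n), 5))  # 11 0.07378
```

Running this lands on $$n = 11$$ (i.e. $$x_3 = 2$$), consistent with the earlier observation in the thread that [3,6,2] beats [3,6,3].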

#### bit

##### New Member
> And you compare the likelihood function for these two cases and see which one is larger. You repeat this process and search for the $$n$$ that maximizes the likelihood.
Thanks - I understand how to find the MLE for $$x_3$$ (or $$n$$); I just find it weird (intuitively) that if I observe $$x_1=3,x_2=6$$ and I know that $$p_1=p_3$$, the MLE for $$x_3$$ turns out not to be 3.