MLE of multinomial distribution with missing values


New Member

I am trying to solve a problem and the results I get seem counter-intuitive.

We randomly throw \(n\) balls into an area partitioned into 3 bins \(b_1,b_2,b_3\). The size of each bin is proportional to the probability that a ball will fall in it. Let's call these probabilities \(p_1,p_2,p_3\). This can be described by a multinomial distribution.

Now let's say I throw 12 balls, and I know how many landed in each bin (\(x_1=3,x_2=6,x_3=3\)).

I would like to estimate the sizes of the bins from the observations. For this I use maximum likelihood. It can be shown that the MLE is \(p_1=3/12,p_2=6/12,p_3=3/12\). This is pretty intuitive.
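
As a quick numerical sanity check of the closed-form MLE \(p_i = x_i/n\), here is a minimal Python sketch that evaluates the multinomial likelihood on a grid over the probability simplex and confirms that the maximum sits at \((1/4, 1/2, 1/4)\). The helper name `multinomial_pmf` and the grid step of 1/100 are illustrative assumptions, not something from the thread.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` given cell probabilities `probs` (illustrative helper)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer multinomial coefficient
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob


counts = (3, 6, 3)
n = sum(counts)
mle = tuple(c / n for c in counts)  # closed-form MLE: p_i = x_i / n

# Brute-force check on a grid with step 1/100 over the simplex p1 + p2 + p3 = 1.
simplex_grid = (
    (i / 100, j / 100, (100 - i - j) / 100) for i in range(101) for j in range(101 - i)
)
best = max(simplex_grid, key=lambda p: multinomial_pmf(counts, p))

print(mle)   # (0.25, 0.5, 0.25)
print(best)  # (0.25, 0.5, 0.25)
```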

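It turns out that the actual likelihood at this point is:

\( P(x_1=3,x_2=6,x_3=3 \mid p_1=\tfrac{1}{4},p_2=\tfrac{1}{2},p_3=\tfrac{1}{4}) = \frac{12!}{3!\,6!\,3!}\left(\tfrac{1}{4}\right)^{3}\left(\tfrac{1}{2}\right)^{6}\left(\tfrac{1}{4}\right)^{3} \approx 0.0705 \)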
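
To reproduce this number exactly, a short Python sketch using `fractions.Fraction` (an illustrative choice; the `dmultinom` call later in the thread gives the same value in floating point, 0.07049561):

```python
from fractions import Fraction
from math import factorial

x = (3, 6, 3)
p = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))

# Multinomial coefficient 12! / (3! 6! 3!)
coef = factorial(sum(x)) // (factorial(x[0]) * factorial(x[1]) * factorial(x[2]))
likelihood = coef * p[0] ** x[0] * p[1] ** x[1] * p[2] ** x[2]

print(coef)               # 18480
print(likelihood)         # 1155/16384
print(float(likelihood))  # 0.07049560546875
```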

Now, let's assume I knew in advance that \(p_1=p_3\). How would that change my result? It would not: I would still get the same parameter values \(p_1=0.25,p_2=0.5,p_3=0.25\).
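
Why the constraint changes nothing here: with \(p_1=p_3\) the likelihood is proportional to \(p_1^{x_1+x_3}(1-2p_1)^{x_2}\), so the constrained MLE pools the two outer bins, \(p_1=p_3=(x_1+x_3)/(2n)\), and with \(x_1=x_3\) that coincides with the unconstrained answer. A minimal Python sketch (the function name `constrained_mle` is hypothetical):

```python
def constrained_mle(x1, x2, x3):
    """MLE of (p1, p2, p3) for a 3-cell multinomial under the constraint p1 == p3 (hypothetical helper).

    The log-likelihood is (x1 + x3) * log(p1) + x2 * log(1 - 2*p1) + const;
    setting its derivative to zero gives p1 = (x1 + x3) / (2 * n).
    """
    n = x1 + x2 + x3
    p1 = (x1 + x3) / (2 * n)
    return (p1, 1 - 2 * p1, p1)


print(constrained_mle(3, 6, 3))  # (0.25, 0.5, 0.25), same as the unconstrained MLE
```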

The twist comes now: let's assume I cannot observe balls that landed in \(b_3\). If I know that 12 balls were thrown I am fine, since I can calculate \(x_3=n-x_1-x_2=12-3-6=3\). But what happens if I don't know \(n\)?

I figure that in this case, I would need to estimate \(x_3\) (or equivalently \(n\)) as well. However, if I use MLE, the results start looking weird. Intuitively, I would expect that if I observe \(x_1=3,x_2=6\) and I know that \(p_1=p_3\), then the MLE will probably be \(p_1=0.25,p_2=0.5,p_3=0.25,x_3=3\). However, it is clearly not the maximum, since for example:



So from this it seems that \(x_1=3,x_2=6,x_3=2\) is more likely than \(x_1=3,x_2=6,x_3=3\) even if I know that \(p_1=p_3\), which seems very counter-intuitive.
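
A minimal Python sketch of that comparison, assuming each candidate \(x_3\) is paired with its own constrained MLE \(p_1=p_3=(n-x_2)/(2n)\) (the formula given later in the thread). The helper `multinomial_pmf` is an illustrative name, not part of the original post.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` given cell probabilities `probs` (illustrative helper)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer multinomial coefficient
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob


results = {}
for x3 in (3, 2):
    counts = (3, 6, x3)
    n = sum(counts)
    p1 = (n - 6) / (2 * n)  # constrained MLE for this n (p1 == p3)
    results[x3] = multinomial_pmf(counts, (p1, 1 - 2 * p1, p1))
    print(x3, round(results[x3], 4))
# 3 0.0705
# 2 0.0738
```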

My questions are whether my logic is sound, whether my intuition is misleading me and whether this is the correct way to estimate the parameters and missing data.



TS Contributor
Now your model becomes

\( \text{Multinomial}(n; p_1, 1 - 2p_1, p_1) \)

with the observation \( (X_1, X_2) \),

where \( n \) and \( p_1 \) are unknown parameters.
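
To make the two-parameter model concrete, here is a minimal Python sketch of its log-likelihood, followed by a brute-force joint search over \((n, p_1)\) for the data \(x_1=3, x_2=6\). The function name `loglik`, the upper bound of 60 on \(n\), and the 0.001 step in \(p_1\) are illustrative assumptions.

```python
from math import lgamma, log

X1, X2 = 3, 6  # observed counts from the thread; x3 = n - X1 - X2 is not observed


def loglik(n, p1, x1=X1, x2=X2):
    """Log-likelihood of Multinomial(n; p1, 1 - 2*p1, p1) at counts (x1, x2, n - x1 - x2).

    Hypothetical helper name; lgamma(k + 1) == log(k!).
    """
    x3 = n - x1 - x2
    log_coef = lgamma(n + 1) - lgamma(x1 + 1) - lgamma(x2 + 1) - lgamma(x3 + 1)
    return log_coef + (x1 + x3) * log(p1) + x2 * log(1 - 2 * p1)


# Joint brute-force search (illustrative grid): n = 9..60, p1 = 0.001, 0.002, ..., 0.499.
grid = ((n, i / 1000) for n in range(X1 + X2, 61) for i in range(1, 500))
n_best, p1_best = max(grid, key=lambda t: loglik(*t))
print(n_best, p1_best)  # 11 0.227
```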

Actually, this problem is similar to estimating the Binomial parameters when both \( n \) and \( p \) are unknown. Theoretically this can always be done, but one main drawback is that such estimators have been shown to be unstable; see e.g. Olkin, Petkau and Zidek (1981).

Anyway, it is possible to do it, and the example you gave is not necessarily the maximum yet.


New Member
Thanks for the reference. It is still weird for me that [3,6,2] would be more likely than [3,6,3] when we know that \(p_1=p_3\). Is there an intuitive explanation for that?


Ambassador to the humans
Well, in one case you have \(n=11\) and in the other you have \(n=12\). Also note that, in general, the likelihood is going to decrease with respect to \(n\) (since we're taking the product of \(n\) probabilities \(< 1\)).

So even if we keep the relative ratios in the bins the same, the likelihood will decrease if we increase \(n\). For example:
```r
> # P(X1=3, X2=6, X3=3 | p1=.25, p2=.5, p3=.25)
> dmultinom(c(3,6,3), 12, c(.25, .5, .25))
[1] 0.07049561
> # P(X1=6, X2=12, X3=6 | p1=.25, p2=.5, p3=.25)
> dmultinom(c(6,12,6), 24, c(.25, .5, .25))
[1] 0.03636
```
It doesn't necessarily make sense to compare the likelihood values if you change the actual outcomes...
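
The two `dmultinom` values can be reproduced in Python with exact fractions. This sketch (the helper name `pmf_k` is hypothetical) also checks the broader claim for counts \((k, 2k, k)\), \(k = 1,\dots,10\), i.e. fixed proportions with \(n = 4k\):

```python
from fractions import Fraction
from math import factorial


def pmf_k(k):
    """Exact multinomial probability of counts (k, 2k, k) under p = (1/4, 1/2, 1/4) (hypothetical helper)."""
    n = 4 * k
    coef = factorial(n) // (factorial(k) * factorial(2 * k) * factorial(k))
    return coef * Fraction(1, 4) ** (2 * k) * Fraction(1, 2) ** (2 * k)


values = [pmf_k(k) for k in range(1, 11)]
# k = 3 is (3, 6, 3) with n = 12; k = 6 is (6, 12, 6) with n = 24.
print(round(float(values[2]), 5), round(float(values[5]), 5))  # 0.0705 0.03636
print(all(a > b for a, b in zip(values, values[1:])))  # True: strictly decreasing in n = 4k
```

For this family the ratio of consecutive terms works out to \(\frac{(4k+1)(4k+3)}{16(k+1)^2} < 1\), so the decrease is strict for every \(k\), even though the multinomial coefficient itself grows.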


TS Contributor
It is not hard to show that under your model, for a fixed \( n \geq x_1 + x_2 \), the MLE of \( p_1 \) is

\( \hat{p}_1 = \frac {n - X_2} {2n} \)
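
For reference, one way to get this formula: with \( x_3 = n - x_1 - x_2 \), the log-likelihood in \( p_1 \) is, up to a term that does not depend on \( p_1 \),

\[
\begin{aligned}
\ell(p_1) &= (x_1 + x_3)\log p_1 + x_2\log(1 - 2p_1) = (n - x_2)\log p_1 + x_2\log(1 - 2p_1),\\
\ell'(p_1) &= \frac{n - x_2}{p_1} - \frac{2x_2}{1 - 2p_1} = 0
\;\Longrightarrow\; (n - x_2)(1 - 2p_1) = 2x_2 p_1
\;\Longrightarrow\; \hat{p}_1 = \frac{n - x_2}{2n}.
\end{aligned}
\]

Since \( \ell \) is concave on \( (0, \tfrac{1}{2}) \), this stationary point is the maximum (assuming \( 0 < x_2 < n \)).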

Therefore for \( n = 12, x_2 = 6 \),

\( \hat{p}_1 = \frac {12 - 6} {2 \times 12} = \frac {1} {4} \)

and you obtain \( \left(\frac {1} {4}, \frac {1} {2}, \frac {1} {4} \right) \) as your MLE.

Whereas for \( n = 11, x_2 = 6 \),

\( \hat{p}_1 = \frac {11 - 6} {2 \times 11} = \frac {5} {22} \)

and you obtain \( \left(\frac {5} {22}, \frac {6} {11}, \frac {5} {22} \right) \) as your MLE.
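
A quick numerical check of the closed form, maximizing over \(p_1\) for each fixed \(n\). This is a Python sketch; the function name `loglik_p1` and the grid step of \(10^{-5}\) are illustrative assumptions.

```python
from math import log


def loglik_p1(p1, n, x2=6):
    """Log-likelihood in p1 for a fixed n (hypothetical helper).

    The multinomial coefficient does not depend on p1, so it is dropped.
    """
    return (n - x2) * log(p1) + x2 * log(1 - 2 * p1)


grid = [i / 100_000 for i in range(1, 50_000)]  # p1 values strictly inside (0, 0.5)
checks = {}
for n in (12, 11):
    numeric = max(grid, key=lambda p: loglik_p1(p, n))
    closed_form = (n - 6) / (2 * n)  # (n - x2) / (2n)
    checks[n] = (numeric, closed_form)
    print(n, numeric, round(closed_form, 5))
# 12 0.25 0.25
# 11 0.22727 0.22727
```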

Then you compare the likelihood for these two cases and see which one is larger. You repeat this process and search for the \( n \) that maximizes the likelihood.
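
The search described above is a profile-likelihood search over \(n\). A minimal Python sketch, plugging \(\hat p_1 = (n - x_2)/(2n)\) into the log-likelihood for each \(n\); the function name `profile_loglik` and the upper limit of 200 on \(n\) are illustrative assumptions:

```python
from math import lgamma, log

x1, x2 = 3, 6  # observed counts from the thread; x3 = n - x1 - x2 is unobserved


def profile_loglik(n):
    """Log-likelihood at fixed n with p1 replaced by its MLE (n - x2) / (2n) (hypothetical helper)."""
    x3 = n - x1 - x2
    p1 = (n - x2) / (2 * n)
    log_coef = lgamma(n + 1) - lgamma(x1 + 1) - lgamma(x2 + 1) - lgamma(x3 + 1)
    return log_coef + (x1 + x3) * log(p1) + x2 * log(1 - 2 * p1)


ns = range(x1 + x2, 201)  # search n = 9, 10, ..., 200
n_hat = max(ns, key=profile_loglik)
print(n_hat, n_hat - x1 - x2)  # 11 2
```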


New Member
Thanks, I understand how to find the MLE for \(x_3\) (or \(n\)); I just think it is (intuitively) weird that if I observe \(x_1=3,x_2=6\) and I know that \(p_1=p_3\), the MLE for \(x_3\) turns out not to be 3.