Proof of asymptotic normality of MLE

If anyone is familiar with these theorems, I would appreciate some help understanding one part of a proof.

The usual way to show that \( \sqrt{n} \left( \hat{\theta}_n-\theta_0 \right) \xrightarrow{D} N \left( 0, \frac{1}{I \left( \theta_0 \right)} \right) \) is to expand \( l' \left( \theta \right) \), the derivative of the log likelihood \( l(\theta) \), in a Taylor expansion of order 2 about \( \theta_0 \) and evaluate it at \( \hat{\theta}_n \).

Since \( l'(\hat{\theta}_n) = 0 \), doing so and rearranging, we get

\( \sqrt{n} \left( \hat{\theta}_n-\theta_0 \right) = \frac{n^{-1/2} l' \left( \theta_0 \right)}{-n^{-1} l'' \left( \theta_0 \right) - \left(2n \right)^{-1} \left( \hat{\theta}_n -\theta_0 \right) l''' \left(\theta^{*}_n \right)} \)

where \( \theta^{*}_n \) lies between \( \theta_0 \) and \( \hat{\theta}_n \), i.e. we use the Lagrange form of the remainder.
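As a sanity check on the rearranged expansion, here is a minimal Python sketch for one concrete model, assumed purely for illustration (it is not the book's example): i.i.d. exponential data with rate \( \theta \), so \( l(\theta) = n \log \theta - \theta \sum x_i \), the MLE is \( \hat{\theta}_n = 1/\bar{x} \), and \( l'''(\theta) = 2n/\theta^3 \). Because \( l''' \) is strictly monotone in this model, the Lagrange point \( \theta^*_n \) can be solved for exactly; the script checks that it lies between \( \theta_0 \) and \( \hat{\theta}_n \) and that the right-hand side reproduces \( \sqrt{n}(\hat{\theta}_n - \theta_0) \). The values of \( \theta_0 \), \( n \) and the seed are arbitrary.

```python
import math
import random

# Illustrative model (an assumption): X_i ~ Exponential(rate theta),
# log f(x; theta) = log(theta) - theta * x, so l(theta) = n*log(theta) - theta*sum(x).

def derivs(theta, n, s):
    """Return l'(theta), l''(theta), l'''(theta) for the exponential log likelihood."""
    return n / theta - s, -n / theta**2, 2 * n / theta**3

rng = random.Random(1)
theta0, n = 2.0, 500
xs = [rng.expovariate(theta0) for _ in range(n)]
s = sum(xs)
theta_hat = n / s                 # the MLE solves l'(theta) = 0
h = theta_hat - theta0

d1, d2, _ = derivs(theta0, n, s)
# Lagrange remainder: 0 = l'(t0) + h*l''(t0) + 0.5*h^2*l'''(t*); solve for l'''(t*)
l3_star = -2 * (d1 + h * d2) / h**2
theta_star = (2 * n / l3_star) ** (1 / 3)   # invert l'''(t) = 2n/t^3

lhs = math.sqrt(n) * h
rhs = (d1 / math.sqrt(n)) / (-d2 / n - h * l3_star / (2 * n))

print(f"theta0={theta0:.4f}  theta*={theta_star:.4f}  theta_hat={theta_hat:.4f}")
print(f"sqrt(n)(theta_hat - theta0): direct={lhs:.6f}, rearranged={rhs:.6f}")
```

In this particular model one can check by hand that \( (\theta^*_n)^3 = \theta_0^2 \hat{\theta}_n \), so \( \theta^*_n \) always falls between the two endpoints.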

The usual asymptotic results are at work in the numerator and the denominator (the CLT for \( n^{-1/2} l'(\theta_0) \) and the WLLN for \( -n^{-1} l''(\theta_0) \)), and for the most part everything is clear to me. What I need help with is understanding how the second term in the denominator is bounded in probability. I will present the way my book does it, and you can advise me on how to better digest the method.

We assume the pdf is three times differentiable as a function of \( \theta \) and require that the third derivative of its logarithm be bounded, in a neighborhood of \( \theta_0 \), by a function that does not depend on \( \theta \), i.e.

\( \left| \frac{\partial^3 \log f \left( x; \theta \right)}{\partial \theta^3} \right| \leq M\left( x \right) \)

for all \( \theta_0 -c_0 < \theta< \theta_0 +c_0 \) and all \( x \) in the support of \( X \), with \( E_{\theta_0} \left[ M \left( X \right) \right] < \infty \).
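To see what \( M(x) \) can look like, take a hypothetical exponential model \( f(x;\theta) = \theta e^{-\theta x} \). Then \( \partial^3 \log f(x;\theta)/\partial\theta^3 = 2/\theta^3 \) does not depend on \( x \), so for any \( 0 < c_0 < \theta_0 \) the constant \( M = 2/(\theta_0 - c_0)^3 \) works and \( E_{\theta_0}[M(X)] = M < \infty \). The sketch below checks the bound on a grid of \( \theta \) values inside the neighborhood; \( \theta_0 \), \( c_0 \), the grid and the sample \( x \) values are arbitrary choices.

```python
# Hypothetical example: f(x; theta) = theta * exp(-theta * x), x >= 0.
# d/dtheta log f = 1/theta - x, d2 = -1/theta^2, d3 = 2/theta^3 (free of x).

def third_deriv_log_f(x, theta):
    """Third theta-derivative of log f(x; theta) for the exponential model."""
    return 2.0 / theta**3

def m_bound(theta0, c0):
    """Constant dominating function on (theta0 - c0, theta0 + c0); needs 0 < c0 < theta0."""
    if not 0 < c0 < theta0:
        raise ValueError("need 0 < c0 < theta0 so the neighborhood stays in (0, inf)")
    return 2.0 / (theta0 - c0) ** 3

theta0, c0 = 2.0, 0.5
M = m_bound(theta0, c0)
# interior grid of the open interval (theta0 - c0, theta0 + c0)
grid = [theta0 - c0 + (2 * c0) * k / 1000 for k in range(1, 1000)]
xs = [0.0, 0.3, 1.0, 5.0, 50.0]
worst = max(abs(third_deriv_log_f(x, t)) for x in xs for t in grid)
print(f"M = {M:.4f}, largest |d3 log f| on grid = {worst:.4f}")
```

Since \( 2/\theta^3 \) is decreasing, the supremum over the neighborhood is approached at the left endpoint \( \theta_0 - c_0 \), which is why that endpoint determines \( M \).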

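Now, whenever \( | \hat{\theta}_n - \theta_0 | < c_0 \), we also have \( | \theta^{*}_n - \theta_0 | < c_0 \), because \( \theta^{*}_n \) lies between \( \theta_0 \) and \( \hat{\theta}_n \). Since \( l'''(\theta) = \sum_{i=1}^{n} \partial^3 \log f(X_i;\theta)/\partial\theta^3 \), the triangle inequality and the bound above then give \( \left| - \frac{1}{n} l''' \left( \theta^{*}_n \right) \right| \leq \frac{1}{n} \sum_{i=1}^{n} M \left( X_i \right) \).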

But \( \frac{1}{n} \sum_{i=1}^{n} M \left( X_i \right) \xrightarrow{P} E_{\theta_0} \left[ M \left( X \right) \right] \) by the WLLN. As the bound, the book then takes \( 1+E_{\theta_0} \left[ M \left( X \right) \right] \). Let \( \epsilon>0 \) be given and choose \( N_1, N_2 \) so that

\( n\geq N_2 \Rightarrow P \left[ \left| \hat{\theta}_n -\theta_0 \right| < c_0 \right] \geq 1-\frac{\epsilon}{2} \)

\( n \geq N_1 \Rightarrow P \left[ \left| \frac{1}{n} \sum _{i=1}^{n} M \left( X_i \right) -E_{\theta_0} \left[ M \left( X \right) \right] \right| <1 \right] \geq 1-\frac{\epsilon}{2} \)


\( n\geq \max \{ N_1 ,N_2 \} \Rightarrow P \left[ \left| -\frac{1}{n} l''' \left( \theta^{*}_n \right) \right| \leq 1+ E_{\theta_0} \left[ M \left( X \right) \right] \right] \geq 1-\epsilon \)

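hence \( n^{-1} l''' \left( \theta^{*}_n \right) \) is bounded in probability. Since \( \hat{\theta}_n - \theta_0 \xrightarrow{P} 0 \), the second term in the denominator therefore converges to 0 in probability, which concludes the proof.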
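For a hypothetical exponential model with rate \( \theta \) (where \( n^{-1} l'''(\theta) = 2/\theta^3 \)), a Monte Carlo sketch can estimate the probability in the last display and show the second term of the denominator shrinking. It assumes \( \theta_0 = 2 \) and \( c_0 = 0.5 \) (arbitrary), draws \( \sum X_i \) directly from its Gamma distribution with shape \( n \) and scale \( 1/\theta_0 \), and uses only the fact that \( \theta^*_n \) lies between \( \theta_0 \) and \( \hat{\theta}_n \), so \( |n^{-1} l'''(\theta^*_n)| \leq 2/\min(\theta_0, \hat{\theta}_n)^3 \). The estimated probability is therefore a conservative (lower) estimate.

```python
import random

# Hypothetical setting: X_i ~ Exponential(rate THETA0), so sum(X_i) ~ Gamma(shape n,
# scale 1/THETA0), the MLE is n / sum(X_i), and n^{-1} l'''(theta) = 2 / theta^3.
THETA0, C0 = 2.0, 0.5
M = 2.0 / (THETA0 - C0) ** 3      # constant dominating function, so E[M(X)] = M
REPS = 2000

def simulate(n, rng):
    """Return (lower estimate of P[|n^{-1} l'''(theta*_n)| <= 1 + M],
    95th percentile of an upper bound on the second denominator term)."""
    hits, terms = 0, []
    for _ in range(REPS):
        theta_hat = n / rng.gammavariate(n, 1.0 / THETA0)
        # theta*_n lies between THETA0 and theta_hat, hence
        # |n^{-1} l'''(theta*_n)| = 2/theta*^3 <= 2/min(THETA0, theta_hat)^3
        l3_bound = 2.0 / min(THETA0, theta_hat) ** 3
        # count only draws where the upper bound already meets 1 + M (conservative)
        if l3_bound <= 1.0 + M:
            hits += 1
        # |(2n)^{-1}(theta_hat - THETA0) l'''(theta*_n)| <= 0.5*|theta_hat - THETA0|*l3_bound
        terms.append(0.5 * abs(theta_hat - THETA0) * l3_bound)
    terms.sort()
    return hits / REPS, terms[int(0.95 * REPS)]

rng = random.Random(2)
results = {n: simulate(n, rng) for n in (20, 200, 2000)}
for n, (p, q95) in results.items():
    print(f"n={n:5d}  P(bound holds) >= ~{p:.3f}  95th pct of |2nd term| <= {q95:.4f}")
```

The shrinking percentile is the numerical face of "bounded in probability times something tending to 0 in probability".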

My question is: how exactly is the last probability derived from the information we have? It seems a few steps are skipped at the most crucial point. I understand that since both \( \hat{\theta}_n \) and \( \frac{1}{n} \sum_{i=1}^n M \left( X_i \right) \) converge in probability, they are bounded in probability, but from there I cannot derive the last result. Any insight is greatly appreciated. Thank you.


I am not sure exactly which part gives you trouble.

Anyway, let me try to derive the "last result" for you:

- The existence of \( N_2 \) follows from the consistency of the MLE, which should be proved earlier in the book

- The existence of \( N_1 \), as you said, follows from the WLLN

- Now you want to show that \( |\hat{\theta}_n - \theta_0| |-n^{-1}l'''(\theta^*_n)| \) is bounded in probability (ignore the constant one half)
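The bullets above only assert that \( N_1 \) and \( N_2 \) exist. Under the extra assumption that \( M(X) \) has finite variance (the WLLN itself needs only a finite mean), Chebyshev's inequality gives an explicit choice \( N_1 = \lceil 2\,\mathrm{Var}(M(X))/\epsilon \rceil \), since then \( \Pr\{|\frac{1}{n}\sum M(X_i) - E[M(X)]| \geq 1\} \leq \mathrm{Var}(M(X))/n \leq \epsilon/2 \). The sketch below computes it for a hypothetical \( M(X) = X \) with \( X \) exponential with rate 0.5 (mean 2, variance 4) and checks it by simulation; \( \epsilon = 0.1 \) is arbitrary.

```python
import math
import random

def chebyshev_n1(var_m, eps):
    """Smallest n with Var(M)/n <= eps/2, so Chebyshev gives
    P(|mean_n - E M| >= 1) <= eps/2. Assumes Var(M(X)) is finite."""
    return math.ceil(2.0 * var_m / eps)

# Hypothetical M(X): X ~ Exponential(rate 0.5), M(x) = x, so E M = 2 and Var M = 4.
mean_m, var_m, eps = 2.0, 4.0, 0.1
n1 = chebyshev_n1(var_m, eps)

rng = random.Random(3)
reps = 5000
# the mean of n1 i.i.d. Exp(rate 0.5) draws is Gamma(shape n1, scale 2) / n1
inside = sum(
    abs(rng.gammavariate(n1, 2.0) / n1 - mean_m) < 1.0 for _ in range(reps)
)
print(f"N1 = {n1}, estimated P(|mean - E M| < 1) = {inside / reps:.4f}"
      f" (target >= {1 - eps / 2})")
```

Chebyshev is conservative, so the simulated probability typically sits well above the \( 1 - \epsilon/2 \) target.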

With the uniform bound, the second convergence result can be expressed like this:

\( \Pr\left\{\left|-\frac {1} {n} l'''(\theta^*_n)\right| \leq 1 + E_{\theta_0}[M(X)] \right\} \)

\( \geq \Pr\left\{\frac {1} {n} \sum_{i=1}^n M(X_i) \leq 1 + E_{\theta_0}[M(X)] \right\}\)

\( = \Pr\left\{\frac {1} {n} \sum_{i=1}^n M(X_i) - E_{\theta_0}[M(X)] \leq 1 \right\}\)

\( \geq \Pr\left\{\left|\frac {1} {n} \sum_{i=1}^n M(X_i) - E_{\theta_0}[M(X)] \right|\leq 1 \right\}\)

\( \geq 1 - \frac {\epsilon} {2} \)

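Here we just use the fact that if \( U \leq V \) almost surely, then

\( V \leq a \Rightarrow U \leq a \) and therefore \( \Pr\{V \leq a\} \leq \Pr\{U \leq a\} \).

One caveat: the uniform bound \( |-n^{-1} l'''(\theta^*_n)| \leq n^{-1}\sum_{i=1}^n M(X_i) \) is only guaranteed on the event \( \{|\hat{\theta}_n - \theta_0| < c_0\} \), since that is when \( \theta^*_n \) stays within \( c_0 \) of \( \theta_0 \). So the first inequality above should really be applied on the intersection of that event (probability at least \( 1 - \frac{\epsilon}{2} \) for \( n \geq N_2 \)) with the WLLN event; the two-event bound below then gives \( 1 - \epsilon \) rather than \( 1 - \frac{\epsilon}{2} \), which is exactly the bound for \( n \geq \max\{N_1, N_2\} \).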
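The monotonicity fact can be checked numerically. In this sketch \( V \) is an arbitrary positive random variable and \( U = V - |Z| \leq V \) by construction (both choices are hypothetical). For every threshold \( a \), the empirical frequency of \( \{V \leq a\} \) never exceeds that of \( \{U \leq a\} \), because every draw with \( V \leq a \) also has \( U \leq a \).

```python
import random

rng = random.Random(4)
n = 10000
# Hypothetical pair with U <= V on every draw: V = |N(0,1)| + 1, U = V - |N(0,1)|
vs = [abs(rng.gauss(0.0, 1.0)) + 1.0 for _ in range(n)]
us = [v - abs(rng.gauss(0.0, 1.0)) for v in vs]

def freq(sample, a):
    """Empirical probability that an element of sample is <= a."""
    return sum(x <= a for x in sample) / len(sample)

for a in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"a={a}: P(V<=a) ~ {freq(vs, a):.3f}  <=  P(U<=a) ~ {freq(us, a):.3f}")
```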

Now you can combine this with the first convergence result.

Again note that

\( |U| \leq a \text{ and } |V| \leq b \Rightarrow |U||V| \leq ab \)


\( \Pr\{|U||V| \leq ab \} \)

\( \geq \Pr\{|U| \leq a \text{ and } |V| \leq b \} \)

\( = \Pr\{|U| \leq a\} + \Pr\{|V| \leq b \} - \Pr\{|U| \leq a \text{ or } |V| \leq b \}\)

If \( \Pr\{|U| \leq a\} \geq 1 - \frac {\epsilon} {2} \) and \( \Pr\{|V| \leq b\} \geq 1 - \frac {\epsilon} {2} \),
then we can bound the above as

\( \geq 1 - \frac {\epsilon} {2} + 1 - \frac {\epsilon} {2} - \Pr\{|U| \leq a \text{ or } |V| \leq b \}\)

\( = 1 - \epsilon + 1 - \Pr\{|U| \leq a \text{ or } |V| \leq b \}\)

\( \geq 1 - \epsilon \quad \) since \( \Pr\{|U| \leq a \text{ or } |V| \leq b \} \leq 1 \),

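which shows that the product \( |U||V| \) is bounded in probability (by \( ab \)) whenever each factor is. For the problem at hand, take \( U = \hat{\theta}_n - \theta_0 \) and \( V = -n^{-1} l'''(\theta^*_n) \) with \( b = 1 + E_{\theta_0}[M(X)] \). By consistency, \( a > 0 \) can be chosen as small as we like (at the cost of a larger \( n \)), so the second term of the denominator, \( \frac{1}{2}|U||V| \), converges to 0 in probability, which is the desired result.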
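The two-event bound \( \Pr(A \cap B) \geq \Pr(A) + \Pr(B) - 1 \) holds for any probability measure, including the empirical one, so it can be checked exactly on simulated data. In this hypothetical sketch \( U \) and \( V \) are deliberately dependent (\( V = 1 + Z^2 \), \( U = Z \) plus independent noise), and the thresholds \( a \) and \( b \) are arbitrary; the script also confirms that \( \{|U| \leq a\} \cap \{|V| \leq b\} \subseteq \{|U||V| \leq ab\} \).

```python
import random

rng = random.Random(5)
N = 20000
a, b = 1.5, 2.0

# Hypothetical dependent pair: both U and V are driven by the same Z.
pairs = []
for _ in range(N):
    z = rng.gauss(0.0, 1.0)
    pairs.append((z + 0.5 * rng.gauss(0.0, 1.0), 1.0 + z * z))

c_u = sum(abs(u) <= a for u, _ in pairs)
c_v = sum(abs(v) <= b for _, v in pairs)
c_both = sum(abs(u) <= a and abs(v) <= b for u, v in pairs)
c_prod = sum(abs(u) * abs(v) <= a * b for u, v in pairs)

print(f"P(|U|<=a) ~ {c_u / N:.4f}, P(|V|<=b) ~ {c_v / N:.4f}")
print(f"P(both) ~ {c_both / N:.4f} >= P(|U|<=a) + P(|V|<=b) - 1 ~ {(c_u + c_v - N) / N:.4f}")
print(f"P(|U||V| <= ab) ~ {c_prod / N:.4f} >= P(both)")
```

Comparing integer counts rather than floating-point frequencies keeps the check exact.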