Asymptotics of MLE
This post summarizes asymptotic properties of MLEs.
Suppose we draw i.i.d. samples $X_1,\ldots,X_n\sim p(X;\theta)$ and compute the maximum likelihood estimate $\hat{\theta}_{MLE}$ from them.
Consistency
We can treat MLE as empirical risk minimization, where the empirical risk is defined as $$\hat{R}_n(\hat{\theta},\theta)=\frac{1}{n}\sum\limits_{i=1}^n\log\frac{p(X_i;\theta)}{p(X_i;\hat{\theta})}.$$ Since the numerator does not depend on $\hat{\theta}$, minimizing $\hat{R}_n$ over $\hat{\theta}$ is the same as maximizing the log-likelihood, so $\hat{\theta}_{MLE}=\mathop{\arg\min}_{\hat{\theta}}\hat{R}_n(\hat{\theta},\theta)$. Note that, for a fixed $\hat{\theta}$, the true risk is given by $$R(\hat{\theta},\theta)=\mathbb{E}_{p(X;\theta)}[\hat{R}_n(\hat{\theta},\theta)]=KL(p(X;\theta)\|p(X;\hat{\theta})),$$ which is nonnegative and equals zero only if $p(X;\theta)=p(X;\hat{\theta})$ almost surely.
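As a rough numerical illustration of the risk view, the sketch below assumes a unit-variance Gaussian model $p(x;\theta)=\mathcal{N}(\theta,1)$. This model is chosen here only for convenience and is not part of the setup above. In that model the MLE is the sample mean and the KL divergence is $(\theta-\hat{\theta})^2/2$. The names `empirical_risk` and `true_risk`, the true parameter `1.5`, and the sample size are all hypothetical.

```python
import random


def empirical_risk(xs, theta_hat, theta):
    """Average log-likelihood ratio log p(x; theta) / p(x; theta_hat) under N(., 1)."""
    # log N(x; m, 1) = -0.5 * (x - m)**2 + const, and the constant cancels in the ratio.
    return sum(0.5 * (x - theta_hat) ** 2 - 0.5 * (x - theta) ** 2 for x in xs) / len(xs)


def true_risk(theta_hat, theta):
    """KL(N(theta, 1) || N(theta_hat, 1)) = (theta - theta_hat)**2 / 2."""
    return 0.5 * (theta - theta_hat) ** 2


rng = random.Random(0)   # fixed seed so the run is reproducible
theta = 1.5              # assumed true parameter (illustrative)
xs = [rng.gauss(theta, 1.0) for _ in range(5000)]

# Closed-form MLE for the mean of a unit-variance Gaussian: the sample mean.
theta_mle = sum(xs) / len(xs)

# Minimize the empirical risk by brute force over a grid on [0, 3] with step 0.01.
grid = [i / 100 for i in range(301)]
theta_grid = min(grid, key=lambda t: empirical_risk(xs, t, theta))

print("sample mean (MLE):      ", theta_mle)
print("grid risk minimizer:    ", theta_grid)
print("empirical risk at 0.5:  ", empirical_risk(xs, 0.5, theta))
print("true risk (KL) at 0.5:  ", true_risk(0.5, theta))
```

The grid minimizer should agree with the sample mean up to the grid resolution, and the empirical risk at a fixed $\hat{\theta}$ should be close to the corresponding KL divergence for large $n$, which is the law-of-large-numbers step that a consistency argument relies on.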