The EM algorithm is **an algorithm for deriving the maximum likelihood estimator (MLE)**, generally applied to statistical methods for incomplete data. Originally, the concept of **"incomplete data and complete data"** was established to handle missing data, but by extending the definition it can be applied to truncated data, censored data, mixture models, robust distribution models, latent variable models, and Bayesian modeling.

Also, a number of statistical approaches for clustering and unsupervised learning (e.g., k-means, Gaussian mixture models) can be **generalized** as EM algorithms when focusing on the computational process. In addition, research on analyzing the EM algorithm from the viewpoint of **information geometry** has been active, and applying the EM algorithm to stochastic models including the exponential family can be summarized in the form of **e-projection / m-projection**.

## 1. Statistical inference

Purpose: To find the probability distribution that a given variable follows.

Namely, when considering a stochastic model $p(x \mid \theta)$ determined by the parameter $\theta$ and detecting the optimal parameter $\theta^*$ from a dataset $\mathcal{D} = \{x_1, \dots, x_n\}$ drawn from the true distribution $q(x)$, the following approximation holds: $p(x \mid \theta^*) \approx q(x)$. This is called statistical inference (or statistical estimation).

## 2. Maximum likelihood estimation

The most basic algorithm for statistical inference is maximum likelihood estimation (MLE). The log-likelihood function of the stochastic model is defined as $\ell(\theta; x) = \log p(x \mid \theta)$, and an empirical objective function of $\theta$ that depends on the dataset $\mathcal{D}$ can be obtained as $L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log p(x_i \mid \theta)$. The MLE of the parameter is then derived as $\hat{\theta} = \operatorname*{arg\,max}_\theta L(\theta)$.
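As a concrete illustration of MLE, consider fitting a univariate Gaussian $\mathcal{N}(\mu, \sigma^2)$, where the maximizers of the log-likelihood have a well-known closed form (the sample mean and the biased sample variance). A minimal sketch, with data generated here only for demonstration:

```python
import numpy as np

# Maximizing L(mu, sigma^2) = (1/n) * sum_i log N(x_i | mu, sigma^2)
# yields the sample mean and the (biased) sample variance.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)

# Closed-form maximizers of the Gaussian log-likelihood
mu_hat = data.mean()
sigma2_hat = ((data - mu_hat) ** 2).mean()

print(mu_hat, sigma2_hat)  # close to the true values 2.0 and 1.5**2 = 2.25
```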

## 3. EM algorithm

Let’s define the following data categories.

- Complete data $y$: **not observable**, but completely follows the true distribution.
- Incomplete data $x$: observable, but does **not completely follow** the true distribution.

In general, the relationship between complete data and incomplete data is one-to-many. Here, as a convenient assumption, I introduce a latent variable $z$ to express this constraint; that is, assume $y = (x, z)$ holds. Considering the stochastic model $p(x, z \mid \theta)$ for the complete data,

the data sample $z$ cannot be observed, so its likelihood cannot be calculated directly. However, the data sample $x$ can be observed and its likelihood can be calculated as the marginal $p(x \mid \theta) = \sum_z p(x, z \mid \theta)$. By using this formula, the estimated value of $\theta$ can be obtained by approximating $\log p(x, z \mid \theta)$, the log-likelihood of the complete data, by its conditional expectation given the observed data. The procedure to derive the estimated value of $\theta$ is called the EM algorithm because it is an iterative method that repeats the E-step and the M-step alternately.

**EM algorithm**

Initialize with $\theta^{(0)}$.

- For each step $t = 0, 1, 2, \dots$:

  **E step**: compute the expectation value $Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}\left[\log p(x, z \mid \theta)\right]$.

  **M step**: derive the optimal parameter $\theta^{(t+1)} = \operatorname*{arg\,max}_\theta Q(\theta \mid \theta^{(t)})$ that maximizes this value.

- Consider the converged value $\theta^{(\infty)}$ as the algorithm output.
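The iteration above can be sketched for the classic case of a two-component 1-D Gaussian mixture, where the latent variable $z$ is the component assignment. All names below (`em_gmm`, `gaussian_pdf`) are illustrative, not from a specific library, and the toy data is generated only for demonstration:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at each point of x."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm(x, n_iter=100):
    """EM for a 2-component 1-D Gaussian mixture; theta = (pi, mu, var)."""
    # Initialization theta^(0)
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E step: responsibilities p(z = k | x_i, theta^(t)), shape (2, n)
        dens = np.stack([pi[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M step: theta^(t+1) maximizing the expected complete-data log-likelihood
        nk = resp.sum(axis=1)
        pi = nk / len(x)
        mu = (resp @ x) / nk
        var = np.array([(resp[k] * (x - mu[k]) ** 2).sum() / nk[k] for k in range(2)])
    return pi, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
pi, mu, var = em_gmm(x)
print(np.sort(mu))  # estimated means, expected near -3 and 3
```

Note that the M step here uses the closed-form maximizers of $Q(\theta \mid \theta^{(t)})$, which exist because the Gaussian belongs to the exponential family; for other models the M step may require numerical optimization.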

As a result, the estimated value of $\theta$ is derived as $\hat{\theta} = \theta^{(\infty)}$, and the following monotonic improvement of the log-likelihood holds at every step: $\log p(x \mid \theta^{(t+1)}) \ge \log p(x \mid \theta^{(t)})$.
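The per-step improvement can be verified with a short, textbook-standard argument. Taking the expectation of $\log p(x, z \mid \theta) = \log p(x \mid \theta) + \log p(z \mid x, \theta)$ over $z \sim p(z \mid x, \theta^{(t)})$ gives the decomposition

$$
\log p(x \mid \theta) = Q(\theta \mid \theta^{(t)}) + H(\theta \mid \theta^{(t)}),
\qquad
H(\theta \mid \theta^{(t)}) = -\mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}\left[\log p(z \mid x, \theta)\right].
$$

The M step guarantees $Q(\theta^{(t+1)} \mid \theta^{(t)}) \ge Q(\theta^{(t)} \mid \theta^{(t)})$, and by Gibbs' inequality $H(\theta \mid \theta^{(t)}) \ge H(\theta^{(t)} \mid \theta^{(t)})$ for every $\theta$, so the log-likelihood cannot decrease.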

Also, the calculations in the E step and M step can be summarized in a single update formula:

**EM step**

$$
\theta^{(t+1)} = \operatorname*{arg\,max}_\theta \; \mathbb{E}_{z \sim p(z \mid x, \theta^{(t)})}\left[\log p(x, z \mid \theta)\right]
$$