Forward algorithm

Not to be confused with Forward-backward algorithm.

The forward algorithm, in the context of a hidden Markov model, is used to calculate a 'belief state': the probability of a state at a certain time, given the history of evidence. The process is also known as filtering. The forward algorithm is closely related to, but distinct from, the Viterbi algorithm.

For an HMM such as this one:

[Figure: Temporal evolution of a hidden Markov model]

this probability is written as P(x_t | y_{1:t}). Here x(t) is the hidden state, abbreviated as x_t, and y_{1:t} denotes the observations from time 1 to t. A belief state can be calculated at each time step. Strictly speaking, doing so does not produce the most likely state sequence; it produces the most likely state at each time step, given the evidence up to that step.


The goal of the forward algorithm is to compute the joint probability p(x_t,y_{1:t}), where for notational convenience x(t) is abbreviated as x_t and (y(1), y(2), ..., y(t)) as y_{1:t}. The belief state follows from this joint probability by normalization: P(x_t | y_{1:t}) = p(x_t,y_{1:t}) / \sum_{x'_t} p(x'_t,y_{1:t}). Computing p(x_t,y_{1:t}) directly would require marginalizing over all possible state sequences \{x_{1:t-1}\}, the number of which grows exponentially with t. Instead, the forward algorithm takes advantage of the conditional independence rules of the hidden Markov model (HMM) to perform the calculation recursively.
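To make the cost of direct marginalization concrete, the following minimal Python sketch sums over every state sequence explicitly. The two-state model and its numbers (initial, transition, emission) are hypothetical values chosen for illustration and do not come from the text above; the function name brute_force_joint is likewise an assumption.

```python
from itertools import product

# Hypothetical two-state model (illustrative numbers only).
initial = [0.5, 0.5]                    # p(x_1)
transition = [[0.7, 0.3], [0.3, 0.7]]   # transition[i][j] = p(x_t = j | x_{t-1} = i)
emission = [[0.9, 0.1], [0.2, 0.8]]     # emission[i][k] = p(y_t = k | x_t = i)


def brute_force_joint(observations):
    """Return p(x_t = j, y_{1:t}) for each state j by enumerating all state sequences."""
    n_states = len(initial)
    joint = [0.0] * n_states
    # There are n_states ** t sequences, so this loop grows exponentially with t.
    for path in product(range(n_states), repeat=len(observations)):
        prob = initial[path[0]] * emission[path[0]][observations[0]]
        for prev, cur, obs in zip(path, path[1:], observations[1:]):
            prob *= transition[prev][cur] * emission[cur][obs]
        # Accumulate the probability under the final state of this sequence.
        joint[path[-1]] += prob
    return joint


joint = brute_force_joint([0, 0])  # approximately [0.3105, 0.041]
```

With 2 states and 2 observations the loop visits only 4 sequences, but with 50 observations it would visit 2^50, which is why the direct approach is impractical.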

To demonstrate the recursion, let

\alpha_t(x_t) = p(x_t,y_{1:t}) = \sum_{x_{t-1}}p(x_t,x_{t-1},y_{1:t}).

Using the chain rule to expand p(x_t,x_{t-1},y_{1:t}), we can then write

\alpha_t(x_t) = \sum_{x_{t-1}}p(y_t|x_t,x_{t-1},y_{1:t-1})p(x_t|x_{t-1},y_{1:t-1})p(x_{t-1},y_{1:t-1}).

Because y_t is conditionally independent of everything else given x_t, and x_t is conditionally independent of all earlier states and observations given x_{t-1}, this simplifies to

\alpha_t(x_t) = p(y_t|x_t)\sum_{x_{t-1}}p(x_t|x_{t-1})\alpha_{t-1}(x_{t-1}).

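Thus, since p(y_t|x_t) and p(x_t|x_{t-1}) are given by the model's emission distributions and transition probabilities, one can calculate \alpha_t(x_t) directly from \alpha_{t-1}(x_{t-1}). The recursion starts from \alpha_1(x_1) = p(x_1)p(y_1|x_1), where p(x_1) is the initial state distribution. For a model with K discrete hidden states, each step costs on the order of K^2 operations, so the whole computation takes time proportional to K^2 t rather than time exponential in t.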
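The recursion can be sketched in a few lines of Python. This is a minimal illustration for discrete states and observations, not a reference implementation: the function names forward and normalize and the two-state model are assumptions introduced here, and the numbers are hypothetical.

```python
# Hypothetical two-state model (illustrative numbers only).
initial = [0.5, 0.5]                    # p(x_1)
transition = [[0.7, 0.3], [0.3, 0.7]]   # transition[i][j] = p(x_t = j | x_{t-1} = i)
emission = [[0.9, 0.1], [0.2, 0.8]]     # emission[i][k] = p(y_t = k | x_t = i)


def forward(initial, transition, emission, observations):
    """Return the list of alpha_t vectors, where alpha_t[j] = p(x_t = j, y_{1:t})."""
    n_states = len(initial)
    # Base case: alpha_1(x_1) = p(x_1) p(y_1 | x_1).
    alphas = [[initial[j] * emission[j][observations[0]] for j in range(n_states)]]
    for obs in observations[1:]:
        prev = alphas[-1]
        # alpha_t(j) = p(y_t | j) * sum_i p(j | i) * alpha_{t-1}(i)
        alphas.append([
            emission[j][obs] * sum(prev[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ])
    return alphas


def normalize(vector):
    """Scale a non-negative vector so that its entries sum to 1."""
    total = sum(vector)
    return [v / total for v in vector]


alphas = forward(initial, transition, emission, [0, 0])
belief = normalize(alphas[-1])  # belief state P(x_t | y_{1:t}) at the last step
```

For long observation sequences the unnormalized values shrink toward zero, so practical implementations typically normalize at every step or work with logarithms; this sketch omits that for clarity.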

The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the Markov jump linear system.


To take future evidence into account (that is, to improve the estimate for past times), one can run the backward algorithm, which complements the forward algorithm. This is called smoothing. The forward/backward algorithm computes P(x_k | y_{1:t}) for 1 \le k < t, so each estimate takes into account all of the evidence, both before and after time k.
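As an illustration of smoothing, the sketch below combines a forward pass with a backward pass and normalizes their product at each time step. It assumes discrete states and observations; the function name forward_backward and the two-state model with its numbers are hypothetical and chosen only for this example.

```python
# Hypothetical two-state model (illustrative numbers only).
initial = [0.5, 0.5]                    # p(x_1)
transition = [[0.7, 0.3], [0.3, 0.7]]   # transition[i][j] = p(x_t = j | x_{t-1} = i)
emission = [[0.9, 0.1], [0.2, 0.8]]     # emission[i][k] = p(y_t = k | x_t = i)


def forward_backward(initial, transition, emission, observations):
    """Return smoothed distributions P(x_k | y_{1:t}) for every time step k."""
    n = len(initial)
    # Forward pass: alphas[k][j] = p(x_k = j, y_{1:k}).
    alphas = [[initial[j] * emission[j][observations[0]] for j in range(n)]]
    for obs in observations[1:]:
        prev = alphas[-1]
        alphas.append([
            emission[j][obs] * sum(prev[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ])
    # Backward pass: betas[k][i] = p(y_{k+1:t} | x_k = i), starting from all ones.
    betas = [[1.0] * n]
    for obs in reversed(observations[1:]):
        nxt = betas[0]
        betas.insert(0, [
            sum(transition[i][j] * emission[j][obs] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    # Smoothed estimate is proportional to alpha * beta at each step.
    smoothed = []
    for alpha, beta in zip(alphas, betas):
        weights = [a * b for a, b in zip(alpha, beta)]
        total = sum(weights)
        smoothed.append([w / total for w in weights])
    return smoothed


smoothed = forward_backward(initial, transition, emission, [0, 0])
```

In this example the filtered probability of state 0 at the first step is 0.45 / 0.55 (about 0.818), while the smoothed value is about 0.883, because the second observation also supports state 0.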


To obtain the most likely state sequence, the Viterbi algorithm is required. It computes the most likely state sequence given the history of observations, that is, the state sequence that maximizes P(x_{1:t}|y_{1:t}).
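The contrast with the forward algorithm can be seen in a short sketch: the Viterbi recursion has the same shape, but the sum over the previous state is replaced by a maximum, and backpointers record which previous state achieved it. The function name viterbi and the two-state model below are assumptions made for illustration, with hypothetical numbers.

```python
# Hypothetical two-state model (illustrative numbers only).
initial = [0.5, 0.5]                    # p(x_1)
transition = [[0.7, 0.3], [0.3, 0.7]]   # transition[i][j] = p(x_t = j | x_{t-1} = i)
emission = [[0.9, 0.1], [0.2, 0.8]]     # emission[i][k] = p(y_t = k | x_t = i)


def viterbi(initial, transition, emission, observations):
    """Return one most likely state sequence for the given observations."""
    n = len(initial)
    # delta[j] = probability of the best sequence ending in state j so far.
    delta = [initial[j] * emission[j][observations[0]] for j in range(n)]
    backpointers = []
    for obs in observations[1:]:
        pointers = []
        new_delta = []
        for j in range(n):
            # Maximize over the previous state instead of summing over it.
            best_i = max(range(n), key=lambda i: delta[i] * transition[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * transition[best_i][j] * emission[j][obs])
        backpointers.append(pointers)
        delta = new_delta
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


path = viterbi(initial, transition, emission, [0, 0, 1])
```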

Further reading

  • Russell and Norvig, Artificial Intelligence: A Modern Approach (2003 edition), starting on page 541, provides a succinct exposition of this and related topics.