Week 5: Exact Inference¶
Assigned Reading¶
- Murphy: Chapter 20
- MacKay: Chapter 21.1 (worked example with numbers)
- MacKay: Chapter 16 (Message Passing, including soldier intuition)
- MacKay: Chapter 26 (Exact Inference on Factor Graphs)
Overview¶
- Variable elimination (VE)
- Complexity of VE
Inference in PGM¶
In this lecture, we will explore inference in probabilistic graphical models (PGMs).
Let
X = X_F \cup X_E \cup X_R
where X_F is the set of query variables, X_E is the set of observed (evidence) variables, and X_R is the set of random variables in our model that are neither part of the query nor the evidence. We will focus on computing the conditional probability distribution
P(X_F \mid X_E) = \frac{P(X_F, X_E)}{P(X_E)}
Each of the distributions we need to compute can be obtained by marginalizing over the remaining variables,
P(X_F, X_E) = \sum_{X_R} P(X_F, X_E, X_R) \qquad \text{and} \qquad P(X_E) = \sum_{X_F} P(X_F, X_E)
However, naively marginalizing over all unobserved variables requires a number of computations exponential in the number of random variables, N, in our model.
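To make the cost of the naive approach concrete, here is a minimal Python sketch (the toy table, variable names, and helper function are my own, not from the lecture): it enumerates every joint assignment consistent with the evidence, so its running time is \mathcal O(k^N) for N variables with k states each.

```python
import itertools

import numpy as np

# Toy joint distribution P(X_1, ..., X_N) over N variables with k states each,
# stored as an explicit N-dimensional table. The table has k**N entries, which
# is exactly why naive inference does not scale.
N, k = 4, 2
rng = np.random.default_rng(0)
joint = rng.random((k,) * N)
joint /= joint.sum()

def naive_conditional(joint, query_var, evidence):
    """P(X_query | evidence), by summing the joint over all assignments of the other variables."""
    k = joint.shape[0]
    numerator = np.zeros(k)
    for assignment in itertools.product(range(k), repeat=joint.ndim):  # k**N assignments
        if any(assignment[var] != value for var, value in evidence.items()):
            continue  # inconsistent with the evidence
        numerator[assignment[query_var]] += joint[assignment]
    return numerator / numerator.sum()  # divide by P(evidence)

print(naive_conditional(joint, query_var=0, evidence={3: 1}))  # P(X_1 | X_4 = 1)
```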
Variable elimination¶
Variable elimination is a simple and general exact inference algorithm that applies to any probabilistic graphical model.
The running time depends on the graph structure of the model, but by using dynamic programming we can avoid enumerating all joint assignments.
Simple Example: Chain¶
Let's start with the example of a simple chain,
A \rightarrow B \rightarrow C \rightarrow D
where we want to compute P(D). We have
P(A, B, C, D) = P(A)\, P(B \mid A)\, P(C \mid B)\, P(D \mid C)
and
P(D) = \sum_C \sum_B \sum_A P(A)\, P(B \mid A)\, P(C \mid B)\, P(D \mid C)
Clearly, this is exponential in the number of variables (\mathcal O(k^n), where k is the number of states each variable can take). But, reordering the sums in the joint distribution,
P(D) = \sum_C P(D \mid C) \sum_B P(C \mid B) \sum_A P(A)\, P(B \mid A)
we can begin to simplify the summation: each inner sum is computed once, producing a small intermediate factor that is reused by the sum outside it.
So, by using dynamic programming to do the computation inside out instead of outside in, we have done inference over the joint distribution represented by the chain without generating it explicitly. The cost of performing inference on the chain in this manner is \mathcal O(nk^2). In comparison, generating the full joint distribution and marginalizing over it has complexity \mathcal O(k^n)!
Tip
See slides 7-13 for a full derivation of this example, including the runtime \mathcal O(nk^2).
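As a sketch of the dynamic-programming idea (assuming the chain A \rightarrow B \rightarrow C \rightarrow D above, with k states per variable and made-up numbers), the marginal of the last variable can be computed as a sequence of matrix-vector products, each costing \mathcal O(k^2), for \mathcal O(nk^2) overall.

```python
import numpy as np

def chain_marginal(prior, transitions):
    """P(X_n) for a chain X_1 -> X_2 -> ... -> X_n.

    prior:       length-k vector P(X_1)
    transitions: list of k x k matrices, transitions[i][a, b] = P(X_{i+2} = b | X_{i+1} = a)

    Each step is one matrix-vector product, so the total cost is O(n k^2)
    rather than the O(k^n) cost of enumerating the full joint.
    """
    message = prior
    for T in transitions:      # sum out X_1, then X_2, ... (inside-out)
        message = message @ T  # message[b] = sum_a message[a] * P(next = b | prev = a)
    return message

# Usage with k = 2 states and the four-variable chain A -> B -> C -> D.
p_A = np.array([0.6, 0.4])
p_B_given_A = np.array([[0.9, 0.1], [0.2, 0.8]])
p_C_given_B = np.array([[0.7, 0.3], [0.5, 0.5]])
p_D_given_C = np.array([[0.4, 0.6], [0.1, 0.9]])
print(chain_marginal(p_A, [p_B_given_A, p_C_given_B, p_D_given_C]))  # P(D)
```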
Simple Example: DGM¶
Let's take the DGM we saw in lecture 3

What is p(x_1 | \bar x_6)? We have
p(x_1 \mid \bar x_6) = \frac{p(x_1, \bar x_6)}{p(\bar x_6)}
Note
The notation \bar x means that the variable is observed.
and
p(\bar x_6) = \sum_{x_1} p(x_1, \bar x_6)
To compute p(x_1, \bar x_6), we use variable elimination: we sum out the unobserved variables x_2, \dots, x_5 one at a time, pushing each sum as far inside the product of factors as possible.
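The specific elimination steps depend on the factorization encoded by the graph. As a hedged sketch, assuming the DGM from lecture 3 factorizes as p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1)\, p(x_4 \mid x_2)\, p(x_5 \mid x_3)\, p(x_6 \mid x_2, x_5) (an assumption; check it against the figure above), pushing the sums inward gives
\begin{aligned}
p(x_1, \bar x_6) &= \sum_{x_2} \sum_{x_3} \sum_{x_4} \sum_{x_5} p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1)\, p(x_4 \mid x_2)\, p(x_5 \mid x_3)\, p(\bar x_6 \mid x_2, x_5) \\
&= p(x_1) \sum_{x_2} p(x_2 \mid x_1) \sum_{x_3} p(x_3 \mid x_1) \underbrace{\sum_{x_4} p(x_4 \mid x_2)}_{=\,1} \underbrace{\sum_{x_5} p(x_5 \mid x_3)\, p(\bar x_6 \mid x_2, x_5)}_{m_5(x_2, x_3)} \\
&= p(x_1) \sum_{x_2} p(x_2 \mid x_1) \underbrace{\sum_{x_3} p(x_3 \mid x_1)\, m_5(x_2, x_3)}_{m_3(x_1, x_2)} \\
&= p(x_1) \underbrace{\sum_{x_2} p(x_2 \mid x_1)\, m_3(x_1, x_2)}_{m_2(x_1)}
\end{aligned}
Each intermediate factor m_i is a table over at most two variables, so no step ever touches the full joint.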
Summary so far¶
A worst-case analysis says that exact inference (marginalizing the joint distribution) is NP-hard. In practice, some subexpressions in the joint depend only on a subset of variables, due to the structure of the Bayesian network. We can therefore use dynamic programming to reduce the number of computations; however, the savings depend on choosing a good variable elimination ordering.
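To see why the ordering matters, consider a star-shaped model (an illustrative example, not one from the lecture) with a centre variable X connected to leaves Y_1, \dots, Y_n, each variable taking k states. Eliminating X first creates one huge factor over every leaf at once,
\sum_X \psi(X) \prod_{i=1}^{n} \psi(X, Y_i) = \tau(Y_1, \dots, Y_n)
a table with k^n entries, whereas eliminating the leaves first only ever creates single-variable factors \tau_i(X) = \sum_{Y_i} \psi(X, Y_i) with k entries each. Both orderings give the same answer, but their costs differ exponentially.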
Sum-product inference¶
We want an algorithm to compute P(Y) for both directed and undirected models. This can be reduced to the following sum-product inference task,
\tau(Y) = \sum_Z \prod_{\psi \in \Psi} \psi(\text{Scope}[\psi] \cap Z,\ \text{Scope}[\psi] \cap Y)
where \Psi is a set of potentials or factors.
For directed models, \Psi is given by the conditional probability distributions for all variables,
\Psi = \{\psi_{x_i}\}_{i=1}^{N} = \{p(x_i \mid \text{parents}(x_i))\}_{i=1}^{N}
where the sum is over the set Z = X - X_F.
For undirected models, \Psi is given by the set of clique potentials. Because sum-product returns an unnormalized distribution \tau(Y), we must normalize by dividing by \sum_Y \tau(Y).
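Below is a minimal sketch of sum-product variable elimination over an arbitrary set of factors (the Factor class and helper names are my own, not from the lecture or any particular library). The same code serves directed models, where the factors are CPDs, and undirected models, where the factors are potentials and the result is normalized at the end.

```python
import numpy as np

class Factor:
    """psi(scope): a dense table with one axis per variable in `scope`."""
    def __init__(self, scope, table):
        self.scope = list(scope)                      # variable names, one per table axis
        self.table = np.asarray(table, dtype=float)

def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    scope = f.scope + [v for v in g.scope if v not in f.scope]
    def align(factor):
        table = factor.table
        for v in scope:                               # append size-1 axes for missing variables
            if v not in factor.scope:
                table = np.expand_dims(table, axis=-1)
        current = factor.scope + [v for v in scope if v not in factor.scope]
        return np.transpose(table, [current.index(v) for v in scope])
    return Factor(scope, align(f) * align(g))         # broadcasting handles the rest

def marginalize(f, var):
    """Sum a variable out of a factor."""
    return Factor([v for v in f.scope if v != var], f.table.sum(axis=f.scope.index(var)))

def sum_product_ve(factors, elimination_order):
    """Eliminate variables one at a time; return tau over the remaining (query) variables."""
    factors = list(factors)
    for var in elimination_order:
        involved = [f for f in factors if var in f.scope]
        if not involved:
            continue
        product = involved[0]
        for f in involved[1:]:                        # multiply every factor that mentions `var` ...
            product = multiply(product, f)
        tau = marginalize(product, var)               # ... then sum `var` out
        factors = [f for f in factors if var not in f.scope] + [tau]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Usage on the chain A -> B -> C -> D (a directed model, so the factors are CPDs).
p_A = Factor(["A"], [0.6, 0.4])
p_B = Factor(["A", "B"], [[0.9, 0.1], [0.2, 0.8]])
p_C = Factor(["B", "C"], [[0.7, 0.3], [0.5, 0.5]])
p_D = Factor(["C", "D"], [[0.4, 0.6], [0.1, 0.9]])
tau = sum_product_ve([p_A, p_B, p_C, p_D], ["A", "B", "C"])
print(tau.scope, tau.table / tau.table.sum())  # P(D); dividing by the sum is the normalization needed for undirected models
```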
Example: Directed Graph¶
Take the following directed graph as an example

The joint distribution is given by the product of the conditional distribution of each variable given its parents, with these conditionals forming the set of factors \Psi.
Let's do variable elimination with the ordering \prec = \{C, D, I, H, G, S, L\}

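As a hedged sketch of the elimination steps: assuming the graph shown is the standard extended "student" network with joint P(C)\, P(D \mid C)\, P(I)\, P(G \mid I, D)\, P(S \mid I)\, P(L \mid G)\, P(J \mid L, S)\, P(H \mid G, J) and query P(J) (an assumption, since neither is stated explicitly above), eliminating the variables in the given order produces the intermediate factors
\begin{aligned}
\tau_1(D) &= \sum_{C} \psi_C(C)\, \psi_D(D, C) \\
\tau_2(G, I) &= \sum_{D} \psi_G(G, I, D)\, \tau_1(D) \\
\tau_3(G, S) &= \sum_{I} \psi_I(I)\, \psi_S(S, I)\, \tau_2(G, I) \\
\tau_4(G, J) &= \sum_{H} \psi_H(H, G, J) \\
\tau_5(J, L, S) &= \sum_{G} \tau_3(G, S)\, \tau_4(G, J)\, \psi_L(L, G) \\
\tau_6(J, L) &= \sum_{S} \tau_5(J, L, S)\, \psi_J(J, L, S) \\
\tau_7(J) &= \sum_{L} \tau_6(J, L)
\end{aligned}
where each \psi is the corresponding conditional probability distribution, so that P(J) = \tau_7(J). The largest intermediate factor is over three variables, which is what determines the cost of this particular ordering.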