Week 6: Variational Inference¶
Choosing an Elimination Ordering¶
To choose an elimination ordering, we use a set of heuristics:
- Min-fill: the cost of a vertex is the number of edges that need to be added to the graph due to its elimination.
- Weighted-Min-Fill: the cost of a vertex is the sum of weights of the edges that need to be added to the graph due to its elimination. The weight of an edge is the product of the weights of its constituent vertices.
- Min-neighbors: the cost of a vertex is the number of neighbors it has in the current graph.
- Min-weight: the cost of a vertex is the product of weights (domain cardinality) of its neighbors.
None of these heuristics is uniformly better than the others; in practice, you often just have to try several.
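The min-fill heuristic above can be sketched in a few lines: repeatedly pick the vertex whose elimination adds the fewest fill-in edges, connect its neighbors, and remove it. This is an illustrative sketch; the function name `min_fill_order` and the adjacency-dict representation are assumptions, not part of the notes.

```python
from itertools import combinations

def min_fill_order(adj):
    """Greedy elimination ordering via the min-fill heuristic.

    adj: dict mapping each vertex to the set of its neighbors.
    Returns a list of vertices in elimination order.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        # Fill-in cost of v: edges that must be added among v's neighbors.
        def fill_cost(v):
            return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])
        v = min(adj, key=fill_cost)
        # Add the fill-in edges among v's neighbors, then eliminate v.
        for a, b in combinations(adj[v], 2):
            adj[a].add(b)
            adj[b].add(a)
        for n in adj[v]:
            adj[n].discard(v)
        del adj[v]
        order.append(v)
    return order
```

The other heuristics differ only in the `fill_cost` function (e.g. `len(adj[v])` for min-neighbors).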
Belief Propagation¶
What if we want p(x_i) \ \forall x_i \in X? We could run variable elimination for each variable x_i, but this is computationally expensive. Can we do something more efficient?
Consider a tree:

We can use sum-product belief propagation to compute all marginals with just two passes. Belief propagation is based on passing "messages" between neighboring vertices of the graph.
The message sent from variable j to i \in N(j) is

m_{j \rightarrow i}(x_i) = \sum_{x_j} \phi_j(x_j) \, \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus \{i\}} m_{k \rightarrow j}(x_j)
where each message m_{j \rightarrow i}(x_i) is a vector with one value for each state of x_i. In order to compute m_{j \rightarrow i}, we must have already computed m_{k \rightarrow j}(x_j) for all k \in N(j) \setminus \{i\}. Therefore, we need a specific ordering of the messages.
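A valid ordering can be built with two tree traversals: send messages toward the root in children-before-parent order, then back out in parent-before-child order. A minimal sketch, assuming an adjacency-dict tree; the helper name `message_schedule` is hypothetical.

```python
def message_schedule(adj, root):
    """Two-pass message schedule for sum-product BP on a tree.

    adj: dict vertex -> set of neighbors; root: the chosen root.
    Returns (sender, receiver) pairs such that every message
    m_{k->j} needed by m_{j->i} appears earlier in the list.
    """
    # DFS from the root records parents and a parent-before-child order.
    order, parent, stack = [], {root: None}, [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for n in adj[v]:
            if n not in parent:
                parent[n] = v
                stack.append(n)
    # Upward pass: reversed DFS order puts children before their parents.
    up = [(v, parent[v]) for v in reversed(order) if parent[v] is not None]
    # Downward pass: DFS order puts parents before their children.
    down = [(parent[v], v) for v in order if parent[v] is not None]
    return up + down
```

On the chain 1 - 2 - 3 rooted at 1, this yields the schedule m_{3→2}, m_{2→1}, m_{1→2}, m_{2→3}.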
Example¶
Suppose we want to compute p(x_1) on the graph given above. Choosing an ordering is equivalent to choosing a root; let's choose x_1 to be the root. Then
and finally,
so
where
The elimination algorithm on trees is equivalent to message passing.
Belief Propagation Algorithm¶
- Choose root r arbitrarily
- Pass messages from the leaves to r
- Pass messages from r back to the leaves
- Compute p(x_i) \propto \phi_i(x_i) \prod_{k \in N(i)} m_{k \rightarrow i}(x_i) for every i
The runtime is twice the cost of variable elimination on the tree (one factor for each pass), i.e. \mathcal O(nk^2) for n nodes and k states per node.
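The whole algorithm fits in a short sketch for a 3-node chain x_1 - x_2 - x_3; the node potentials and the shared pairwise potential here are made-up illustrative numbers, not from the notes.

```python
import numpy as np

k = 2  # states per node
# Hypothetical potentials: phi[i] is the node potential, psi the pairwise one.
phi = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])]
psi = np.array([[1.0, 0.5],
                [0.5, 1.0]])

def send(phi_j, incoming):
    """m_{j->i}(x_i) = sum_{x_j} phi_j(x_j) psi(x_i, x_j) * incoming(x_j)."""
    return psi @ (phi_j * incoming)

# Upward pass with root x_1: m_{3->2}, then m_{2->1}.
m32 = send(phi[2], np.ones(k))
m21 = send(phi[1], m32)
# Downward pass: m_{1->2}, then m_{2->3}.
m12 = send(phi[0], np.ones(k))
m23 = send(phi[1], m12)

# Marginals: p(x_i) is proportional to phi_i times all incoming messages.
b1 = phi[0] * m21; b1 /= b1.sum()
b2 = phi[1] * m32 * m12; b2 /= b2.sum()
b3 = phi[2] * m23; b3 /= b3.sum()
```

Four messages (2(n-1) in general) suffice to recover all three marginals, versus three separate variable-elimination runs.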