8.3 Hierarchical generative models and predictive coding

The Bayesian formulation tells us what the brain is computing. It does not tell us how the brain computes it. The brain is almost certainly not evaluating $P(S|M)P(M)/P(S)$ literally: the normalizing term $P(S)$ alone requires integrating over every possible hypothesis $M$, which is intractable for any realistic model. The inference must instead be implemented approximately, by neural circuits.

The most influential current proposal for how the brain does this is predictive coding, formalized by Rao & Ballard in 1999 and developed extensively by Friston and others. The idea: the brain is organized as a hierarchy of layers, with each layer maintaining a generative model of what it expects the layer below to produce. The model produces predictions, which are subtracted from the actual input, leaving a prediction error. The prediction error is what propagates upward; the predictions propagate downward.

If predictions match the input, prediction errors are small and the higher-level representation barely changes. If they fail to match, the error signal revises the higher-level representation so that it predicts the input better, and the loop continues until prediction errors are minimized at every level.
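The loop just described can be sketched for a single pair of layers. This is a minimal illustration under stated assumptions, not a model of any real circuit: the generative model is taken to be linear with fixed, hand-picked `weights`, the higher-level representation is a single number `r`, and the names `settle`, `rate`, and `tol` are hypothetical.

```python
def settle(inputs, weights, r=0.0, rate=0.1, tol=1e-6, max_steps=10_000):
    """Relax a single higher-level estimate r until its predictions fit the input.

    inputs  -- activity arriving from the lower layer (list of floats)
    weights -- ASSUMED fixed linear generative model: prediction[j] = weights[j] * r
    Returns (r, errors, steps_taken).
    """
    steps = 0
    errors = [x - w * r for x, w in zip(inputs, weights)]
    while steps < max_steps:
        # Top-down pass: the higher layer predicts the lower layer's activity.
        predictions = [w * r for w in weights]
        # Bottom-up pass: only the mismatch is sent upward.
        errors = [x - p for x, p in zip(inputs, predictions)]
        # The error nudges r in the direction that reduces the squared error.
        step = rate * sum(w * e for w, e in zip(weights, errors))
        if abs(step) < tol:
            break  # predictions already match: nothing left to update
        r += step
        steps += 1
    return r, errors, steps


weights = [1.0, 2.0]

# Input that the initial estimate fails to predict: r is revised over many steps.
r, errors, steps = settle([2.0, 4.0], weights, r=0.0)
print(round(r, 3), steps)

# Input that the estimate already predicts: zero error, zero updates.
r, errors, steps = settle([2.0, 4.0], weights, r=2.0)
print(round(r, 3), steps)  # 2.0 0
```

In the first call the estimate starts far from the value that explains the input and is pulled toward it by the error signal; in the second the prediction is already correct, so no error propagates and the estimate is left alone.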

The mathematical core of this is hierarchical Bayesian inference, with the precision of each layer’s prediction error (its inverse variance) weighting how strongly that error affects updates. In a generative model with multiple layers $\theta_1, \theta_2, \theta_3, \ldots$, each layer makes predictions about the layer below, and the prediction errors are weighted by their precisions:

$$\Delta\theta_i \propto \pi_i \, \frac{\partial g}{\partial\theta_i} \, \bigl(\theta_{i-1} - g(\theta_i)\bigr) - \pi_{i+1} \, \bigl(\theta_i - g(\theta_{i+1})\bigr),$$

where $\pi_i$ is the precision of the error in layer $i$'s prediction of layer $i-1$ and $g$ is the generative function. The first term updates $\theta_i$ based on errors below; the second based on errors above. Under this framework, the whole brain is one big precision-weighted prediction-error minimizer.
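To make the role of precision concrete, here is a minimal sketch of the update for a three-layer chain of scalars. Several things are assumptions of the sketch rather than part of the framework: $g$ is taken to be the identity (so $\partial g/\partial\theta_i = 1$), the bottom layer is clamped to a sensory input and the top layer to a prior expectation, and the names `relax`, `rate`, and `steps` are hypothetical.

```python
def g(theta):
    """Generative function. ASSUMPTION: identity, so dg/dtheta = 1 everywhere."""
    return theta


def relax(theta, pi, rate=0.05, steps=2000):
    """Iterate the precision-weighted update on the hidden layers of a chain.

    theta -- layer states; theta[0] (sensory input) and theta[-1] (top-level
             prior) are held fixed, and only the layers in between are updated
    pi    -- pi[i] is the precision of the error theta[i-1] - g(theta[i]);
             pi[0] is unused
    """
    theta = list(theta)
    for _ in range(steps):
        updated = list(theta)
        for i in range(1, len(theta) - 1):
            error_below = theta[i - 1] - g(theta[i])   # layer i failed to predict i-1
            error_above = theta[i] - g(theta[i + 1])   # layer i+1 failed to predict i
            updated[i] = theta[i] + rate * (
                pi[i] * error_below - pi[i + 1] * error_above
            )
        theta = updated
    return theta


sensory_input, prior = 10.0, 0.0

# High precision on the sensory error: the estimate settles near the input.
trusting_senses = relax([sensory_input, 5.0, prior], pi=[None, 4.0, 1.0])
print(round(trusting_senses[1], 3))  # 8.0

# High precision on the prior error: the same input barely moves the estimate.
trusting_prior = relax([sensory_input, 5.0, prior], pi=[None, 1.0, 4.0])
print(round(trusting_prior[1], 3))   # 2.0
```

With an identity $g$, the update stops changing when $\pi_1(\theta_0 - \theta_1) = \pi_2(\theta_1 - \theta_2)$, so the middle layer settles at the precision-weighted average of the input and the prior, $(\pi_1\theta_0 + \pi_2\theta_2)/(\pi_1 + \pi_2)$. Swapping the two precisions moves the estimate from 8 to 2 with no change in the data, which is the sense in which precision sets how strongly each error drives the update.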