How to train a community of stochastic generative models
Geoffrey Hinton
Gatsby Computational Neuroscience Unit
University College London
It is possible to combine multiple probabilistic models of the same
data by multiplying the probabilities together and then renormalizing.
This is a very efficient way to model data which simultaneously
satisfies many different constraints. Each individual model can focus
on giving high probability to data vectors that satisfy just one of
the constraints. Data vectors that satisfy this one constraint but
violate other constraints will be ruled out by their low probability
under the other models. For example, one model can generate images
that have the approximate overall shape of the digit 2 and other more
local models can ensure that local image patches contain segments of
stroke with the correct fine structure. Or one model of a word string
can ensure that the tenses agree and another can ensure that the
number agrees. Training a product of models appears difficult
because, in addition to maximizing the probabilities that the
individual models assign to the observed data, it is necessary to
minimize the normalization term by making them disagree on unobserved
data. Fortunately, there is an efficient way to train a product of
models. Some examples of product models trained in this way will be
described, and I will show that they extract interesting structure
from images.