Tuesday, February 17, 2009

Implicit Mixtures of Restricted Boltzmann Machines


This is quite a simple extension. It employs a third-order RBM instead of an explicit mixture of RBMs (mainly because training a mixture of RBMs directly is hard). Usually we use v for the input and h for the hidden units. Now add another variable z for clustering. While v and h are binary, z is a multinomial r.v. (so that it corresponds to the cluster index). This model can be trained with CD. Given z and h, sampling v is simple. The only thing we need to consider is how to sample h and z given v, and this is still easy. One way is to sample z first (its number of configurations is linear in the number of clusters, not exponential). By marginalizing out h, we get the free energy for each value of z; a softmax over these (i.e. just a normalization of the probabilities) then gives the distribution of z. Once z is sampled, we can sample h.
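As a minimal sketch of that sampling step: assuming each component k has its own weights W[k] and biases b[k], c[k] (a standard binary RBM parameterization, not spelled out in the post), the free energy of v under component k is F_k(v) = -b_k·v - Σ_j softplus(c_kj + v·W_k[:,j]); a softmax over -F_k gives P(z=k | v), and the hidden units are then conditionally independent Bernoullis. All names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(x):
    # log(1 + exp(x)), computed stably
    return np.logaddexp(0.0, x)

def free_energy(v, W_k, b_k, c_k):
    # F_k(v) with the hidden units of component k marginalized out
    return -v @ b_k - softplus(c_k + v @ W_k).sum()

def sample_z_and_h(v, W, b, c):
    # W: (K, n_vis, n_hid), b: (K, n_vis), c: (K, n_hid)
    K = W.shape[0]
    F = np.array([free_energy(v, W[k], b[k], c[k]) for k in range(K)])
    # softmax over negative free energies gives P(z=k | v)
    logits = -F - (-F).max()          # shift for numerical stability
    p_z = np.exp(logits)
    p_z /= p_z.sum()
    k = rng.choice(K, p=p_z)
    # given z=k and v, sample each hidden unit independently
    p_h = 1.0 / (1.0 + np.exp(-(c[k] + v @ W[k])))
    h = (rng.random(p_h.shape) < p_h).astype(float)
    return k, h, p_z
```

Sampling v given (z, h) would then just use the chosen component's weights the other way around, as in an ordinary RBM.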

Therefore, the third-order RBM is trained with CD. The authors also address a problem with this training: one component RBM might become too strong early on, so the others are never chosen. They borrow a technique from annealing: by dividing the free energies by a temperature T (high at first, then gradually lowered), this problem can be avoided.
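The effect of that temperature is easy to see in a small sketch (hypothetical numbers, not from the paper): dividing the free energies by a large T flattens the softmax so every component gets picked sometimes, and lowering T sharpens it back toward the best-fitting component.

```python
import numpy as np

def responsibilities(free_energies, T):
    # tempered softmax over negative free energies: P(z=k|v) ∝ exp(-F_k / T)
    logits = -np.asarray(free_energies, dtype=float) / T
    logits -= logits.max()            # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

F = [1.0, 5.0, 6.0]                   # toy free energies for 3 components
hot = responsibilities(F, T=10.0)     # early training: nearly uniform
cold = responsibilities(F, T=0.5)     # late training: sharply peaked
```

At high T even the weak components keep receiving training signal, which is what prevents one RBM from monopolizing all the data.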
