Tuesday, February 3, 2009

Classification using Discriminative Restricted Boltzmann Machines


This paper formulates a "discriminative" RBM (DRBM). The visible layer contains both the input features and the label (encoded as a softmax group). It can be trained generatively or discriminatively: the generative objective maximizes Pr( x, y ) while the discriminative one maximizes Pr( y | x ). Pr( x, y ) can be maximized with contrastive divergence just like a normal RBM, and I remember Hinton's course last year included an example of classification with this model using free energies directly. For the discriminative case, when the number of classes is small, an exhaustive sum over y is feasible. The good news is that the partial derivatives then have explicit, exact expressions.
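To make the exhaustive sum concrete, here is a minimal NumPy sketch of the exact Pr( y | x ) for a DRBM with softmax label units. The shapes and parameter names (W for feature weights, U for label weights, c for hidden biases, d for label biases) are my own illustrative choices, not taken verbatim from the paper:

```python
import numpy as np

def softplus(z):
    # log(1 + exp(z)), computed stably
    return np.logaddexp(0.0, z)

def drbm_p_y_given_x(x, W, U, c, d):
    """Exact p(y|x) for a discriminative RBM with softmax label units.

    Illustrative shapes: W (H, D) feature weights, U (H, C) label weights,
    c (H,) hidden biases, d (C,) label biases, x (D,) input features.
    """
    a = c + W @ x                       # (H,) hidden pre-activation from the features
    # log of the unnormalized p(y|x): d_y + sum_j softplus(a_j + U_{jy})
    log_unnorm = d + softplus(a[:, None] + U).sum(axis=0)   # (C,)
    log_unnorm -= log_unnorm.max()      # stabilize before exponentiating
    p = np.exp(log_unnorm)
    return p / p.sum()                  # normalize by summing over all classes
```

Since the sum over hidden units factorizes, each class probability costs only O(H) on top of the shared W @ x, so the exhaustive normalization over y is cheap for a handful of classes.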

As in the last paper I scanned, they propose a hybrid model that combines the two loss functions (negated log-likelihoods, with a coefficient weighting the generative part). The discriminative part then contributes a non-stochastic (exact) gradient while the generative part contributes a stochastic one (due to CD).
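The stochastic generative part can be sketched as one CD-1 step on a binary RBM whose visible vector concatenates the features and a one-hot label. This is a hypothetical minimal setup (mean-field reconstruction, no separate softmax sampling for the label group), not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_grad(v, W, b, c):
    """CD-1 estimate of the gradient of -log p(v) for a binary RBM.

    v (D,) concatenates features x and a one-hot label y; W (H, D),
    b (D,) visible biases, c (H,) hidden biases.
    """
    h0 = sigmoid(c + W @ v)                      # positive-phase hidden probabilities
    h_sample = (rng.random(h0.shape) < h0) * 1.0 # sample hidden states
    v1 = sigmoid(b + W.T @ h_sample)             # mean-field reconstruction
    h1 = sigmoid(c + W @ v1)                     # negative-phase hidden probabilities
    # negative phase minus positive phase
    dW = np.outer(h1, v1) - np.outer(h0, v)
    db = v1 - v
    dc = h1 - h0
    return dW, db, dc
```

The hybrid update would then descend the sum of the exact discriminative gradient and alpha times this CD estimate, with alpha the coefficient on the generative loss.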

As for semi-supervised learning, we simply marginalize the label y out of the RBM. The gradient then becomes the CD gradient averaged over y, weighted by Pr( y | x ). This negated log-likelihood of Pr(x) acts as a regularizer.
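The marginalization over y amounts to a posterior-weighted average of the per-class CD gradients. A sketch with hypothetical helper signatures (per_class_cd_grad(x, y) would return the CD estimate for the joint (x, y); neither name is from the paper):

```python
import numpy as np

def semisup_grad(x, per_class_cd_grad, p_y_given_x):
    """Unsupervised gradient of -log p(x): the per-class CD gradient
    averaged under the model's own posterior p(y|x).

    per_class_cd_grad(x, y) -> gradient array for class y (hypothetical);
    p_y_given_x (C,) is the exact class posterior for x.
    """
    C = len(p_y_given_x)
    # weight each labeled CD gradient by how likely the model thinks that label is
    return sum(p_y_given_x[y] * per_class_cd_grad(x, y) for y in range(C))
```

On unlabeled examples this term is added to the objective, which is how the generative likelihood of x ends up regularizing the discriminative training.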

It's amazing that the DRBM alone achieves a 1.81% error rate on MNIST with only 500 samples. The HDRBM achieves an even lower one. They also mention a sparse version, which I will scan soon.
