Thursday, July 23, 2009

Herding Dynamical Weights to Learn


This paper describes a novel way of learning MRFs such as Hopfield networks (is that also a Boltzmann machine?). For these models, the primal-dual structure connecting maximum entropy learning and maximum likelihood estimation suggests the possible learning algorithms, e.g. gradient-based ones: the optimization is constrained in the primal (maximum entropy) form but unconstrained in the dual (maximum likelihood) form.
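
Roughly, in my own notation (not the paper's), the primal and its unconstrained dual look like:
\max_P H(P) \quad \text{s.t.} \quad \mathbb{E}_P[g_\alpha] = \bar{g}_\alpha \;\; \forall \alpha
\max_{\mathbf{w}} \sum_\alpha w_\alpha \bar{g}_\alpha - \log Z(\mathbf{w}), \qquad P_{\mathbf{w}}(s) \propto \exp\big(\textstyle\sum_\alpha w_\alpha g_\alpha(s)\big)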

The gradient-based algorithms require repeatedly solving an inference problem in order to apply the update
w_{\alpha}^{(t+1)} = w_\alpha^{(t)} + \eta( \bar{g}_\alpha - \mathbb{E}[g_\alpha]_P)
where the expectation is taken under the current model. The common ways of computing this expectation are a Gibbs sampler (MCMC) or a mean-field approximation (usually not accurate). For certain models, such as RBMs, one can use Hinton's contrastive divergence, but that is still a non-deterministic, sampling-based algorithm.
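
To make the update concrete, here is a minimal sketch (my own toy code, not from the paper) of this gradient ascent for a tiny fully visible Boltzmann machine with pairwise features g_ij(s) = s_i s_j, where the model expectation is approximated by a short Gibbs sampler. Sizes, learning rate, and the stand-in data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 5                                      # number of binary units (assumption)
data = rng.choice([-1, 1], size=(200, n))  # stand-in "training data"

g_bar = (data.T @ data) / len(data)        # empirical moments \bar{g}_ij
W = np.zeros((n, n))                       # pairwise weights w_ij
eta = 0.05                                 # learning rate

def gibbs_sample(W, n_samples=100, burn_in=50):
    """Approximate E_P[s_i s_j] under p(s) proportional to exp(s^T W s / 2) by Gibbs sampling."""
    s = rng.choice([-1, 1], size=n)
    samples = []
    for t in range(burn_in + n_samples):
        for i in range(n):
            field = W[i] @ s - W[i, i] * s[i]              # local field, excluding self
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))    # p(s_i = +1 | rest)
            s[i] = 1 if rng.random() < p_plus else -1
        if t >= burn_in:
            samples.append(np.outer(s, s))
    return np.mean(samples, axis=0)

for step in range(100):
    g_model = gibbs_sample(W)              # E_P[g] under the current model (the hard part)
    W += eta * (g_bar - g_model)           # w^{(t+1)} = w^{(t)} + eta (\bar{g} - E_P[g])
    np.fill_diagonal(W, 0.0)               # no self-connections

The inner Gibbs loop is exactly the expensive, stochastic inference step that herding tries to avoid.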

The herding algorithm the author proposes here is deterministic. It takes the zero-temperature limit of an annealed version of the negated log-likelihood, which results in the so-called tipi function. Each herding iteration then consists of first maximizing over the states (because of the limit), which yields pseudo-samples, and then updating the weights with the gradient evaluated at these pseudo-samples instead of the model expectation.
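
A minimal sketch of that two-step iteration, again for a tiny pairwise model so the maximization can be done by brute force (in general this argmax is the hard step and is only done approximately, e.g. by coordinate ascent). The brute-force maximizer and sizes are my own illustrative assumptions; the update itself follows the description above.

import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 5
data = rng.choice([-1, 1], size=(200, n))
g_bar = (data.T @ data) / len(data)        # empirical moments \bar{g}

W = g_bar.copy()                           # initial weights (any start works)
states = np.array(list(itertools.product([-1, 1], repeat=n)))  # all 2^n states

pseudo_samples = []
for t in range(500):
    # 1) maximization step: pseudo-sample s* = argmax_s sum_ij w_ij s_i s_j
    scores = np.einsum('ki,ij,kj->k', states, W, states)
    s_star = states[np.argmax(scores)]
    pseudo_samples.append(s_star)
    # 2) deterministic weight update with unit step: w <- w + (\bar{g} - g(s*))
    W += g_bar - np.outer(s_star, s_star)

# the running average of the pseudo-samples' features tracks \bar{g}
g_est = np.mean([np.outer(s, s) for s in pseudo_samples], axis=0)

Note there is no learning rate and no sampling noise: the weights never converge but keep moving, and it is the time average over the pseudo-samples that matches the data moments.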

I am not familiar with dynamical systems or the intrinsic ideas behind this part, though.
