Monday, February 15, 2010

Projected Gradient Methods for Nonnegative Matrix Factorization


This paper proposes an optimization technique for NMF problems, which I think could also be extended to more general Bregman divergences. The idea is borrowed from bound-constrained optimization. It is pointed out that most NMF algorithms can at best converge to stationary points, not local minima, and that the multiplicative updating rule only ensures that the objective is non-increasing; it does not guarantee that the limit is even a stationary point (let alone a local minimum). The proposed algorithm is quite simple: take a gradient step and then clip the coordinates that fall outside the feasible domain (i.e. set the negative values to 0). The results look better than those of the multiplicative updating rule.
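To make the projection step concrete, here is a minimal numpy sketch of alternating projected gradient updates with a fixed step size (the paper actually chooses the step with an Armijo-type line search, which I omit; the function name and the step size eta are just my placeholders):

import numpy as np

def projected_gradient_nmf(X, r, steps=200, eta=1e-3, seed=0):
    # Alternating projected gradient for min ||X - W V||_F^2 with W, V >= 0.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(steps):
        # Gradient step on V, then project back onto the nonnegative orthant.
        V = np.maximum(V - eta * W.T @ (W @ V - X), 0.0)
        # Gradient step on W, then project.
        W = np.maximum(W - eta * (W @ V - X) @ V.T, 0.0)
    return W, V

With a fixed step size this can of course diverge if eta is too large, which is exactly what the line search in the paper takes care of.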

Friday, February 12, 2010

Generalized Nonnegative Matrix Approximation with Bregman Divergence


This paper introduces a family of objectives in the form of Bregman divergences. From this page, you may see how broad this family is. For any given continuously differentiable and strictly convex function f(\cdot), the Bregman divergence is defined as
B_f(p \parallel q) = f(p) - f(q) - \langle \nabla f(q), p - q\rangle
which can be regarded as the difference between the change in the function value and its linear approximation at q. There are many interesting properties. Please check it out. This paper uses the Bregman divergence to measure the goodness of fit of NMF, which naturally extends the original NMF objectives from the previous paper scanned here. Please note that both B_\phi(x_i \parallel Wv_i) and B_\phi(Wv_i \parallel x_i) can be employed as the objective function, but the former requires less computational work.
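If I work out two standard choices of \phi myself (this is easy to check from the definition above), the objectives from the original NMF paper come back as special cases:
\phi(x) = \tfrac{1}{2}\|x\|^2 \;\Rightarrow\; B_\phi(p \parallel q) = \tfrac{1}{2}\|p - q\|^2,
\phi(x) = \sum_i (x_i \log x_i - x_i) \;\Rightarrow\; B_\phi(p \parallel q) = \sum_i \left( p_i \log \tfrac{p_i}{q_i} - p_i + q_i \right),
i.e. the squared Euclidean distance and the generalized KL divergence.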

The most surprising fact is that the auxiliary function trick used to derive the multiplicative updating rule can also be extended to the more general Bregman divergence case, as long as the auxiliary function satisfies the same constraints
G(x,x) = f(x), \qquad G(x, y) \geq f(x), \forall y.
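As a reminder of why these two conditions are enough (the standard majorization argument, nothing specific to this paper): taking v^{(t+1)} = \arg\min_v G(v, v^{(t)}) gives
f(v^{(t+1)}) \leq G(v^{(t+1)}, v^{(t)}) \leq G(v^{(t)}, v^{(t)}) = f(v^{(t)}),
so the objective never increases along the iterates.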
The authors provide the following f(\cdot),
f(v) = B_\phi(Wv \parallel x)
given the approximation problem
X = (x_1 ,\ldots, x_N) \approx W V = W(v_1, \ldots, v_N).
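Just to fix the shapes in my head, here is how I would evaluate this objective in numpy for a separable \phi (my own sketch; phi and grad_phi are placeholders for whichever generating function is chosen):

import numpy as np

def bregman_objective(X, W, V, phi, grad_phi):
    # f(V) = sum_i B_phi(W v_i || x_i), computed elementwise for separable phi.
    P = W @ V  # the current approximation
    return np.sum(phi(P) - phi(X) - grad_phi(X) * (P - X))

# Example: phi(t) = t log t - t gives the generalized KL divergence.
phi = lambda t: t * np.log(t) - t
grad_phi = np.log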
The function G(\cdot, \cdot) is carefully designed to mimic the original auxiliary function,
G(v, u) = \sum_{i, j} \lambda_{i,j}\, \phi\!\left( \frac{W_{i,j} v_j}{\lambda_{i,j}} \right) - \sum_i \left( \phi(x_i) + \phi'(x_i)\big((W v)_i - x_i\big) \right), \qquad \lambda_{i,j} = \frac{W_{i,j} u_j}{(W u)_i}.
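The way I read it, the first sum is just Jensen's inequality applied to the convex \phi, since the \lambda_{i,j} sum to one over j:
\phi\big((W v)_i\big) = \phi\Big( \sum_j \lambda_{i,j} \frac{W_{i,j} v_j}{\lambda_{i,j}} \Big) \leq \sum_j \lambda_{i,j}\, \phi\Big( \frac{W_{i,j} v_j}{\lambda_{i,j}} \Big),
with equality at v = u, which is what gives G(v, v) = f(v).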
They also derive another multiplicative update rule from the KKT conditions.

This is a really interesting use of the Bregman divergence.

Algorithms for Nonnegative Matrix Factorization


This paper follows up on their Nature paper. They propose two objectives for the decomposition: the squared Euclidean distance and the generalized KL divergence. Their algorithms also rely on the "auxiliary function" trick. The updating rules are kind of interesting because of their multiplicative form. This is later generalized to Bregman divergences in the NIPS '05 paper scanned above.
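For my own reference, the multiplicative updates for the squared-error objective look roughly like this in numpy (a sketch; the small eps in the denominators is my addition to avoid division by zero, not part of the paper):

import numpy as np

def multiplicative_nmf(X, r, steps=200, eps=1e-12, seed=0):
    # Lee-Seung style multiplicative updates for min ||X - W V||_F^2.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(steps):
        # Elementwise multiplicative updates: the factors stay nonnegative
        # automatically and the objective is non-increasing.
        V *= (W.T @ X) / (W.T @ W @ V + eps)
        W *= (X @ V.T) / (W @ V @ V.T + eps)
    return W, V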