Tuesday, December 23, 2008

Multiplicative Updates for Nonnegative Quadratic Programming

by Fei Sha, Yuanqing Lin, Lawrence K. Saul and Daniel D. Lee

This paper presents a method for quadratic programming with non-negativity constraints. There are several basic techniques for dealing with non-negativity constraints: one is to reparameterize the variables through a logarithm, so that the gradient method becomes a multiplicative update step, usually referred to as exponentiated gradient (EG).
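For concreteness (my notation, not the post's), one common form of the unnormalized EG step for this objective is

x_i <- x_i exp(-eta (A x + b)_i)

with step size eta > 0; since the factor is an exponential, every coordinate stays strictly positive.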

In this paper, the following optimization problem is explored:

minimize F(x) = x^T A x / 2 + b^T x subject to x >= 0

The proposed update rule is simple: split A = A^+ - A^-, where A^+ contains the positive elements of A and A^- the magnitudes of its negative elements (both are zero elsewhere), and let a_i = (A^+ x)_i and c_i = (A^- x)_i. The scale factor in the update is a root of a quadratic equation:

x_i <- x_i (-b_i + sqrt(b_i^2 + 4 a_i c_i)) / (2 a_i)

The rest of the paper is the proof that this update monotonically decreases F.
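A minimal NumPy sketch of the update (the function name and iteration count are mine; the rule itself is from the paper, and it assumes a strictly positive starting point and a_i > 0 throughout, e.g. a positive diagonal in A):

    import numpy as np

    def nqp_multiplicative(A, b, x0, n_iter=1000):
        """Minimize F(x) = 0.5*x'Ax + b'x subject to x >= 0
        with the paper's multiplicative update."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        Ap = np.maximum(A, 0.0)      # A+ : positive elements of A
        Am = np.maximum(-A, 0.0)     # A- : magnitudes of negative elements
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(n_iter):
            a = Ap @ x               # a_i = (A+ x)_i
            c = Am @ x               # c_i = (A- x)_i
            x *= (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)
        return x

Because every multiplicative factor is nonnegative, iterates started in the positive orthant stay there, so no projection step is needed.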

Here I want to mention the easy parts of the analysis: the i-th partial derivative of the objective is a_i + b_i - c_i; when it is positive the multiplicative factor is less than 1, so x_i shrinks, and when it is negative the factor is greater than 1, so x_i grows, i.e. each coordinate moves in the descent direction. A fixed-point analysis shows that the fixed-point conditions coincide with the KKT conditions.
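Spelling out the fixed-point check (an easy calculation, not in the post): at a fixed point with x_i > 0 the factor must equal 1, i.e.

-b_i + sqrt(b_i^2 + 4 a_i c_i) = 2 a_i

which after squaring reduces to c_i = a_i + b_i, that is (A x + b)_i = a_i + b_i - c_i = 0. Together with the coordinates where x_i = 0, this is exactly the complementary slackness part of the KKT conditions, x_i (A x + b)_i = 0 with x >= 0.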

There are several applications of this optimization technique, e.g. the dual problem of the SVM. But it seems that specialized algorithms such as SMO will do better than this multiplicative update rule, so one would have to check the convergence rate and run comparison experiments.
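As a usage sketch of the SVM connection (toy data of my own; following the paper, the bias is fixed at 0, so the hard-margin dual min 0.5*a'(YKY)a - 1'a has only nonnegativity constraints):

    # Reuses nqp_multiplicative from the sketch above; the data are made up
    # and shifted so the two classes are (almost surely) linearly separable.
    import numpy as np

    rng = np.random.default_rng(0)
    y = np.where(rng.random(40) < 0.5, -1.0, 1.0)
    X = rng.normal(size=(40, 2)) + 3.0 * y[:, None]
    K = X @ X.T                                  # linear kernel
    A = np.outer(y, y) * K                       # A_ij = y_i y_j K(x_i, x_j)
    b = -np.ones(40)
    alpha = nqp_multiplicative(A, b, np.full(40, 0.1), n_iter=2000)
    w = (alpha * y) @ X                          # primal weight vector (bias = 0)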

1 comment:

Anonymous said...

An interesting idea, but it's not clear why we would want to use these types of updates.

To get the SVM dual expressed in terms of non-negative constraints alone, they must assume the bias is 0. This is OK, but if you make this assumption it opens up a whole new ball-game of optimization methods, such as gradient projection methods. This includes generalized Cauchy point methods, nonmonotone spectral projected gradient methods, two-metric projection methods, and Nesterov's method for solving problems with simple constraints (a minimal projected-gradient sketch follows the references below):
- http://portal.acm.org/citation.cfm?id=589081
- The chapter of Nocedal & Wright's "Numerical Optimization" on quadratic programming.
- The chapter of Bertsekas' "Nonlinear Programming" on optimization over convex sets.
- The chapter of Nesterov's "Introductory Lectures on Convex Optimization" on nonsmooth convex optimization.
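For contrast, the very simplest member of this family (my own sketch, not taken from any of the references above): fixed-step gradient projection for the same NQP, using step 1/L with L the largest singular value of A.

    import numpy as np

    def nqp_projected_gradient(A, b, x0, n_iter=1000):
        """Minimize 0.5*x'Ax + b'x subject to x >= 0 by projected gradient."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        L = np.linalg.norm(A, 2)                       # Lipschitz constant of grad F
        x = np.maximum(np.asarray(x0, dtype=float), 0.0)
        for _ in range(n_iter):
            x = np.maximum(x - (A @ x + b) / L, 0.0)   # gradient step, then clip at zero
        return x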

I suspect that all four of the above methods would solve this particular SVM dual faster than the multiplicative updates. The authors' experiments, in which they verify that they get the 'right' classification error on some tiny UCI data sets, are certainly not convincing from an optimization perspective.