Thursday, October 7, 2010

Bundle Methods for Regularized Risk Minimization

by Choon Hui Teo, S.V.N. Vishwanathan, Alex Smola and Quoc V. Le

This paper introduces a bundle method for regularized risk minimization (RRM). In optimization, bundle methods are derived from the standard cutting plane method.

In cutting plane methods, a convex objective is bounded from below by a series of cutting planes constructed from subgradients. As planes accumulate, they shrink the region in which the optimum can lie. The next iterate is then chosen as w_t = \arg\min_w J_t^\text{CP}(w), where J_t^\text{CP}(w) = \max_{1 \leq i \leq t} J(w_{i-1}) + \langle w - w_{i-1}, s_i \rangle and s_i is a subgradient of J at w_{i-1}.
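As a toy illustration (my own sketch, not from the paper), here is the iteration in one dimension, with a grid search standing in for the LP/QP that a real implementation would solve:

```python
# Toy 1-D cutting plane method (my own sketch, not from the paper).
# Objective: J(w) = 0.5*lam*w^2 + max(0, 1 - w), a regularized hinge
# whose minimizer is w* = 1 for lam = 0.1.

lam = 0.1

def J(w):
    return 0.5 * lam * w * w + max(0.0, 1.0 - w)

def subgrad(w):
    # one subgradient of J at w
    return lam * w + (-1.0 if 1.0 - w > 0.0 else 0.0)

planes = []   # (slope, intercept) of the plane J(w_i) + <w - w_i, s_i>
w = 5.0
for t in range(30):
    s = subgrad(w)
    planes.append((s, J(w) - s * w))
    # minimize the piecewise-linear lower bound J_t^CP on [-10, 10];
    # grid search stands in for the LP/QP a real solver would use
    grid = [i / 100.0 for i in range(-1000, 1001)]
    w = min(grid, key=lambda v: max(a * v + b for a, b in planes))
# w ends up close to the true minimizer w* = 1.0
```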

The cutting plane method suffers from slow convergence. Bundle methods improve on it with proximal functions (basically quadratic terms that stabilize the piecewise-linear approximation of the objective). There are several ways of constructing the stabilized subproblem:
  • proximal: w_t = \arg\min_w \frac{\zeta_t}{2} \| w - \hat{w}_{t-1} \|^2 + J_t^\text{CP} (w)
  • trust region: w_t = \arg\min_w \{ J_t^\text{CP}(w) \mid \frac{1}{2} \|w - \hat{w}_{t-1} \|^2 \leq \kappa_t\}
  • level set: w_t = \arg\min_w \{ \frac{1}{2} \| w - \hat{w}_{t-1} \|^2 \mid J_t^\text{CP} (w) \leq \tau_t\}
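To make the proximal variant concrete, here is a tiny 1-D sketch of the subproblem solve (my own construction): the objective \frac{\zeta_t}{2}(w - \hat{w})^2 + \max_i (a_i w + b_i) is convex and piecewise quadratic, so its minimizer is either the stationary point of one quadratic piece or a crossing point of two planes.

```python
# 1-D proximal bundle subproblem (my own sketch, not the paper's solver):
# minimize  zeta/2 * (w - w_hat)^2 + max_i (a_i*w + b_i)
# by enumerating candidate minimizers: per-plane stationary points
# and pairwise plane crossings, then picking the best.

def proximal_step(planes, w_hat, zeta):
    def obj(w):
        return 0.5 * zeta * (w - w_hat) ** 2 + max(a * w + b for a, b in planes)

    candidates = []
    # minimizer if plane (a, b) were the only active piece
    for a, _ in planes:
        candidates.append(w_hat - a / zeta)
    # kinks: crossing points of every pair of planes
    for i in range(len(planes)):
        for j in range(i + 1, len(planes)):
            a1, b1 = planes[i]
            a2, b2 = planes[j]
            if a1 != a2:
                candidates.append((b2 - b1) / (a1 - a2))
    return min(candidates, key=obj)
```

For example, with planes [(-1, 1), (0.5, 0)], center ŵ = 0 and ζ = 1, the minimizer sits at the crossing w = 2/3.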

This paper proposes another bundle method, yielding an O(\log(1/\epsilon)) convergence bound. It has several variants, one of which does not involve a line search.

Unfortunately, yourequation.com is down due to heavy traffic. I have to switch to another equation rendering method ASAP.

PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications


This paper discusses two implementations of PLDA, one using MPI and the other using MapReduce. It seems that Google's MapReduce framework doesn't support iteration well either. A comparison of LDA implementations via variational Bayes, EP, and Gibbs sampling is mentioned, but it struck me as somewhat strange; I would like to make the comparison myself sometime later.


Some models can be naturally extended into parallel versions, such as many Monte Carlo methods. In the authors' implementation I see many similarities to another version by my colleagues. Wen-yen is now one of my colleagues too; small world!

Tuesday, October 5, 2010

Templates and Anchors: Neuromechanical Hypothesis of Legged Locomotion on Land


I am not a biologist and don't fully understand the point. Generally, I think they are discussing the use of templates to simplify the modelling of the complicated locomotion of organisms. They propose two templates, SLIP (spring-loaded inverted pendulum) and LLS (lateral leg spring), and try to justify the models biologically. I can hardly find any equations in the paper, so I guess the math required to model these templates should be comparatively simple. Maybe using the Lagrangian, we can obtain the equations of motion easily?
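For the record, a quick sketch of what that Lagrangian might look like for SLIP during stance (my own derivation in polar coordinates, with leg length r, leg angle \theta from vertical, rest length r_0, stiffness k, point mass m; not taken from the paper):
  • Lagrangian: L = \frac{1}{2} m (\dot{r}^2 + r^2 \dot{\theta}^2) - \frac{1}{2} k (r_0 - r)^2 - m g r \cos\theta
  • radial equation: m \ddot{r} = m r \dot{\theta}^2 + k (r_0 - r) - m g \cos\theta
  • angular equation: \frac{d}{dt}\left( m r^2 \dot{\theta} \right) = m g r \sin\theta
So the stance-phase dynamics do follow from Euler–Lagrange in a few lines, which supports the guess that the template math is comparatively simple.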

Collaborative Filtering on a Budget


This paper talks about large-scale collaborative filtering. Collaborative filtering can be formulated as a matrix factorization problem, and we may try several loss functions with different regularizers. One typical example is the M3F paper previously scanned here. A convenient solver is stochastic gradient descent.
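As a reminder of what such a solver looks like, here is a minimal SGD matrix-factorization sketch on toy data (my own example with made-up ratings, squared loss plus L2 regularization; not necessarily the paper's exact setup):

```python
import random

# Minimal SGD matrix factorization (toy sketch, not the paper's method).
# Squared loss with L2 regularization on both factor matrices.

def sgd_mf(ratings, n_users, n_items, k=2, lam=0.01, lr=0.05, epochs=200):
    rng = random.Random(0)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        rng.shuffle(ratings)
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = pred - r
            for f in range(k):
                gu = err * V[i][f] + lam * U[u][f]
                gv = err * U[u][f] + lam * V[i][f]
                U[u][f] -= lr * gu
                V[i][f] -= lr * gv
    return U, V

# made-up (user, item, rating) triples
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0)]
U, V = sgd_mf(ratings, 2, 2)
```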

The core idea is to use two hash functions (one for users and another for the recommended items) to compress the user and item factor matrices, reducing the memory and computational cost in large-scale problems. The matrices are approximated with the help of Rademacher (random \pm 1) functions. I have no idea why this is possible; maybe I will take a look some day.
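My rough understanding of the hashing idea, sketched with hypothetical names (h, sigma, and the budget M are my own; the paper's construction surely differs in details): factor entries live in one fixed-size weight vector, addressed through a hash function and multiplied by a Rademacher sign, so memory no longer grows with the number of users and items.

```python
import hashlib

# Hashed-parameter sketch (hypothetical construction, not the paper's
# exact scheme): instead of an explicit n_users x k factor matrix, each
# entry is read from a single small weight vector via a hash function h
# and a Rademacher sign function sigma, so memory stays fixed no matter
# how many users/items appear.

M = 1000  # budget: size of the shared weight vector

def h(key, buckets=M):
    d = hashlib.md5(key.encode()).digest()
    return int.from_bytes(d[:4], "big") % buckets

def sigma(key):
    # deterministic pseudo-random sign in {+1, -1}
    d = hashlib.md5(("sign:" + key).encode()).digest()
    return 1.0 if d[4] % 2 == 0 else -1.0

w_user = [0.0] * M  # shared storage for all user-factor entries

def user_factor(u, f):
    key = f"u{u}:{f}"
    return sigma(key) * w_user[h(key)]
```

Gradient updates would write back through the same hash, with collisions acting as a (hopefully benign) source of noise.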

Saturday, October 2, 2010

Spacetime Constraints

by Andrew Witkin and Michael Kass

This is a paper from graphics about how to implement an animation of objects, such as the Luxo lamp from Pixar (see details of Luxo Jr. on Wikipedia).

First comes the spacetime particle. We get the ODE for the particle from Newton's law, and typically we would solve the ODE with numerical methods. Here we have initial-value constraints and also an objective function (the minimum fuel exhausted over the trajectory), so this is actually a variational optimization problem. The numerical solution is obtained by discretization: the derivatives are approximated via finite differences, and the objective function then turns out to be a quadratic form. So we get a linearly constrained quadratic optimization problem.
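To see the reduction concretely, here is a toy 1-D discretization (my own constants and constraints, not the paper's Luxo example): fixing the boundary positions makes the linear constraints trivial to eliminate, and minimizing the quadratic fuel objective becomes a linear least-squares solve.

```python
# Toy discretized spacetime-constraints problem (my own setup): a
# unit-mass particle must go from x = 0 (starting at rest, so x_1 = 0
# too) to x = 1 in one second under gravity g, minimizing total squared
# thrust sum_i f_i^2, where finite-differenced Newton's law gives
#   f_i = m * ((x[i+1] - 2*x[i] + x[i-1]) / dt^2 + g).
# Eliminating the fixed variables leaves a linear least-squares problem,
# solved via normal equations and Gaussian elimination.

n, m, g = 10, 1.0, 9.8
dt = 1.0 / n
c = m / dt ** 2
fixed = {0: 0.0, 1: 0.0, n: 1.0}            # boundary constraints
free = [j for j in range(n + 1) if j not in fixed]

rows, rhs = [], []
for i in range(1, n):                        # one thrust term per node
    coeff = {i - 1: c, i: -2.0 * c, i + 1: c}
    rows.append([coeff.get(j, 0.0) for j in free])
    rhs.append(-m * g - sum(coeff[j] * fixed[j] for j in coeff if j in fixed))

k = len(free)
A = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
b = [sum(rows[t][p] * rhs[t] for t in range(len(rows))) for p in range(k)]

for col in range(k):                         # elimination with pivoting
    piv = max(range(col, k), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, k):
        fac = A[r][col] / A[col][col]
        for q in range(col, k):
            A[r][q] -= fac * A[col][q]
        b[r] -= fac * b[col]
y = [0.0] * k
for r in range(k - 1, -1, -1):
    y[r] = (b[r] - sum(A[r][q] * y[q] for q in range(r + 1, k))) / A[r][r]

x = [fixed.get(j, 0.0) for j in range(n + 1)]  # assemble full trajectory
for idx, j in enumerate(free):
    x[j] = y[idx]
```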

For a complex object, the ODE is obtained via Lagrangian dynamics (hmm... I've almost forgotten it). So we can actually solve a similar problem, though a much more complicated one that has to be analysed case by case.