Monday, June 6, 2011

Parallelized Stochastic Gradient Descent


This paper talks about a very interesting optimization technique. In online learning, SGD (stochastic gradient descent) is the usual way to optimize the model, since the data arrive one sample (or one mini-batch) at a time. For parametric models, the parameters are updated sequentially with one or a few samples per step.
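
As a concrete baseline, here is a minimal sketch of that sequential update. The squared-error loss, the fixed learning rate, and the helper name sgd are my own illustrative choices, not taken from the paper:

import numpy as np

def sgd(stream, dim, lr=0.01):
    # plain sequential SGD: one update per incoming sample
    w = np.zeros(dim)
    for x, y in stream:              # samples arrive one by one
        grad = (w @ x - y) * x       # gradient of 0.5 * (w.x - y)^2 w.r.t. w
        w -= lr * grad
    return w

# tiny usage example on synthetic linear-regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
w_hat = sgd(zip(X, X @ w_true), dim=5)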

This paper proposes a parallelized version: essentially, we run several independent SGD instances on different machines and aggregate their results by averaging the parameters. Interestingly, we do not even need to give every machine the full data set; each machine can work on just its own subset. The proof of convergence appears to rely on the convexity of the objective function, but I suspect that assumption may not be essential.
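
Here is a minimal sketch of that averaging scheme, again assuming a squared-error loss and made-up helper names (local_sgd, parallel_sgd). The separate machines are simulated by an ordinary loop, so only the partition-then-average idea is illustrated:

import numpy as np

def local_sgd(shard, dim, lr=0.01):
    # one machine's run: plain SGD over its own subset of the data
    w = np.zeros(dim)
    for x, y in shard:
        w -= lr * (w @ x - y) * x    # squared-error gradient step
    return w

def parallel_sgd(shards, dim, lr=0.01):
    # in practice each shard lives on a separate machine and the runs
    # execute concurrently; here a list comprehension stands in for that
    local_solutions = [local_sgd(s, dim, lr) for s in shards]
    return np.mean(local_solutions, axis=0)   # average the parameter vectors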

The result is quite interesting when we consider the popular parallel computation framework, MapReduce. We'd better implement one ASAP.
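
One way it could fit the framework, as a rough sketch rather than an actual Hadoop job: each mapper runs SGD over its own split and emits its local weights, and a single reducer averages them. The mapper/reducer/run_job names and the local driver below are illustrative only:

import numpy as np

def mapper(shard, dim=5, lr=0.01):
    # map phase: one mapper per data split runs SGD locally
    w = np.zeros(dim)
    for x, y in shard:
        w -= lr * (w @ x - y) * x
    yield "weights", w

def reducer(key, values):
    # reduce phase: a single reducer averages all local parameter vectors
    yield key, np.mean(list(values), axis=0)

def run_job(shards):
    # local stand-in for the framework: map over shards, group by key, reduce
    local_ws = [w for shard in shards for _, w in mapper(shard)]
    return next(reducer("weights", local_ws))[1]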
