This paper studies a game-like online optimization problem: at each round t we must pick an action x_t before an unknown objective f_t(x) is revealed, which may be a loss an adversary imposes on us. To evaluate the final result, the accumulated loss over t = 1, ..., T is compared with that of the best fixed action x minimizing the total loss in hindsight (which is why evaluation waits until the last f_t is revealed); the difference between the two is called the regret.
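As a small illustration of the regret definition above (the loss functions and the naive player here are our own made-up example, not from the paper): against quadratic losses f_t(x) = (x - z_t)^2, the best fixed action in hindsight is the mean of the z_t, and regret is the online loss minus the loss of that fixed action.

```python
import numpy as np

# Hypothetical example: the environment picks targets z_t, the learner
# naively plays x_t = 0 every round, and we measure regret against the
# best fixed action in hindsight (the mean of the z_t, by optimality
# of the mean for squared loss).
rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=100)   # losses f_t(x) = (x - z_t)^2
x_plays = np.zeros(100)                # the learner always plays 0

online_loss = np.sum((x_plays - z) ** 2)
best_fixed = z.mean()                  # argmin_x sum_t (x - z_t)^2
offline_loss = np.sum((best_fixed - z) ** 2)
regret = online_loss - offline_loss
print(regret)  # non-negative, since the comparator is the best fixed action
```

Because the comparator minimizes the total loss, the regret of any play sequence against it is non-negative.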
Several prior results exist for this problem:
- Zinkevich showed that for linear (and more generally convex) functions, the regret grows as sqrt(T) using his proposed online gradient descent algorithm.
- Hazan et al. showed that under the assumption of strong convexity, the regret grows only as log(T).
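A minimal sketch of Zinkevich-style online gradient descent (the function names, feasible set, and loss family here are our own illustration, not the paper's): play x_t, observe the gradient of f_t at x_t, take a step with step size decaying like 1/sqrt(t), and project back onto the feasible set.

```python
import numpy as np

def ogd(grad_fns, x0=0.0, radius=1.0, G=4.0):
    """Online gradient descent on the interval [-radius, radius].

    grad_fns: one gradient oracle per round, revealed after we play.
    Step size eta_t = radius / (G * sqrt(t)), then project (clip).
    """
    x = x0
    plays = []
    for t, grad in enumerate(grad_fns, start=1):
        plays.append(x)
        eta = radius / (G * np.sqrt(t))     # ~ 1/sqrt(t) step size
        x = x - eta * grad(x)
        x = np.clip(x, -radius, radius)     # projection step
    return plays

# Example losses f_t(x) = (x - z_t)^2, so grad f_t(x) = 2*(x - z_t).
rng = np.random.default_rng(1)
z = rng.uniform(-1.0, 1.0, size=1000)
plays = ogd([(lambda x, zt=zt: 2 * (x - zt)) for zt in z])

online = sum((x - zt) ** 2 for x, zt in zip(plays, z))
offline = np.sum((z.mean() - z) ** 2)       # best fixed action in hindsight
regret = online - offline
print(regret)
```

With bounded gradients (G) and a bounded feasible set (radius), this step-size schedule is what yields the O(sqrt(T)) regret bound; here the regret stays far below T = 1000.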
This paper has several results:
- Using a regularization technique, the authors propose an adaptive online gradient descent algorithm whose regret is within a factor of 1 to 6 of the regret achieved when the regularizer's coefficients are chosen offline to minimize regret.
- It can be shown that for linear functions and strongly convex functions, the corresponding rates O(sqrt(T)) and O(log(T)) are achieved.
- A similar algorithm exists for general Lp norms, obtained by generalizing the L2 analysis via the Bregman divergence.
The Bregman divergence of a convex function R is D_R(x, y) = R(x) - R(y) - <∇R(y), x - y>, i.e. the gap between R at x and its linear approximation around y.
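A quick sanity check of this definition (our own example, not from the paper): for R(x) = ||x||_2^2, the Bregman divergence reduces to the squared Euclidean distance, which is the sense in which the L2 case is a special instance.

```python
import numpy as np

# Bregman divergence D_R(x, y) = R(x) - R(y) - <grad R(y), x - y>.
def bregman(R, gradR, x, y):
    return R(x) - R(y) - np.dot(gradR(y), x - y)

R = lambda v: np.dot(v, v)      # R(x) = ||x||_2^2
gradR = lambda v: 2 * v         # its gradient

x = np.array([1.0, 2.0])
y = np.array([0.0, -1.0])
print(bregman(R, gradR, x, y))  # equals ||x - y||^2 = 1 + 9 = 10
```

Choosing other convex R (e.g. squared Lp norms or negative entropy) swaps in other divergences while keeping the same algorithmic template.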