Wednesday, July 8, 2009

Information-Theoretic Metric Learning


This paper describes a metric learning algorithm. The basic tool is the so-called Bregman divergence. The setting is to learn a Mahalanobis distance, i.e., a positive definite matrix A defining d_A(x, y) = (x - y)^\top A (x - y). The trick is that we want A to stay as close as possible to a given prior matrix A_0, which is formulated via the KL divergence between two Gaussians whose covariance matrices are A_0^{-1} and A^{-1} respectively (so A_0 and A play the role of precision matrices). We further have two sets of constraints: one containing pairs of samples that should be similar (their distances must be less than a threshold u), the other containing dissimilar pairs (their distances must be larger than another threshold l). It can be shown that this KL divergence is a Bregman divergence and equals, up to a factor of 1/2, the LogDet divergence D_{ld}(A, A_0) = \mathrm{tr}(A A_0^{-1}) - \log \det (A A_0^{-1}) - n. The constraints are linear in the matrix A being optimized. This divergence is scale-invariant (the paper argues it is the only such choice), and the associated loss leads to a uniformly minimum variance unbiased estimator. As in SVMs, slack variables can be introduced to handle infeasible constraint sets. To solve the optimization, an earlier Bregman projection algorithm (from an ICML 2006 paper) is extended.
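
To make the LogDet identity above concrete, here is a small numerical check (my own illustration, not code from the paper): for two zero-mean Gaussians whose precision matrices are A_0 and A (covariances A_0^{-1} and A^{-1}), the KL divergence should agree, up to the factor of 1/2, with D_{ld}(A, A_0).

```python
import numpy as np

def logdet_divergence(A, A0):
    """D_ld(A, A0) = tr(A A0^{-1}) - log det(A A0^{-1}) - n."""
    n = A.shape[0]
    M = A @ np.linalg.inv(A0)
    _, logdet_M = np.linalg.slogdet(M)
    return np.trace(M) - logdet_M - n

def kl_zero_mean_gaussians(S0, S1):
    """KL( N(0, S0) || N(0, S1) ) for covariance matrices S0 and S1."""
    n = S0.shape[0]
    _, logdet_S0 = np.linalg.slogdet(S0)
    _, logdet_S1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(np.linalg.inv(S1) @ S0) - n + logdet_S1 - logdet_S0)

rng = np.random.default_rng(0)
n = 5
B0, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A0 = B0 @ B0.T + n * np.eye(n)   # random symmetric positive definite "prior" matrix
A = B @ B.T + n * np.eye(n)      # random symmetric positive definite candidate metric

# Gaussians parameterized by precision matrices A0 and A, so covariances are the inverses.
kl = kl_zero_mean_gaussians(np.linalg.inv(A0), np.linalg.inv(A))
print(kl, 0.5 * logdet_divergence(A, A0))   # the two numbers coincide
```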

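As for solving the optimization, the paper extends a Bregman projection method; below is only a rough toy sketch in that spirit (my own simplification, not the paper's actual algorithm: it drops the slack variables and the dual-variable bookkeeping, and the function names are mine). Each step applies the closed-form rank-one LogDet projection onto a single violated pair constraint.

```python
import numpy as np

def project_logdet(A, v, target):
    """Exact LogDet (Bregman) projection of A onto {M : v^T M v = target}.

    Via Sherman-Morrison, the minimizer of D_ld(M, A) under this single linear
    constraint is the rank-one update M = A + beta * A v v^T A with
    beta = (target - p) / p^2, where p = v^T A v.
    """
    p = v @ A @ v
    beta = (target - p) / (p * p)
    Av = A @ v
    return A + beta * np.outer(Av, Av)

def simple_metric_learning(X, similar, dissimilar, u, l, A0, n_sweeps=50):
    """Toy cyclic-projection scheme in the spirit of ITML (no slack variables)."""
    A = A0.copy()
    for _ in range(n_sweeps):
        for (i, j) in similar:
            v = X[i] - X[j]
            if v @ A @ v > u:          # similar pair is too far apart
                A = project_logdet(A, v, u)
        for (i, j) in dissimilar:
            v = X[i] - X[j]
            if v @ A @ v < l:          # dissimilar pair is too close
                A = project_logdet(A, v, l)
    return A

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
A = simple_metric_learning(X, similar=[(0, 1), (2, 3)],
                           dissimilar=[(0, 4), (1, 5)],
                           u=0.5, l=2.0, A0=np.eye(3))
d01 = (X[0] - X[1]) @ A @ (X[0] - X[1])
d04 = (X[0] - X[4]) @ A @ (X[0] - X[4])
print(d01, d04)   # d01 is pushed toward <= 0.5, d04 toward >= 2.0
```
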
There is also a connection to previous work, since it can be shown that the low-rank kernel matrix learned there is exactly the kernel induced by the Mahalanobis matrix found by the proposed metric learning algorithm. The paper also addresses the online metric learning problem, and the authors prove that the proposed online algorithm converges.
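
As a small illustration of that induced-kernel connection (toy data and names of my own choosing): the Gram matrix of the learned Mahalanobis inner product x_i^\top A x_j over a data matrix X is just X A X^\top, which is symmetric positive semidefinite and therefore a valid kernel matrix, and the learned distances can be read off from it.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 8, 4                       # 8 samples in 4 dimensions (arbitrary toy sizes)
X = rng.standard_normal((m, d))   # rows are samples

B = rng.standard_normal((d, d))
A = B @ B.T                       # a PSD Mahalanobis matrix standing in for the learned A

K = X @ A @ X.T                   # kernel induced by the inner product x_i^T A x_j
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) > -1e-9)   # PSD up to numerical error

# Squared Mahalanobis distances recovered purely from the kernel matrix:
d2_from_kernel = K[0, 0] + K[1, 1] - 2 * K[0, 1]
d2_direct = (X[0] - X[1]) @ A @ (X[0] - X[1])
print(d2_from_kernel, d2_direct)   # identical values
```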
