Sunday, January 11, 2009

Manifold Learning: The Price of Normalization


Although many manifold learning algorithms have been proposed since around 2000, how to evaluate these algorithms is still seldom discussed. We may have a vague idea of what a good embedding looks like when we design an algorithm, but the results are rarely checked any further. This paper focuses on that difficult part.

The main idea of the paper is that several algorithms that rely on a normalization constraint (introduced, with rather weak justification, to avoid collapsing the projection to a point) can fail precisely because of that normalization. These algorithms include LLE, Laplacian Eigenmaps (LEM), LTSA, HLLE and Diffusion Maps (DFM). MVU does not suffer from this problem even though it is also a spectral algorithm (algorithms that solve for the largest eigenvalues appear to be immune, since they do not need normalization). An intuitive view of the problem: to satisfy the normalization constraint, the embedding may be folded so that the data spread equally in every direction, even when the underlying manifold is much longer in one direction than in another.
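To see this intuition concretely, here is a small sketch (my own illustration, not code from the paper, assuming scikit-learn's LocallyLinearEmbedding) on a flat strip whose sides have a 10:1 aspect ratio; the normalized output tends toward an equal spread in both directions, which the algorithm can only achieve by distorting or folding the strip:

import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# A flat rectangular strip in 3D, much longer in one direction than the other.
rng = np.random.default_rng(0)
n = 2000
s = rng.uniform(0.0, 10.0, n)             # long direction
t = rng.uniform(0.0, 1.0, n)              # short direction
X = np.column_stack([s, t, np.zeros(n)])  # the "manifold" is already flat

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)

# The ideal embedding keeps the 10:1 aspect ratio; the normalized output
# tends to spread equally in both directions instead.
print("input aspect ratio :", s.std() / t.std())
print("output aspect ratio:", Y[:, 0].std() / Y[:, 1].std())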

To analyze the properties of these algorithms, a uniform framework is needed that formulates these seemingly unrelated methods in one form. The authors assign a weight matrix Wi to each sample xi, so that the embedding is obtained by minimizing the sum over i of || Wi Yi ||^2, where Yi collects the embedding coordinates of xi and its neighbors. Some of the weight matrices are simple vectors (e.g. in LLE), while others are a little more complicated. The optimization can then be written as minimizing Phi(Y) subject to the normalization constraints. This cost function is the key to understanding the paper.
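To make the framework concrete, here is a minimal sketch (my own notation; phi and lle_row are illustrative names, not from the paper) of the cost Phi(Y) = sum_i || Wi Y ||^2, with the LLE weights shown as one instance:

import numpy as np

def phi(Y, W_rows):
    """Phi(Y) = sum_i || W_i Y ||^2, with each W_i stored as a sparse row:
    a pair (indices, values) so that W_i Y = values @ Y[indices]."""
    total = 0.0
    for idx, vals in W_rows:
        r = np.asarray(vals) @ Y[np.asarray(idx)]   # 1 x d residual
        total += float(r @ r)
    return total

def lle_row(i, neighbor_idx, recon_weights):
    """LLE instance: W_i has +1 at sample i and -w_ij at its neighbors j,
    where w_ij are the local reconstruction weights (each row sums to 1)."""
    idx = [i] + list(neighbor_idx)
    vals = [1.0] + [-w for w in recon_weights]
    return idx, vals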

The embedding quality must also be quantified. The authors note that the computed embedding usually does not preserve the neighborhood structure exactly, even though such preservation is what we want, so the criterion is relaxed: an affine transform A may be applied to the ideal embedding Y. The algorithm is said to fail if there exists another embedding Z that satisfies the normalization constraints and Phi(AY) > Phi(Z), i.e. the normalized version of the ideal embedding is beaten by some other feasible embedding. This definition is quantifiable and computable.
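A rough numerical check of this failure criterion could look like the following sketch (again my own; it reuses the phi function above and takes "normalization" to mean centering plus whitening, the constraint form used by LLE):

import numpy as np

def normalize_affine(Y):
    """Apply the affine transform that gives Y zero mean and identity
    covariance, making the ideal embedding feasible under the constraints."""
    Yc = Y - Y.mean(axis=0)
    cov = Yc.T @ Yc / len(Yc)
    vals, vecs = np.linalg.eigh(cov)
    return Yc @ vecs @ np.diag(vals ** -0.5) @ vecs.T   # whitening

def algorithm_fails(Y_ideal, Z_algo, W_rows, phi):
    """The algorithm 'fails' on this data if the normalized ideal embedding
    has strictly larger cost than some feasible embedding Z; the algorithm's
    own output (the feasible minimizer of Phi) is the natural candidate."""
    return phi(normalize_affine(Y_ideal), W_rows) > phi(Z_algo, W_rows)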

With this tool, they analyze all five algorithms and derive several necessary conditions for these algorithms to succeed, both for finite samples and in the asymptotic case. They conclude that applying these algorithms to real data can be problematic. It seems that this way of finding a projection is often not what practitioners actually want.
