Monday, July 26, 2010

Heavy-Tailed Symmetric Stochastic Neighbor Embedding


t-SNE is a very useful visualization algorithm. It inherits the idea of SNE but replaces the Gaussian neighborhood probabilities in the low-dimensional space with a heavy-tailed Student t-distribution. The heavy tail works pretty well on many data sets. This paper discusses a more general case.
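To make the heavy-tail point concrete, here is a minimal sketch (not taken from the paper) comparing the unnormalized Gaussian similarity exp(-d^2) of SNE with the Student-t similarity 1/(1 + d^2) of t-SNE in the embedding space. The distances are made-up values and the helper names are my own.

```python
import math


def gaussian_kernel(d):
    """Unnormalized Gaussian similarity, as in the original SNE embedding."""
    return math.exp(-d * d)


def student_t_kernel(d):
    """Unnormalized Student-t similarity (one degree of freedom), as in t-SNE."""
    return 1.0 / (1.0 + d * d)


def normalize(weights):
    """Scale nonnegative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical pairwise distances in the embedding, from near to far.
distances = [0.5, 1.0, 2.0, 4.0]

q_gauss = normalize([gaussian_kernel(d) for d in distances])
q_t = normalize([student_t_kernel(d) for d in distances])

for d, g, t in zip(distances, q_gauss, q_t):
    print(f"d={d:.1f}  gaussian={g:.6f}  student_t={t:.6f}")
```

The far pair keeps a noticeable share of the probability mass under the t kernel, while under the Gaussian kernel its share is practically zero. That is what lets moderately dissimilar points sit far apart in the map.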

After transforming the original problem into its Lagrange dual, the authors obtain a unified algorithm (a fixed-point iteration) for a family of heavy-tailed distributions (including the t-distribution). This is somewhat less interesting than I had expected.

The authors also discuss how to integrate supervised information into the SNE algorithm. Their idea is rather rough: a similarity computed from the supervised information is inserted into the similarity in the high-dimensional space. I don't know whether this would truly work...
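A rough sketch of what such an insertion could look like. This is my guess, not the paper's formula: I assume a label-based similarity that is uniform over same-class pairs, blended with the input similarity by a convex combination. The function names, the mixing weight and the toy matrix are all hypothetical.

```python
def label_similarity(labels):
    """Joint similarity from class labels: uniform over same-class pairs (assumed form)."""
    n = len(labels)
    s = [[1.0 if i != j and labels[i] == labels[j] else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, s))
    if total == 0.0:
        raise ValueError("no two points share a label")
    return [[v / total for v in row] for row in s]


def blend(p, s, weight):
    """Convex combination of two joint similarity matrices (assumed mixing rule)."""
    return [[(1.0 - weight) * pv + weight * sv for pv, sv in zip(p_row, s_row)]
            for p_row, s_row in zip(p, s)]


# Hypothetical symmetric joint probabilities for four points (entries sum to 1).
p = [
    [0.00, 0.10, 0.05, 0.10],
    [0.10, 0.00, 0.10, 0.05],
    [0.05, 0.10, 0.00, 0.10],
    [0.10, 0.05, 0.10, 0.00],
]
labels = [0, 0, 1, 1]

p_mixed = blend(p, label_similarity(labels), weight=0.5)
for row in p_mixed:
    print(["%.3f" % v for v in row])
```

Same-class pairs gain similarity and cross-class pairs lose it, while the matrix still sums to 1 and can be fed to the embedding step unchanged. Whether such a blend helps in practice is exactly the open question.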

Dirichlet Component Analysis: Feature Extraction for Compositional Data


This paper discusses the problem of extracting features from compositional data (nonnegative features that sum to 1). The problem is interesting, though I am not sure what the major difficulties are. To get a proper projection into a lower-dimensional space, the projection has to conform to a certain set of constraints (a balanced rearrangement). A regularization operator is devised to preserve the Euclidean geometry. The rearrangement shrinks the data, while the regularization expands them.
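A small sketch of why the constraints matter and where the shrinkage comes from. I assume here that a rearrangement is a nonnegative matrix whose rows sum to 1, so each original component's mass is redistributed among the new components. The paper's additional "balanced" condition and its regularization operator are not reproduced, and the numbers are made up.

```python
import math


def rearrange(x, r):
    """Map a composition x (length n) through an n-by-k row-stochastic matrix r."""
    k = len(r[0])
    return [sum(x[i] * r[i][j] for i in range(len(x))) for j in range(k)]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


# Two hypothetical 3-part compositions (nonnegative, each sums to 1).
x1 = [0.7, 0.2, 0.1]
x2 = [0.1, 0.3, 0.6]

# Hypothetical rearrangement from 3 components to 2; every row sums to 1.
r = [
    [0.8, 0.2],
    [0.5, 0.5],
    [0.2, 0.8],
]

y1 = rearrange(x1, r)
y2 = rearrange(x2, r)

print("y1 =", y1, "sum =", sum(y1))
print("y2 =", y2, "sum =", sum(y2))
print("distance before:", euclidean(x1, x2))
print("distance after: ", euclidean(y1, y2))
```

The outputs are again valid compositions, which an arbitrary linear projection such as PCA would not guarantee. In this example the two points also end up closer together than before, which matches the remark that the rearrangement shrinks the data.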

The optimization is quite strange (it is solved with a genetic algorithm). The maximization w.r.t. \alpha looks like an MLE, but the minimization w.r.t. the rearrangement matrix doesn't make much sense to me.

I am still unsure what application the authors have in mind.