Friday, January 21, 2011

Random Walks for Image Segmentation


This paper proposes a random-walk-based algorithm for image segmentation. The idea is quite straightforward: the user labels some parts of the image as seeds (landmarks), and each remaining pixel is assigned according to the probability that a random walk started from it first stops at the seeds of each label.

The interesting part of the paper is that it obtains these probabilities by solving a (set of) linear systems instead of an eigenvalue problem:
L_U X = -B^\top M
where L_U is the Laplacian block of the unlabelled samples, B is the cross block between labelled and unlabelled samples in the Laplacian of the whole graph, and M stacks the one-hot labels of the seeds.

The same idea was used in t-SNE to build the P-table for landmark points.
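
A minimal sketch of the linear-system step, assuming a precomputed sparse graph Laplacian L over the pixel graph (function and variable names are mine, not the paper's):

import numpy as np
from scipy.sparse.linalg import spsolve

def random_walker_probabilities(L, labeled_idx, unlabeled_idx, seed_labels, n_labels):
    """Probability that a random walk from each unlabelled pixel first reaches
    a seed of each label, by solving L_U X = -B^T M (one sparse solve per label)."""
    L = L.tocsr()
    L_U = L[unlabeled_idx][:, unlabeled_idx]        # Laplacian block of the unlabelled nodes
    B_UL = L[unlabeled_idx][:, labeled_idx]         # cross block (unlabelled x labelled), i.e. B^T

    # One-hot boundary conditions at the user-provided seeds
    M = np.zeros((len(labeled_idx), n_labels))
    M[np.arange(len(labeled_idx)), seed_labels] = 1.0

    rhs = -B_UL @ M                                 # dense (n_unlabelled, n_labels)
    X = np.column_stack([spsolve(L_U.tocsc(), rhs[:, k]) for k in range(n_labels)])
    return X                                        # each row sums to 1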

Friday, January 14, 2011

Annotating Photo Collections by Label Propagation According to Multiple Similarity Cues


The image annotation problem here is actually a multi-label propagation one, unlike the three previous papers. We are only interested in the label propagation part, and it turns out to be just a similarity-weighted count over independent labels. Nothing new :-(
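
As I read it, the propagation step amounts to something like the following similarity-weighted vote, with every label handled independently (a hypothetical sketch, names are mine):

import numpy as np

def annotate_by_weighted_count(K_q, neighbor_label_sets, n_labels):
    """Score each candidate label of a query image by a similarity-weighted count
    over its neighbours; every label is handled independently of the others.

    K_q                 : (N,) similarities between the query and its N neighbours
    neighbor_label_sets : list of N label sets, one per neighbour
    """
    scores = np.zeros(n_labels)
    for k_qi, labels in zip(K_q, neighbor_label_sets):
        for l in labels:
            scores[l] += k_qi      # each label simply accumulates similarity mass
    return scores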

Thursday, January 13, 2011

Semi-supervised Classification Using Linear Neighborhood Propagation


The so-called linear neighborhood propagation relies on the idea from LLE. Instead of using the graph weights directly, as in the former methods (cf. Zhou's and Zhu's papers, the two previously scanned papers), the weights are computed using the idea from LLE. Therefore, in a way, Zhou's version is something like a diffusion map, Zhu's like a Laplacian eigenmap, while this one is like LLE. We may find such counterparts in manifold learning.

The procedure to calculate the weights is identical to that of LLE (minimizing the affine reconstruction error). The weights are then used to propagate the labels with the same objective function as in semi-supervised LLE (or landmark LLE). In this way, it eliminates the need to select a width for the Gaussian kernel.
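
A rough sketch of the two steps as I read them (all names are mine; X is the (n, d) data matrix, Y0 holds one-hot rows for labelled samples and zeros elsewhere):

import numpy as np

def lnp_weights(X, n_neighbors=10, reg=1e-3):
    """LLE-style reconstruction weights: each point is approximated as an
    affine (sum-to-one) combination of its nearest neighbours."""
    n = X.shape[0]
    W = np.zeros((n, n))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise distances, only to pick neighbours
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]
        Z = X[nbrs] - X[i]                                 # shifted neighbourhood
        G = Z @ Z.T                                        # local Gram matrix
        G += reg * np.trace(G) * np.eye(len(nbrs))         # regularize for stability
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                           # affine constraint: weights sum to 1
    return W

def lnp_propagate(W, Y0, alpha=0.99, n_iter=200):
    """Propagate labels with the reconstruction weights (same fixed-point
    iteration as in Zhou et al., but without any Gaussian kernel)."""
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = alpha * W @ Y + (1 - alpha) * Y0
    return Y

The only free parameters left are the neighbourhood size and a small regularizer, which is exactly the point of dropping the Gaussian width.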

Well, why not make an LTSA version? Haha...

Semi-supervised Learning Using Gaussian Fields and Harmonic Functions


This is an even earlier piece of work on label propagation in the semi-supervised setting. There are many fancy nouns: Gaussian random fields, energy minimization, harmonic functions... but in the end it amounts to something resembling the previous paper.

They explain label propagation as the minimization of the so-called energy function of a Gaussian field, which is actually the quadratic form of the graph Laplacian and therefore corresponds to an elliptic PDE (and hence many more nouns: Green's function, harmonic functions). The small difference is that they use D^{-1} W (the one-step transition matrix) instead of the symmetrically normalized version D^{-1/2} W D^{-1/2} of the previously scanned paper. The closed-form solution is obtained via a Schur complement instead of the iterative procedure in Zhou's paper.
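
Concretely, the closed form for the unlabelled nodes is f_U = (D_{UU} - W_{UU})^{-1} W_{UL} f_L. A small dense sketch (names are mine), assuming W is a symmetric affinity matrix and Y_L stacks the one-hot labels of the labelled samples:

import numpy as np

def harmonic_labels(W, labeled_idx, unlabeled_idx, Y_L):
    """Closed-form harmonic labels for the unlabelled nodes:
    f_U = (D_UU - W_UU)^{-1} W_UL f_L, one linear solve, no iteration."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                          # combinatorial graph Laplacian
    L_UU = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_UL = W[np.ix_(unlabeled_idx, labeled_idx)]
    return np.linalg.solve(L_UU, W_UL @ Y_L)           # harmonic: f_U = P_UU f_U + P_UL f_L with P = D^{-1} W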

The paper also discusses connections with random walks and electric networks, graph kernels, spectral clustering and N-cut. Given these connections, I believe this propagation idea must have been discussed in all kinds of variants (normalized? symmetrized?).

Wednesday, January 12, 2011

Learning with Local and Global Consistency


This paper is about label propagation in the semi-supervised learning setting. The basic idea of label propagation is to use a graph as a random-walk structure. The labels are propagated with the following iteration
Y(t + 1) = \alpha S Y(t) + (1 - \alpha) Y(0)
where Y(0) is the initial label matrix (in multi-class classification, each labelled sample gets a unit row vector), Y(t) is the current label matrix, and S = D^{-1/2} W D^{-1/2} is the symmetrically normalized weight matrix. The iteration converges to
Y^\star = (I - \alpha S)^{-1} Y(0)
up to a constant factor (1 - \alpha) that does not affect the classification. This reveals the connection with the (normalized) graph Laplacian I - S.
The interesting part is how we might develop a parallel version of this idea.
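
A compact sketch of the closed-form version (names are mine; W is a symmetric affinity matrix, Y0 the initial label matrix):

import numpy as np

def local_global_consistency(W, Y0, alpha=0.99):
    """Closed-form limit Y* = (I - alpha S)^{-1} Y0 with S = D^{-1/2} W D^{-1/2}
    (the (1 - alpha) factor is dropped since it does not change the argmax)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]    # symmetric normalization
    n = W.shape[0]
    Y_star = np.linalg.solve(np.eye(n) - alpha * S, Y0)
    return Y_star.argmax(axis=1)                         # predicted class per sample

Since the columns of Y0 (one per class) are independent right-hand sides of the same linear system, at least the per-class solves parallelize trivially; a truly parallel iterative scheme would be the more interesting direction.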

Correlated Label Propagation with Application to Multi-label Learning


This paper talks about multi-label propagation. Label propagation is most commonly seen in semi-supervised learning tasks; for multi-label tasks, the critical difference is that labels may be correlated (co-occur).

The paper proposes a linear programming model for the task
\max_{z \in \mathbb{R}^n} \sum_{k = 1}^n \alpha_k z_k \qquad \text{s.t. } \forall t \in \{ 0, 1\}^n, z^\top t \leq \sum_{i = 1}^N K(x_i, x_q) \Omega(t^\top t(S_i)), z \succeq 0
where \Omega is a concave function. The interpretation of this constraint is: for any possible labelling t of the test sample, the total propagated score z^\top t is bounded from above by the similarity-weighted (via K(\cdot, \cdot)) sum of scores from the training samples that share those labels.
The amazing thing about this formulation is that a greedy algorithm finds the optimum when \Omega is concave, and the solution is determined only by the order of the coefficients \alpha_k. This optimization is related to so-called submodular optimization.
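
As far as I can reconstruct from this description (not from the paper itself), the greedy is the standard one for maximizing a linear objective over a polymatroid. A hypothetical sketch with my own names, where omega plays the role of the concave \Omega with \Omega(0) = 0 (e.g. \Omega(m) = \sqrt{m}):

import numpy as np

def greedy_propagation_scores(alpha, K, neighbor_label_sets, omega):
    """Visit labels in decreasing alpha_k and give each one the marginal gain of
    f(T) = sum_i K_i * omega(|T & S_i|), the greedy rule for a polymatroid LP."""
    z = np.zeros(len(alpha))
    counts = np.zeros(len(K), dtype=int)       # |T & S_i| for the labels chosen so far
    for k in np.argsort(-np.asarray(alpha)):   # decreasing alpha_k
        if alpha[k] <= 0:                      # the remaining labels cannot improve the objective
            break
        gain = 0.0
        for i, S_i in enumerate(neighbor_label_sets):
            if k in S_i:
                gain += K[i] * (omega(counts[i] + 1) - omega(counts[i]))
                counts[i] += 1
        z[k] = gain                            # marginal gain of f is the optimal z_k
    return z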