Monday, May 11, 2009

A Dependence Maximization View of Clustering


HSIC is an upper bound of COCO, which in theory has been proved to be zero if and only if the random variables are independent. Therefore HSIC can be used as an independence test. However, as an upper bound, it might not be so suitable when what we want is to maximize the dependence between random variables, IMHO. So the motivation for using HSIC for this purpose (clustering) still seems vague to me.
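To make the independence-test use concrete, here is a minimal sketch of the (biased) empirical HSIC estimate, \tr(K H L H)/(n-1)^2, with Gaussian kernels on both variables. The function names, data, and bandwidth are just my own illustration, not anything from the paper.

import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gram matrix of a Gaussian RBF kernel (illustrative choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dep = x + 0.1 * rng.normal(size=(200, 1))   # strongly dependent on x
y_ind = rng.normal(size=(200, 1))             # independent of x

print(hsic(rbf_gram(x), rbf_gram(y_dep)))     # noticeably larger
print(hsic(rbf_gram(x), rbf_gram(y_ind)))     # close to zero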

For clustering, we maximize
\tr(H K_x H K_y)
so as to make the features and the labels dependent. We therefore have to choose a kernel for the discrete variable y: either the delta kernel, or the kernel used in the paper,
K_y = \Pi A \Pi^\top,
where A is the Gram matrix between clusters and \Pi is the assignment matrix. The paper also considers other kernels, relating this algorithm to kernel PCA (and maybe kernel k-means as well?).
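A minimal sketch of how the label kernel K_y = \Pi A \Pi^\top can be built from a hard assignment; the function name and the example values are mine, not from the paper. A = I recovers the delta kernel, while A = diag(1/n_c) gives the k-means-like weighting discussed below.

import numpy as np

def label_kernel(labels, k, A=None):
    """Build K_y = Pi A Pi^T from a hard cluster assignment."""
    n = len(labels)
    Pi = np.zeros((n, k))
    Pi[np.arange(n), labels] = 1.0          # assignment matrix
    if A is None:
        A = np.eye(k)                       # delta kernel
    return Pi @ A @ Pi.T

labels = np.array([0, 0, 1, 2, 1])
Ky_delta = label_kernel(labels, k=3)                          # delta kernel
counts = np.bincount(labels, minlength=3)
Ky_weighted = label_kernel(labels, 3, np.diag(1.0 / counts))  # A = diag(1/n_c)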

The optimization is as difficult as in other clustering algorithms. The algorithm initializes the assignment matrix randomly and repeatedly updates it until convergence: we loop over the samples, and for each one we find the assignment that maximizes the HSIC. Note that the matrix A has to be updated whenever the assignment matrix changes. In the k-means-like case, each sample has the same weight and A is simply a diagonal matrix whose entries are the inverses of the cluster sizes. Another choice generalizes these weights, so choosing A gives a whole family of clustering algorithms (a sketch of the update loop is below).
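Here is a rough, self-contained sketch of that update loop under the k-means-like choice A = diag(1/n_c). It is naive greedy coordinate ascent written for clarity, not the paper's actual implementation, and all the names are mine.

import numpy as np

def cluster_by_hsic(Kx, k, max_sweeps=20, seed=0):
    """Greedy maximization of tr(H Kx H Pi A Pi^T) with A = diag(1/n_c)."""
    n = Kx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    M = H @ Kx @ H                              # centered feature kernel
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)         # random initial assignment

    def objective(labels):
        Pi = np.zeros((n, k))
        Pi[np.arange(n), labels] = 1.0
        counts = np.maximum(Pi.sum(axis=0), 1.0)
        A = np.diag(1.0 / counts)               # A is refreshed on every change
        return np.trace(M @ Pi @ A @ Pi.T)

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):                      # loop over the samples
            old = labels[i]
            best_c, best_val = old, -np.inf
            for c in range(k):                  # try every cluster for sample i
                labels[i] = c
                val = objective(labels)
                if val > best_val:
                    best_c, best_val = c, val
            labels[i] = best_c
            changed = changed or (best_c != old)
        if not changed:                         # local optimum reached
            break
    return labels

With Kx a plain linear kernel (X @ X.T) this should behave much like k-means on the centered data, but each objective call here is quadratic in n, so treat it purely as a reading aid rather than as the paper's algorithm.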

Maybe it's worth thinking about the design ideas here more carefully.
