Sunday, May 10, 2009

The Kernel Mutual Information


We know that mutual information can be used to test the independence of random variables. The difficulty with mutual information is that the distribution is usually unknown, so we have to estimate either the density or the entropy from samples.

This paper introduces the so-called KMI (kernel mutual information), which in practice has performance comparable to the KGV (from Bach and Jordan's paper). It starts from the mutual information between two multivariate Gaussian vectors,
I(x; y) = -\frac{1}{2} \log \left( \prod_{i = 1}^{\min(p_x, p_y)} (1 - \rho_i^2)\right).
For these \rho_i we use the correlations in the RKHS,
\rho_i = \frac{c_i^\top( P_{x, y} - p_x p_y) d_i}{\sqrt{c_i^\top D_x c_i d_i^\top D_y d_i}},
where P_{x, y}, p_x, p_y, D_x, D_y are approximated from samples on an assumed grid (the grid cancels out in the end). By relaxing the denominator to a larger value, we obtain an upper bound on the mutual information,
M(z) = -\frac{1}{2} \log\left( \big| I - (\nu_x \nu_y)\tilde{K}^{(x)} \tilde{K}^{(y)} \big|\right),
where \tilde{K}^{(x)} and \tilde{K}^{(y)} are the centered Gram matrices. This is the KMI criterion. Since it is an upper bound on the mutual information, we can use it as a contrast function (I don't see why the authors think this result can be derived from their Theorem 1; I am confused about their idea).
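
To make the criterion concrete, below is a minimal numerical sketch (my own, not the authors' code) of computing M(z) from samples with a Gaussian kernel. The single constant nu stands in for the product \nu_x \nu_y; the paper derives that value from the kernel/grid normalization, so the 1/n^2 default here is only a safe placeholder that keeps the eigenvalues of \nu \tilde{K}^{(x)} \tilde{K}^{(y)} inside [0, 1).

import numpy as np

def gaussian_gram(x, sigma=1.0):
    # RBF (Gaussian) Gram matrix; rows of x are samples
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center(K):
    # centered Gram matrix \tilde{K} = H K H with H = I - 11^T / n
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kmi(x, y, sigma=1.0, nu=None):
    # M(z) = -1/2 log |I - nu Kx~ Ky~|, with nu standing in for nu_x * nu_y
    n = len(x)
    Kx = center(gaussian_gram(x, sigma))
    Ky = center(gaussian_gram(y, sigma))
    if nu is None:
        # placeholder normalization, not the paper's exact nu_x * nu_y:
        # for a bounded kernel (entries at most 1) the eigenvalues of
        # Kx~ Ky~ are at most n^2, so 1/n^2 keeps nu * lambda_i below 1
        nu = 1.0 / (n * n)
    lam = np.real(np.linalg.eigvals(Kx @ Ky))   # real and nonnegative in theory
    lam = np.clip(nu * lam, 0.0, 1.0 - 1e-12)
    return -0.5 * np.sum(np.log(1.0 - lam))

On independent samples the value should be close to 0, and noticeably larger under dependence, e.g. kmi(x, y) with y = x + noise versus y drawn independently of x.
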

In their formulation, the (regularized) KGV can be regarded as another way of relaxing the covariance, so this independence measure can likewise be used for the ICA task. In a way, this paper is a generalization of the KGV, but I am still confused about their idea.
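
To see the connection concretely, here is a similarly hedged sketch of the two-variable regularized KGV, reusing gaussian_gram and center from the sketch above. The (n\kappa/2)I regularization and the eigenvalue formulation follow my reading of Bach and Jordan's construction and are assumptions of this sketch, not code from either paper.

def kgv(x, y, sigma=1.0, kappa=1e-2):
    # two-variable regularized KGV; the (n * kappa / 2) * I regularization
    # convention is an assumption of this sketch
    n = len(x)
    Kx = center(gaussian_gram(x, sigma))
    Ky = center(gaussian_gram(y, sigma))
    Ax = Kx + (n * kappa / 2.0) * np.eye(n)
    Ay = Ky + (n * kappa / 2.0) * np.eye(n)
    # squared regularized kernel canonical correlations rho_i^2 are the
    # eigenvalues of Ay^{-2} Ky Kx Ax^{-2} Kx Ky, and
    # KGV = -1/2 * sum_i log(1 - rho_i^2)
    M = np.linalg.solve(Ay @ Ay, Ky @ Kx) @ np.linalg.solve(Ax @ Ax, Kx @ Ky)
    rho2 = np.clip(np.real(np.linalg.eigvals(M)), 0.0, 1.0 - 1e-12)
    return -0.5 * np.sum(np.log(1.0 - rho2))

Comparing kmi and kgv on the same pairs of samples gives a quick empirical feel for how closely the two relaxations track each other.
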

==
Later I realized that the KMI is an upper bound on the KC; therefore, when the KMI is 0, the KC will be 0 too.
