Sunday, May 10, 2009

The Kernel Mutual Information


We know that mutual information can be used to test the independence of random variables. The difficulty with mutual information is that the distribution is usually unknown, so we have to estimate either the density or the entropy from samples.

This paper introduces the so-called KMI (kernel mutual information), which in practice has performance comparable to the KGV (from Bach and Jordan's paper). It starts from the mutual information between two multivariate Gaussian vectors,
I(x; y) = -\frac{1}{2} \log \left( \prod_{i = 1}^{\min(p_x, p_y)} (1 - \rho_i^2)\right).
For these \rho_i we use the correlations in the RKHS,
\rho_i = \frac{c_i^\top( P_{x, y} - p_x p_y) d_i}{\sqrt{c_i^\top D_x c_i d_i^\top D_y d_i}},
where P_{x, y}, p_x, p_y, D_x, D_y are approximated from samples on an assumed grid (the grid cancels out in the end). By relaxing the denominator to a larger value, we obtain an upper bound on the mutual information,
M(z) = -\frac{1}{2} \log\left( \big| I - (\nu_x \nu_y)\tilde{K}^{(x)} \tilde{K}^{(y)} \big|\right),
where \tilde{K}^{(x)} and \tilde{K}^{(y)} are the centered Gram matrices. This is the KMI criterion. Since it is an upper bound on the mutual information, we can use it as a contrast function (I don't see why the authors think this result can be derived from their Theorem 1; I am confused about their idea).
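
To make the criterion concrete, below is a minimal numerical sketch (my own, not the authors' code) of computing M(z) from samples with a Gaussian kernel. The single constant nu stands in for the product \nu_x \nu_y; the paper derives that value from the kernel/grid normalization, so the 1/n^2 default here is only a safe placeholder that keeps the eigenvalues of \nu \tilde{K}^{(x)} \tilde{K}^{(y)} inside [0, 1).

import numpy as np

def gaussian_gram(x, sigma=1.0):
    # RBF (Gaussian) Gram matrix; rows of x are samples
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center(K):
    # centered Gram matrix \tilde{K} = H K H with H = I - 11^T / n
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kmi(x, y, sigma=1.0, nu=None):
    # M(z) = -1/2 log |I - nu Kx~ Ky~|, with nu standing in for nu_x * nu_y
    n = len(x)
    Kx = center(gaussian_gram(x, sigma))
    Ky = center(gaussian_gram(y, sigma))
    if nu is None:
        # placeholder normalization, not the paper's exact nu_x * nu_y:
        # for a bounded kernel (entries at most 1) the eigenvalues of
        # Kx~ Ky~ are at most n^2, so 1/n^2 keeps nu * lambda_i below 1
        nu = 1.0 / (n * n)
    lam = np.real(np.linalg.eigvals(Kx @ Ky))   # real and nonnegative in theory
    lam = np.clip(nu * lam, 0.0, 1.0 - 1e-12)
    return -0.5 * np.sum(np.log(1.0 - lam))

On independent samples the value should be close to 0, and noticeably larger under dependence, e.g. kmi(x, y) with y = x + noise versus y drawn independently of x.
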

In their formulation, the (regularized) KGV can be regarded as another way of relaxing the covariance, so this independence measure can likewise be used for the ICA task. In a way, this paper is a generalization of the KGV, but I am still confused about their idea.
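
To see the connection concretely, here is a similarly hedged sketch of the two-variable regularized KGV, reusing gaussian_gram and center from the sketch above. The (n\kappa/2)I regularization and the eigenvalue formulation follow my reading of Bach and Jordan's construction and are assumptions of this sketch, not code from either paper.

def kgv(x, y, sigma=1.0, kappa=1e-2):
    # two-variable regularized KGV; the (n * kappa / 2) * I regularization
    # convention is an assumption of this sketch
    n = len(x)
    Kx = center(gaussian_gram(x, sigma))
    Ky = center(gaussian_gram(y, sigma))
    Ax = Kx + (n * kappa / 2.0) * np.eye(n)
    Ay = Ky + (n * kappa / 2.0) * np.eye(n)
    # squared regularized kernel canonical correlations rho_i^2 are the
    # eigenvalues of Ay^{-2} Ky Kx Ax^{-2} Kx Ky, and
    # KGV = -1/2 * sum_i log(1 - rho_i^2)
    M = np.linalg.solve(Ay @ Ay, Ky @ Kx) @ np.linalg.solve(Ax @ Ax, Kx @ Ky)
    rho2 = np.clip(np.real(np.linalg.eigvals(M)), 0.0, 1.0 - 1e-12)
    return -0.5 * np.sum(np.log(1.0 - rho2))

Comparing kmi and kgv on the same pairs of samples gives a quick empirical feel for how closely the two relaxations track each other.
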

==
Later I realized that the KMI is an upper bound on the KC; therefore, when the KMI is 0, the KC will be 0 too.
