Thursday, July 2, 2009

Robust Feature Extraction via Information Theoretic Learning


This paper is based on Renyi's quadratic entropy (order 2), which is defined as
H_2(p) = -log \int p^2(x) \,\mathrm{d} x
Given samples of a random variable X, the empirical Renyi entropy is estimated with a kernel density estimator:
\hat{H}_2(X) = - \log \hat{V}(X) + \text{const}, \qquad \hat{V}(X) = \sum_i \sum_j g(x_i - x_j, \sigma)
where g(x - z, \sigma) = \exp( -\| x - z \|^2 / 2\sigma^2 ) is the Gaussian kernel. \hat{V}(X) is called the information potential. For two r.v.s we have a similar result,
\hat{H}_2(X_1, X_2) = -\log \hat{V}(X_1, X_2) + \text{const}, \qquad \hat{V}(X_1, X_2) = \sum_i \sum_j g(x_i^{(1)} - x_j^{(2)}, \sigma)
where the cross information potential \hat{V}(X_1, X_2) measures the correlation (similarity) between the two r.v.s.
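
To keep the definitions concrete, here is a minimal NumPy sketch of the two estimators exactly as written above (un-normalized double sums; dropping the 1/N^2 factor only shifts the entropy by a constant). The function names are my own.

import numpy as np

def gaussian_kernel(D, sigma):
    # g(d, sigma) = exp(-||d||^2 / (2 sigma^2)), applied row-wise
    return np.exp(-np.sum(D ** 2, axis=-1) / (2.0 * sigma ** 2))

def information_potential(X, sigma):
    # \hat{V}(X) = sum_i sum_j g(x_i - x_j, sigma), X has shape (n, d)
    D = X[:, None, :] - X[None, :, :]
    return gaussian_kernel(D, sigma).sum()

def cross_information_potential(X1, X2, sigma):
    # \hat{V}(X1, X2) = sum_i sum_j g(x1_i - x2_j, sigma)
    D = X1[:, None, :] - X2[None, :, :]
    return gaussian_kernel(D, sigma).sum()

def renyi_quadratic_entropy(X, sigma):
    # \hat{H}_2(X) = -log \hat{V}(X), up to an additive constant
    return -np.log(information_potential(X, sigma))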

The proposed objective is to find a projection Y = WX such that
\max_{W} (1 - \lambda) \hat{V}(WX) + \lambda \hat{V}(WX, C) - \gamma \| W \|_\text{fro}^2
They show that when \lambda = 1 the objective corresponds to a so-called robust M-estimator. The optimization algorithm they derive is very similar to majorization-minimization (an auxiliary-function method). Among the interesting relationships listed in the paper is the one with LPP: the iterative algorithm solves an LPP/LapRLS/SRDA problem in each iteration.
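
For my own record, a rough sketch of why each iteration can reduce to an LPP-type problem; this is my reconstruction for the unsupervised term only, not necessarily the authors' exact derivation. Since exp is convex, for the current estimate W^{(t)} with affinities g_{ij}^{(t)} = g( W^{(t)}(x_i - x_j), \sigma ) we get the lower bound
\hat{V}(WX) = \sum_{i,j} \exp\!\Big( -\tfrac{\| W(x_i - x_j) \|^2}{2\sigma^2} \Big) \;\ge\; \text{const} - \frac{1}{2\sigma^2} \sum_{i,j} g_{ij}^{(t)} \, \| W(x_i - x_j) \|^2,
so maximizing the bound amounts to minimizing
\sum_{i,j} g_{ij}^{(t)} \, \| W(x_i - x_j) \|^2 = 2\, \mathrm{tr}\big( W X L^{(t)} X^\top W^\top \big),
where L^{(t)} is the graph Laplacian of the affinities g_{ij}^{(t)} (columns of X are samples). This is exactly an LPP-type objective whose edge weights are re-estimated from the current projection at every iteration.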

I am afraid this looks quite like the unsupervised HSIC approach to the SDR problem; I will check it soon.
