Friday, March 27, 2009

Sufficient Dimensionality Reduction


The authors proposed a nonparametric way for sufficient dimensionality reduction. Their idea is very similar to CME. We don't know Pr( x, y ) but we may estimated Pr(x), Pr(y) and several measurement φ(x) (the reduced features). Therefore, given a φ(x), the minimum mutual information I(q) of q(x, y) (the estimated joint density) can be seen as the goodness of φ(x) and the maximization of this minimum leads to the best φ(x) available to us.

The minimum mutual information I(q) can be obtained with variational methods. We are actually finding the best q(x, y) such that q(x) = Pr(x), q(y) = Pr(y) and the expectation of φ(x) on the conditional q(x | y) and Pr(x | y) is the same. The solution has a CME flavor (the solution is a so-called I-projection of q(x, y) on all the distributions that satisfy the constraints): convex and unique. Let the corresponding Lagrange multipliers for φ(x) be ψ(y). Then in a sense, the functional I(q) can be regarded as the Lagrange function with the parametrization of q(x, y).

The maximization step can be converted into a minimization step, which can still be solved by I-projection. This is because I(q) = I(Pr(x, y)) - KL( Pr(x, y) | q ) where q(x) in the ``dual feasible set.'' And therefore we need two I-projections.

The algorithm then works in this way: fix the φ(x) part, project the conditional q(y | x) onto the subspace for tunable ψ(y) spaces; update the joint density with q(y | x) Pr(x); fix ψ(y) part, project the conditional q(x | y) onto the subspace for tunable φ(x) spaces; update the joint density with q(x | y) Pr(y).

I am not quite sure about my own interpretation though but the convergence (to a local minimum) of the algorithm is not difficult to prove. The I-projections can be done partially with a few GIS iterations. The most important idea in this paper is from information geometry: the Pythagoras theorem can be interpreted as a geodesic distance on the manifold and the dual interpretation of φ(x) and ψ(y) is very intuitive.

A related stat article:
Francesca Chiaromnte and R. Dennis Cook: Sufficient dimension reduction and graphics in regression, AISM.

No comments: