Wednesday, March 7, 2007

Some thinking on Parzen window and Semi-supervised learning

Here is an idea that originated from the following references:
Pattern Classification, by Duda, Hart, and Stork: It presented me with the basic form of the Parzen window, and it seems quite natural to use other distributions as the window function instead of a uniform distribution on a hypercube. I wondered where I could find more information on this topic; PageRank then suggested several useful links, and mnauce gave several more, along with comments. (A small numerical sketch of the window-function idea follows the reference list.)
  • Emanuel Parzen. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33(3):1065-1076, 1962. <available from JSTOR>
  • B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, 1986. <books.google>
  • A. W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998. <in library>
  • John Shawe-Taylor. A framework for probability density estimation. <PASCAL>
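
To make the window-function point concrete, here is a minimal NumPy sketch (my own toy example; the data, the bandwidth h and the Gaussian choice are mine, not taken from any of the references above) comparing the uniform hypercube window with a Gaussian window:

    import numpy as np

    def parzen_density(x, samples, h, kernel="gaussian"):
        """Parzen window estimate p(x) = 1/(N h^d) * sum_i K((x - x_i)/h)."""
        samples = np.asarray(samples, dtype=float)
        N, d = samples.shape
        u = (x - samples) / h                      # scaled differences, shape (N, d)
        if kernel == "hypercube":
            # Uniform window: K(u) = 1 if all |u_j| <= 1/2, else 0.
            k = np.all(np.abs(u) <= 0.5, axis=1).astype(float)
        else:
            # Gaussian window: K(u) = (2*pi)^(-d/2) * exp(-||u||^2 / 2).
            k = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(u ** 2, axis=1))
        return k.sum() / (N * h ** d)

    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 2))               # toy 2-D samples from N(0, I)
    x0 = np.zeros(2)
    print(parzen_density(x0, data, h=0.5, kernel="hypercube"))
    print(parzen_density(x0, data, h=0.5, kernel="gaussian"))

Any window function that integrates to one works here; the hypercube just gives the familiar histogram-like estimate, while smoother windows give smoother densities.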

Learning with Kernels, by Bernhard Schölkopf and Alex Smola: It presents a simple mean classifier, which suggests that the kernel framework can turn many plain algorithms into much stronger ones. The book then points out that this classifier can be read as a difference of two Parzen windows. That made me think about the SVM: the solution of an SVM tells us which samples are necessary to build the discriminant function and what their weights are.
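
A minimal sketch of that mean classifier (my own toy example; the Gaussian kernel, bandwidth h and data are my choices): assigning x to the class whose feature-space mean is closer reduces to comparing two kernel averages, i.e., a difference of Parzen window estimates, plus an offset that compares the two class "self-similarities".

    import numpy as np

    def gaussian_kernel(a, b, h=1.0):
        """k(a, b) = exp(-||a - b||^2 / (2 h^2)); each row of a, b is a point."""
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * h ** 2))

    def mean_classifier(x, X_pos, X_neg, h=1.0):
        """Assign x to the class whose feature-space mean is closer.

        With a density-style kernel, each average below is (up to scaling)
        a Parzen window estimate of the class-conditional density at x."""
        k_pos = gaussian_kernel(x, X_pos, h).mean(axis=1)   # ~ p_hat(x | +)
        k_neg = gaussian_kernel(x, X_neg, h).mean(axis=1)   # ~ p_hat(x | -)
        # Offset comparing the two class "self-similarities" (||mean||^2 terms).
        b = 0.5 * (gaussian_kernel(X_neg, X_neg, h).mean()
                   - gaussian_kernel(X_pos, X_pos, h).mean())
        return np.sign(k_pos - k_neg + b)

    rng = np.random.default_rng(1)
    X_pos = rng.normal(loc=+2.0, size=(50, 2))
    X_neg = rng.normal(loc=-2.0, size=(50, 2))
    test = np.array([[1.5, 1.0], [-2.5, -0.5]])
    print(mean_classifier(test, X_pos, X_neg, h=1.0))       # expect [ 1., -1.]

Unlike the SVM, every training sample contributes here with equal weight; the SVM solution keeps only the support vectors and learns their weights.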

Manifold Parzen Windows, by Vincent and Bengio (in NIPS 2003): They propose a method to incorporate manifold-learning ideas into density estimation. The final version is not far from what I had already thought about for this problem: local PCA, much as MFA does. But I suddenly realized that the neighborhood selection method used by Zhenyue Zhang would also be feasible here. They did, however, extend this version in their later work.
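
A rough sketch of the idea as I understand it (a simplification in my own code, not the authors' exact estimator; k, d and the noise level sigma2 are my own choices): each training point gets a Gaussian whose covariance comes from a local PCA of its k nearest neighbors, flattened onto d principal directions plus isotropic noise.

    import numpy as np
    from scipy.stats import multivariate_normal

    def manifold_parzen(x, X, k=10, d=1, sigma2=0.01):
        """Mixture of per-point Gaussians with locally flattened covariances.

        For each training point, the covariance of its k nearest neighbors is
        projected onto its top d principal directions; sigma2 * I models the
        residual noise off the local manifold directions."""
        X = np.asarray(X, dtype=float)
        N, D = X.shape
        density = 0.0
        for i in range(N):
            # k nearest neighbors of X[i] (excluding itself).
            dist = np.linalg.norm(X - X[i], axis=1)
            nbrs = X[np.argsort(dist)[1:k + 1]]
            C = np.cov((nbrs - X[i]).T)                    # local covariance, D x D
            w, V = np.linalg.eigh(C)
            V_d, w_d = V[:, -d:], w[-d:]                   # top d principal directions
            C_flat = (V_d * w_d) @ V_d.T + sigma2 * np.eye(D)
            density += multivariate_normal.pdf(x, mean=X[i], cov=C_flat)
        return density / N

    # Toy data on a noisy circle (a 1-D manifold in 2-D).
    rng = np.random.default_rng(2)
    t = rng.uniform(0, 2 * np.pi, size=200)
    X = np.c_[np.cos(t), np.sin(t)] + 0.02 * rng.normal(size=(200, 2))
    print(manifold_parzen(np.array([1.0, 0.0]), X))        # on the circle: high
    print(manifold_parzen(np.array([0.0, 0.0]), X))        # at the center: low

The per-point Gaussians stretch along the locally estimated tangent directions, so the estimated density concentrates around the manifold instead of spreading isotropically.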

Non-local Manifold Parzen Windows, by Vincent and Bengio (in NIPS 2005): Even here, though, they stayed within manifold learning; they have not progressed as fast as I imagined. Though I am still confused about the value of this work (silly me), I guess there could be something here if we tried to place the problem within the semi-supervised learning paradigm.

Saturday, March 3, 2007

Efficient Co-Regularized Least Squares Regression


In semi-supervised learning, there is an important technique called co-learning. The basic idea is to train two independent classifiers and let them interact through their training errors so that, in the end, they agree on the unlabelled examples. Two key phrases: independent classifiers, and agreement on unlabelled data.

Now let's focus on what the authors show us. As is well known, the SVM framework already gives us a regularization technique for supervised learning. So, in order to incorporate the unlabelled data, they add an "agreement regularizer".
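
As I understand it, the resulting objective is roughly of the following form (my own notation, a sketch rather than the paper's exact formulation): a regularized squared loss per view plus a penalty on pairwise disagreement over the unlabelled points,

    \min_{f_1,\dots,f_M} \; \sum_{v=1}^{M} \Big[ \sum_{i=1}^{n} (y_i - f_v(x_i))^2 + \nu \|f_v\|_{\mathcal{H}_v}^2 \Big] + \lambda \sum_{u<v} \sum_{j=1}^{m} (f_u(z_j) - f_v(z_j))^2,

where f_v lives in the RKHS of view v, the x_i are the n labelled points, the z_j are the m unlabelled points, and \nu, \lambda trade the usual regularization off against agreement.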

Then, by the representer theorem, each f_v expands over all n+m training points, so the problem has M(m+n) coefficients and solving it costs O(M³(m+n)³), where M is the number of views, m is the number of unlabelled examples and n the number of labelled examples.

However, m will usually be huge, because unlabelled data are so easy to obtain, and we would suffer under a cubic algorithm. An alternative simply works in a subspace of the original RKHS, namely the one spanned by the labelled data only. The optimization then becomes an equation of roughly the following form.
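In my own notation (an assumed form, not copied from the paper), each f_v is restricted to an expansion over the labelled points only,

    f_v(x) = \sum_{i=1}^{n} \alpha_i^{(v)} k_v(x_i, x), \qquad v = 1, \dots, M,

and substituting this into the objective above leaves a finite-dimensional problem in the M·n coefficients \alpha_i^{(v)},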

and this can be solved in O(M³n³ + M²m), i.e., cubic only in the number of labelled examples and merely linear in the number of unlabelled ones.

Another result given in this paper is a distributed algorithm for the optimization problems mentioned above. The algorithm, block coordinate descent, is analogous to steepest gradient descent, except that each step moves only along the block of coordinates belonging to one classifier, driven by the message sent by the other classifier at a different site. A toy sketch of this flavour of update follows.
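
To fix ideas, here is a toy sketch of this kind of block update (my own construction on a two-view linear version of the objective above, not the paper's algorithm; the function name, the constants nu and lam, and the toy data are all mine): each "site" exactly re-solves for its own coefficient block while treating the other view's predictions on the unlabelled points as a fixed message.

    import numpy as np

    def block_cd_corls(X1, X2, y, U1, U2, nu=0.1, lam=1.0, iters=20):
        """Block coordinate descent for a 2-view linear co-regularized
        least squares problem:

            sum_v ||y - X_v w_v||^2 + nu ||w_v||^2  +  lam ||U1 w1 - U2 w2||^2

        Each view re-solves its own block exactly, treating the other view's
        predictions on the unlabelled points as a fixed incoming message."""
        w1 = np.zeros(X1.shape[1])
        w2 = np.zeros(X2.shape[1])
        for _ in range(iters):
            msg2 = U2 @ w2                        # message from view 2's site
            A1 = X1.T @ X1 + nu * np.eye(X1.shape[1]) + lam * U1.T @ U1
            w1 = np.linalg.solve(A1, X1.T @ y + lam * U1.T @ msg2)
            msg1 = U1 @ w1                        # message from view 1's site
            A2 = X2.T @ X2 + nu * np.eye(X2.shape[1]) + lam * U2.T @ U2
            w2 = np.linalg.solve(A2, X2.T @ y + lam * U2.T @ msg1)
        return w1, w2

    # Toy data: 20 labelled and 200 unlabelled points, two 5-dimensional views.
    rng = np.random.default_rng(3)
    X1, X2 = rng.normal(size=(20, 5)), rng.normal(size=(20, 5))
    U1, U2 = rng.normal(size=(200, 5)), rng.normal(size=(200, 5))
    y = rng.normal(size=20)
    w1, w2 = block_cd_corls(X1, X2, y, U1, U2)
    print(np.linalg.norm(U1 @ w1 - U2 @ w2))      # disagreement on unlabelled points

Because the disagreement term couples the views only through their predictions on the unlabelled points, each site needs nothing more than those predictions from the other site to perform its own exact block update.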


The paper also lists several references on co-learning for my future study:
Here is another survey: