Wednesday, February 4, 2009

Semi-supervised Learning via Gaussian Processes


The authors present a way of modifying a discriminative model so that it can use unlabelled data. As we know, in a discriminative model the parameters are conditionally independent of the unlabelled data, so unlabelled points cannot change the posterior. They propose adding a null-category variable to the model, which induces a dependency between the parameters and the unlabelled data.
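To see why plain discriminative models ignore unlabelled inputs (a one-step sketch in my own notation, not taken from the paper): the likelihood factorizes as p(y, x | theta) = p(y | x, theta) p(x), so with labelled set L and unlabelled set U,

\[
p(\theta \mid X, y_L) \propto p(\theta) \prod_{i \in L} p(y_i \mid x_i, \theta) \prod_{j \in U} p(x_j) \propto p(\theta) \prod_{i \in L} p(y_i \mid x_i, \theta),
\]

and the unlabelled factors p(x_j) drop out because they do not involve theta.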

The main idea is that an extra null category (label 0) is added for the unlabelled data, and the parametric model is replaced by a Gaussian process latent variable f. Thus the dependency x -> y -> z becomes x -> f -> y -> z, where z = 1 indicates that a point's label is missing. The nonparametric model needs its hyperparameters tuned (EM-style maximization of Pr( y | x, z )), and inference is done with EP as in a GP classifier. The noise model Pr( y=0 | f ) puts the null category in a band around f = 0, so it has low probability when |f| is large. And Pr( z | y=0 ) is a simple 0-1 constraint: a null-category point can never be unlabelled, i.e. Pr( z=1 | y=0 ) = 0, while given y = 1 or -1, z is a Bernoulli variable. Then our model is done.
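Here is a minimal Python sketch of how I understand the noise model; the probit form, the band half-width A, and the missing-label rate GAMMA are my assumptions for illustration, not values from the paper.

from scipy.stats import norm

A = 1.0      # assumed half-width of the null-category band around f = 0
GAMMA = 0.5  # assumed Pr(z=1 | y=+/-1): chance a +/-1 point is unlabelled

def p_y_given_f(y, f):
    # Pr(y | f): probit-style noise model with a null category.
    # y = -1 and y = +1 take the two tails; y = 0 takes the band |f| < A.
    if y == -1:
        return norm.cdf(-(f + A))
    if y == 0:
        return norm.cdf(f + A) - norm.cdf(f - A)
    return norm.cdf(f - A)  # y == +1

def p_z_given_y(z, y):
    # Pr(z | y): z = 1 means the label is missing.
    # Hard constraint: the null category can never be unlabelled.
    if y == 0:
        return 0.0 if z == 1 else 1.0
    return GAMMA if z == 1 else 1.0 - GAMMA

The three Pr( y | f ) terms sum to one for every f, which is easy to check since the two band-boundary probits cancel.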

Why does it work anyway? For the labelled data, f should be positive with a large absolute value on positive examples, and negative with a large absolute value on negative ones. For unlabelled data we don't have y; instead we observe z = 1 (the label is missing) and use the conditional posterior Pr( y | x, z, D ). Since Pr( z=1 | y=0 ) = 0, the null category is ruled out, so the posterior mass on y moves to 1 and -1 and f is pushed out of the band around zero. Then the posterior can be computed.
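Continuing the sketch above, the effective likelihood of an unlabelled point is Pr( z=1 | f ) = sum over y of Pr( z=1 | y ) Pr( y | f ) = GAMMA * (1 - Pr( y=0 | f )), and this is exactly what pushes f out of the band:

def p_z1_given_f(f):
    # Pr(z=1 | f): marginalize y out of the unlabelled-point likelihood.
    return sum(p_z_given_y(1, y) * p_y_given_f(y, f) for y in (-1, 0, 1))

for f in (0.0, 0.5, 2.0):
    print(f, p_z1_given_f(f))
# Small near f = 0 and approaching GAMMA for large |f|, so unlabelled
# points pull the posterior on f away from the decision boundary.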

I have to do some experiments on non-parametric Bayesian methods to gain more intuitive insight.
