Thursday, March 12, 2009

Gaussian Process Classification for Segmenting and Annotating Sequences


This is quite an interesting paper. I find it interesting because it shows a delicate connection between several algorithms: GP, kernel logistic regression (KLR), and SVM; and likewise between GPSC, CRF, and M3N.

A GP, when used for MAP inference, yields something comparable to KLR and SVM. KLR uses the negated log-likelihood as its loss function, while SVM traditionally employs the hinge loss, and the penalty (a norm regularizer) has an intuitive interpretation in the SVM. Although the GP is the nonparametric Bayesian counterpart of penalized KLR (PKLR), it can also be viewed as an extension of KLR's dual form (via kernelization), and therefore its connection to SVM (which has a natural Lagrange dual) comes down to nothing more than the difference in loss.
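To make this concrete, here is a minimal NumPy sketch (mine, not the paper's): with f = Kα, the GP-MAP / penalized-KLR objective and the SVM objective share the same kernelized norm penalty and differ only in the loss term. The RBF kernel, the regularizer lam, and the toy data are all illustrative assumptions.

```python
# Minimal sketch: same kernelized form f = K @ alpha, labels y in {-1, +1}.
# The GP-MAP objective coincides with penalized KLR; swapping in the hinge
# loss (same regularizer) gives the SVM objective.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def klr_objective(alpha, K, y, lam=1.0):
    # Negated log-likelihood (logistic loss) + RKHS norm penalty.
    f = K @ alpha
    return np.sum(np.log1p(np.exp(-y * f))) + 0.5 * lam * alpha @ K @ alpha

def svm_objective(alpha, K, y, lam=1.0):
    # Hinge loss with the same penalty: the only difference from KLR.
    f = K @ alpha
    return np.sum(np.maximum(0.0, 1.0 - y * f)) + 0.5 * lam * alpha @ K @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=20))
K = rbf_kernel(X)
alpha = 0.01 * rng.normal(size=20)
print(klr_objective(alpha, K, y), svm_objective(alpha, K, y))
```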

This paper further explores the relationship in the structured setting. The connection comes through the kernel matrix, which splits into two parts: one for node features and another for transitions. Just as in M3N, we face the problem of exponentially many labelings of a sequence; and just as M3N exploits the structure of the graph, here too the model can be simplified using the graph information (so that only marginals are required).
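As a hedged illustration of the "only marginals are required" point (my own sketch, not the paper's algorithm), the standard forward-backward recursion on a chain sums over all L**T labelings in O(T·L²) time given per-node and transition scores, which stand in for the two parts the kernel decomposes into. The function and variable names here are assumptions for illustration.

```python
# Forward-backward on a chain: node_log holds per-position label scores,
# trans_log holds transition scores. The forward recursion computes the
# log-partition over the exponentially many labelings; combining forward
# and backward messages yields the node marginals.
import numpy as np
from scipy.special import logsumexp

def forward_backward(node_log, trans_log):
    # node_log: (T, L) label scores; trans_log: (L, L) transition scores.
    T, L = node_log.shape
    fwd = np.zeros((T, L))
    bwd = np.zeros((T, L))
    fwd[0] = node_log[0]
    for t in range(1, T):
        fwd[t] = node_log[t] + logsumexp(fwd[t-1][:, None] + trans_log, axis=0)
    for t in range(T - 2, -1, -1):
        bwd[t] = logsumexp(node_log[t+1] + trans_log + bwd[t+1], axis=1)
    log_Z = logsumexp(fwd[-1])             # sum over all L**T labelings
    marginals = np.exp(fwd + bwd - log_Z)  # p(y_t = l | x) per position
    return log_Z, marginals

T, L = 5, 3
rng = np.random.default_rng(1)
log_Z, marg = forward_backward(rng.normal(size=(T, L)), rng.normal(size=(L, L)))
print(log_Z, marg.sum(axis=1))             # each row of marginals sums to 1
```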

Though it is still not a fully Bayesian method (not a decent GPC :-D), it sheds light on the relationships among several famous frameworks. Maybe we can try EP on the fully Bayesian model, as in the previous paper on Bayesian CRFs.
