Tuesday, February 17, 2009

Bayesian Conditional Random Fields


Well, as the title of the paper suggests, the idea is to put priors on the parameters and compute posteriors instead of tuning point estimates. I guess we could even use a GP-LVM for the feature part (since the model is something like a log-linear model; once we treat part of the parameters as a random variable, it seems natural to put a GP on it). The last author makes me suspect the final solution has to be obtained with EP (the posterior can't be computed exactly, just as in Bayesian logistic regression). So that is the general idea from a first glance.
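
For reference (this is just the standard textbook formula, not something from this paper), the non-Bayesian linear-chain CRF is log-linear:

p(y \mid x, w) = \frac{1}{Z(x, w)} \exp\Big( \sum_t w^\top f(y_{t-1}, y_t, x) \Big),

with w usually fit by maximizing a regularized conditional likelihood; the Bayesian version instead puts a prior p(w) on the weights and works with the posterior given the training data.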

Then you might wonder why bother with a Bayesian framework for this problem. The common answer is to avoid overfitting. There are comparisons of Bayesian methods with frequentist ones; some believe that for problems with limited data, Bayesian methods have an edge, while as data increase, a frequentist model with regularization works just as well. When we have enough data, we can forget about all those tricky parts.

Since I have not tried EP myself, here I just note the tricky parts and will check them later:
1. They drop the exponential potentials in favor of Gaussian CDFs (I didn't see at first why the CDF of a Gaussian would work here, since it now looks like a distribution over transitions; remember the label bias problem? See point 1 below and the sketch after this list.)
2. Gaussian prior on the parameters, as expected.
3. EP or power EP (I will check later).
4. The normalization factor is estimated without MCMC.
5. The approximation structure is flattened (due to non-positive definiteness).
6. The reported speed is even higher than a plain CRF's (are the data too limited?)
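
If I read the paper correctly, point 1 means each exponential potential is replaced by a probit-style potential on the same linear score. In my own notation (so treat this as a sketch, not the paper's exact equation):

p(y \mid x, w) \propto \prod_i \Phi\big(w^\top g_i(y_i, x)\big) \prod_{(i,j)} \Phi\big(w^\top f_{ij}(y_i, y_j, x)\big),

where \Phi is the standard Gaussian CDF and the product still has to be normalized over the whole labeling y.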

1. The old model is log-linear; here a Gaussian CDF will not cause the label bias problem, since it is not a normalized density over y_i. It is also easier to handle Gaussian CDFs with EP, in the step that updates the moments of the approximating posterior; see the sketch below.
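
To see why the Gaussian CDF plays nicely with EP, here is a minimal 1-D sketch of the moment-matching step for a single probit factor, assuming a Gaussian cavity N(w; mu, var) and a factor Phi(w). The closed-form expressions are the standard Gaussian/probit identities, and the function name is mine:

from scipy.stats import norm
import numpy as np

def probit_moment_match(mu, var):
    # tilted distribution: N(w; mu, var) * Phi(w); its zeroth, first and
    # second moments are available in closed form, no quadrature needed
    z = mu / np.sqrt(1.0 + var)
    Z = norm.cdf(z)                      # normalizing constant of the tilted distribution
    ratio = norm.pdf(z) / Z
    new_mu = mu + var * ratio / np.sqrt(1.0 + var)
    new_var = var - var ** 2 * ratio / (1.0 + var) * (z + ratio)
    return Z, new_mu, new_var

print(probit_moment_match(0.0, 1.0))     # roughly (0.5, 0.564, 0.682)

With exponential potentials these moments would not come out this cleanly, which is presumably part of the reason for the switch.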

3. It looks very simple; the main difference is that it is no longer the KL divergence that gets minimized.
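
If I remember right, power EP minimizes an alpha-divergence rather than the usual KL (this is from Minka's divergence-measures note, not from this paper):

D_\alpha(p \,\|\, q) = \frac{1}{\alpha(1-\alpha)} \Big( 1 - \int p(\theta)^\alpha \, q(\theta)^{1-\alpha} \, d\theta \Big),

which recovers KL(p \| q) as \alpha \to 1 and KL(q \| p) as \alpha \to 0.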
