Sunday, April 5, 2009

Mixture-of-Parents Maximum Entropy Markov Models


Like the skip-chain CRF, the mixture-of-parents MEMM aims to exploit long-range interactions for NER problems. The skip-chain CRF is elegant in theory, but exact inference on it is hard to carry out because the added edges create loops. This paper instead adds directed edges over the same r.v.s, with the assumption that Pr( y | pa(y) ) is a mixture of the single-parent conditionals Pr( y | y' ) for y' in pa(y) (please note that x is omitted in this formulation and here y refers to a single element). With this assumption, the conditional probability Pr(y | x) can still be computed with dynamic programming, where we now also have to sum over each variable's parents (here y refers to the whole sequence).
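The factorization above can be sketched in a few lines of Python. This is a toy version, assuming tabular pairwise conditionals `trans[a][b]` and fixed mixing proportions; all names are illustrative, not from the paper.

```python
import math

def mixture_log_likelihood(y, parents, mix, trans):
    """Toy log Pr(y) under a mixture-of-parents model (x omitted, as above).

    y       : label sequence, e.g. [0, 1, 0]
    parents : parents[t] = list of earlier positions feeding position t
    mix     : mix[t][i] = fixed mixing proportion of parents[t][i] (sums to 1)
    trans   : trans[a][b] = Pr(y_t = b | parent label a), a pairwise conditional
    """
    logp = 0.0
    for t in range(1, len(y)):
        # Pr(y_t | pa(y_t)) is a convex combination of single-parent terms.
        p = sum(pi * trans[y[p_idx]][y[t]]
                for pi, p_idx in zip(mix[t], parents[t]))
        logp += math.log(p)
    return logp
```

For a given full label sequence this is just a product over positions; dynamic programming is needed once we sum over unobserved labels and parent choices.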

The trick they employ is to assume the mixing proportions over each r.v.'s parents are known in advance, taken to be equal. With the proportions fixed, training becomes a convex optimization problem that can be solved just like an MEMM: compute the gradient and apply L-BFGS. They also consider a second objective (the log conditional marginals instead of the log joint conditional), which is non-convex but well-behaved in practice.
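To make the training recipe concrete, here is a toy sketch in Python using SciPy's L-BFGS. The per-parent conditional, the scalar observation feature, and the L2 regularizer are my own illustrative choices, and the objective is a convex surrogate (a mixing-proportion-weighted sum of per-parent log conditionals) rather than the paper's exact one; the point is only that with the proportions fixed, the objective is convex in the shared weights and L-BFGS applies directly.

```python
import numpy as np
from scipy.optimize import minimize

def make_nll(seqs, parents, mix, reg=0.1):
    """Negative value of a convex surrogate objective: a mixing-proportion-
    weighted sum of per-parent log conditionals, plus L2 regularization.
    Each term is a two-class logistic log-likelihood in the shared weights,
    so the whole objective is convex (illustrative, not the paper's exact
    objective)."""
    def nll(theta):
        W = theta[:4].reshape(2, 2)  # W[a, b]: score of child label b given parent label a
        V = theta[4:6]               # weights on a toy scalar observation x_t
        total = 0.5 * reg * theta @ theta
        for x, y in seqs:
            for t in range(1, len(y)):
                for pi, p in zip(mix[t], parents[t]):
                    scores = W[y[p]] + x[t] * V  # unnormalized log-probs, shape (2,)
                    total -= pi * (scores[y[t]] - np.logaddexp(scores[0], scores[1]))
        return total
    return nll

# One toy sequence; position 2 has both a chain parent (1) and a skip parent (0).
seqs = [(np.array([0.0, 1.0, -1.0]), [0, 1, 0])]
parents = {1: [0], 2: [0, 1]}
mix = {1: [1.0], 2: [0.5, 0.5]}  # fixed, equal mixing proportions
res = minimize(make_nll(seqs, parents, mix), np.zeros(6), method="L-BFGS-B")
```

Here the gradient is left to SciPy's numerical differencing for brevity; an analytic gradient, as the paper computes, would be the usual choice at scale.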
