Monday, December 22, 2008

Large Margin Hidden Markov Models for Automatic Speech Recognition


This work is the topic of Dr. Fei Sha's Ph.D. dissertation. Two parts of it look familiar, but there are still things I don't fully understand.

The first part is about large margin GMMs for multiway classification. The separable case (each class modeled by a single Gaussian) is considered first: the class with the highest probability wins, which can be expressed as a quadratic form; the separable samples can be scaled so that a constraint similar to the one in SVMs must hold; and maximizing the margin turns out to be equivalent to minimizing the trace of the precision matrix of each Gaussian. If we allow samples to fall inside the margin, slack variables can be introduced just as in SVMs. Finally, if we add more Gaussians per class, the number of constraints grows a lot, so they propose a stronger inequality instead: taking the softmax (softmin, really) of several terms in place of the minimum yields an inequality with a log-sum term.

I don't know why the trace relates to the margin -,- From his Ph.D. thesis, it seems the trace can be regarded as a norm on positive semi-definite matrices, but it is used here mainly because it keeps the problem a convex optimization, specifically an SDP. Note that a GMM is a generative model, yet here it is trained with maximum margin instead of maximum likelihood.
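To keep the notation straight for myself, here is a rough sketch of how I understand the single-Gaussian formulation (my own notation, which may not match the thesis exactly). Each class $c$ is parameterized by one positive semi-definite matrix $\Phi_c$ that packs the precision matrix $\Psi_c$, the mean $\mu_c$, and a scalar offset, so that with the augmented vector $z = [x;\, 1]$ the score of class $c$ is the quadratic form $z^\top \Phi_c z$ (smaller means more likely). The large margin training problem then looks roughly like

\[
\min_{\{\Phi_c\}} \;\; \sum_n \xi_n + \gamma \sum_c \mathrm{trace}(\Psi_c)
\quad \text{s.t.} \quad
z_n^\top \Phi_c z_n \;\ge\; 1 + z_n^\top \Phi_{y_n} z_n - \xi_n
\;\;\forall\, c \ne y_n, \qquad \xi_n \ge 0, \;\; \Phi_c \succeq 0.
\]

Everything is linear in the $\Phi_c$, so this is an SDP. With several Gaussians per class, the minimum over components $\min_m z^\top \Phi_{c,m} z$ is replaced by the softmin $-\log \sum_m e^{-z^\top \Phi_{c,m} z}$, which never exceeds the minimum, so enforcing the margin on it gives the stronger (but still convex) log-sum inequality mentioned above.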

The second part is about the HMM, where the GMM is used as the emission distribution, so the joint PDF can be written down easily. The HMM here does not need EM for training, since in the given task the labels are known. The nice thing is that the training formulation has the same form as the previous one and hence is still a convex optimization problem that can be solved as an SDP.
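As far as I remember, the margin for sequences is scaled by how many labels differ. Writing $D(X_n, s)$ for the discriminant (the log of the joint score) of state sequence $s$ on utterance $X_n$, the constraint should look something like

\[
D(X_n, y_n) - D(X_n, s) \;\ge\; \mathcal{H}(s, y_n) - \xi_n \qquad \forall\, s \ne y_n,
\]

where $\mathcal{H}$ is the Hamming distance between the two label sequences. There are exponentially many competing sequences $s$, but I believe the same softmax/log-sum trick collapses them into a single constraint (computable with a forward-type recursion) while keeping the problem convex in the $\Phi$ matrices; I need to check the thesis for the details.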

I guess I will spend some time on his Ph.D. thesis later.
