Sunday, December 21, 2008

DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification


I decided to read this paper since I have run into LDA several times recently. Here LDA does not stand for Linear Discriminant Analysis, which refers to a linear classifier or sometimes the famous Fisher discriminant criterion. The term seems to originate from a JMLR 2003 paper (see the reference below; I haven't verified this) and stands for Latent Dirichlet Allocation, a Bayesian generative model designed for modeling text in NLP.

As the graphical model in the paper's figure shows, each word w depends on its topic z, a multinomially distributed r.v. The parameter of z is the topic-proportion vector theta, which naturally gets a Dirichlet prior with hyperparameter alpha. The parameter beta is simply a table of word probabilities for each topic. After integrating out the latent r.v.s z and theta, the parameters alpha and beta can be estimated by maximizing the marginal likelihood (the empirical Bayes approach) with a standard EM algorithm; since the exact posterior of z is intractable, the E-step relies on a variational approximation.
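To make the dependency structure concrete, here is a minimal sketch of the LDA generative process for a single document (the vocabulary size, topic count, and hyperparameter values below are arbitrary placeholders, not values from the paper):

import numpy as np

rng = np.random.default_rng(0)

V, K, N = 1000, 20, 50                          # vocab size, topics, words per doc
alpha = np.full(K, 0.1)                         # symmetric Dirichlet hyperparameter
beta = rng.dirichlet(np.full(V, 0.01), size=K)  # per-topic word distributions

# Generative process for one document:
theta = rng.dirichlet(alpha)                    # topic proportions ~ Dir(alpha)
z = rng.choice(K, size=N, p=theta)              # per-word topic ~ Mult(theta)
w = np.array([rng.choice(V, p=beta[k]) for k in z])  # word ~ Mult(beta_z)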

The LDA model can be used for unsupervised learning, where the topic proportions provide a dimensionality reduction (or semantic hashing) of each document. It can also be used for classification, with the inferred topic representation serving as features for predicting the label, as in the sketch below.
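A hedged illustration of that two-stage use: project bag-of-words counts onto K topic proportions, then train a linear classifier on them. I use scikit-learn's variational LDA here purely for convenience; it is not the paper's method, and the data is a random toy matrix:

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

X = np.random.default_rng(0).integers(0, 5, size=(200, 1000))  # toy doc-term counts
y = np.random.default_rng(1).integers(0, 2, size=200)          # toy labels

lda = LatentDirichletAllocation(n_components=20, random_state=0)
theta = lda.fit_transform(X)        # per-document topic proportions: 200 x 20
clf = LogisticRegression(max_iter=1000).fit(theta, y)  # classify in topic space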

This paper brings additional supervised information into the model: each document's class label acts as side information that shapes the topics.

As the figure in the paper shows, the label y enters the model and makes inference harder, so the resulting inference is solved with MCMC methods. The reported improvements demonstrate how side information can help build a better model. I'm not sure I fully understand the core part yet, but it looks worth trying... A rough sketch of how the label might enter the generative process is below.
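If I understand the construction correctly, each class label y indexes a fixed linear transformation T^y that maps the document's topic proportions theta into a larger topic space (some topics class-specific, some shared) before words are drawn, so z ~ Mult(T^y theta). The block sizes and the block-structured T below are my own made-up illustration, not the paper's settings:

import numpy as np

rng = np.random.default_rng(0)
K1, K0, V, N = 5, 10, 1000, 50   # class-specific dims, shared dims (assumed sizes)
L = 2 * K1 + K0                  # total topics: two class blocks + shared block

# Hypothetical transformation T^y (L x (K1 + K0)): routes the first K1
# coordinates of theta to class y's topic block, the rest to shared topics.
def make_T(y):
    T = np.zeros((L, K1 + K0))
    T[y * K1:(y + 1) * K1, :K1] = np.eye(K1)   # class-specific topics
    T[2 * K1:, K1:] = np.eye(K0)               # shared topics for every class
    return T

beta = rng.dirichlet(np.full(V, 0.01), size=L)  # per-topic word distributions
y = 1                                           # observed class label
theta = rng.dirichlet(np.full(K1 + K0, 0.1))    # document topic proportions
phi = make_T(y) @ theta                         # transformed mixture T^y theta
z = rng.choice(L, size=N, p=phi)                # per-word topic assignments
w = np.array([rng.choice(V, p=beta[k]) for k in z])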

Reference
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. JMLR, 2003.
