Saturday, April 21, 2012

Online Models for Content Optimization


This paper describes in detail how Yahoo!'s Today module is optimized for overall click-through rate (CTR). Internally the project was first named COKE but later rebranded as CORE (content optimization and relevance engine).

Here I only include some key points from the paper. Overall, the paper is written very succinctly and is worth your time.
  • mini-batch learning in 5-minute segments;
  • online models to track CTR, including EMP, SS and OLR;
  • an E&E (explore/exploit) setting with a random bucket.
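To make the first and last points concrete, here is a minimal Python sketch of how a random bucket and 5-minute batches could fit together. It is my own illustration, not code from the paper: the function names, the 5% traffic share and the choice to count only random-bucket events are all assumptions.

```python
import random
from collections import defaultdict

BATCH_SECONDS = 5 * 60        # the 5-minute segments mentioned above
RANDOM_BUCKET_SHARE = 0.05    # assumed share of traffic, not taken from the paper


def choose_article(articles, ctr_estimate, rng, random_share=RANDOM_BUCKET_SHARE):
    """Return (article, in_random_bucket) for one page view."""
    if rng.random() < random_share:
        # Explore: the random bucket serves articles uniformly at random.
        return rng.choice(articles), True
    # Exploit: serve the article with the highest current CTR estimate.
    return max(articles, key=lambda a: ctr_estimate.get(a, 0.0)), False


def batch_index(timestamp):
    """Map a timestamp in seconds to the index of its 5-minute batch."""
    return int(timestamp // BATCH_SECONDS)


def aggregate(events):
    """Count [views, clicks] per batch and article.

    events: iterable of (timestamp, article, clicked, in_random_bucket).
    Assumption: only random-bucket events are counted, so the counts are
    not biased by what the serving model already prefers.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for timestamp, article, clicked, in_random_bucket in events:
        if not in_random_bucket:
            continue
        cell = counts[batch_index(timestamp)][article]
        cell[0] += 1
        cell[1] += int(clicked)
    return counts
```

Each batch of counts would then be handed to whichever online model (EMP, SS or OLR) is tracking CTR.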
Let's take a look at the modelling part. Strangely enough, the part I am clearest about is OLR:
  • it is a Bayesian version of logistic regression, with a Gaussian prior on the parameters;
  • inference is approximate, using the Laplace approximation;
  • it may be simplified by assuming the parameters are uncorrelated, but performance will suffer.
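The OLR points can be sketched in plain Python as follows. This is a generic Laplace-approximation update for logistic regression under a Gaussian prior, written from the description above rather than from the paper's own equations; the function names, the precision-matrix parameterization and the plain Newton iteration (no line search) are my assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def grad_hess(w, mean, precision, xs, ys):
    """Gradient and Hessian of the negative log posterior at w."""
    d = len(w)
    # Gaussian prior term: precision * (w - mean), with Hessian = precision.
    grad = [sum(precision[i][j] * (w[j] - mean[j]) for j in range(d)) for i in range(d)]
    hess = [row[:] for row in precision]
    # Logistic likelihood term for each (features, click) pair.
    for x, y in zip(xs, ys):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(d):
            grad[i] += (p - y) * x[i]
            for j in range(d):
                hess[i][j] += p * (1.0 - p) * x[i] * x[j]
    return grad, hess


def laplace_update(mean, precision, xs, ys, newton_steps=25):
    """One mini-batch update: prior N(mean, precision^-1) in, Gaussian posterior out."""
    w = mean[:]
    for _ in range(newton_steps):
        grad, hess = grad_hess(w, mean, precision, xs, ys)
        step = solve(hess, grad)
        w = [wi - si for wi, si in zip(w, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    # Laplace approximation: a Gaussian centred at the mode, whose
    # precision is the Hessian of the negative log posterior there.
    _, hess = grad_hess(w, mean, precision, xs, ys)
    return w, hess
```

The returned mean and precision become the prior for the next batch, which is what makes the scheme online. Zeroing the off-diagonal entries of the returned precision would give the uncorrelated simplification from the last bullet.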
As for EMP and SS, SS is simply user-segmented EMP, i.e. a separate EMP is built for each user segment. EMP looks like a Kalman filter (a linear dynamical system, LDS), which I will look at in detail soon.
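To show what I mean by "looks like a Kalman filter", here is a rough Python sketch of a scalar random-walk Kalman filter tracking one article's CTR, plus a segmented wrapper in the spirit of SS. It is not the paper's EMP model: the class names, the default prior values and the binomial plug-in for the observation noise are all assumptions of mine.

```python
from collections import defaultdict


class CtrTracker:
    """Scalar random-walk Kalman filter over a single article's CTR."""

    def __init__(self, mean=0.05, var=0.01, drift_var=1e-5):
        self.mean = mean            # current CTR estimate (assumed prior)
        self.var = var              # uncertainty of that estimate
        self.drift_var = drift_var  # how fast the true CTR may drift per batch

    def update(self, views, clicks):
        """Fold in one batch of counts and return the new CTR estimate."""
        # Predict step: a random walk keeps the mean but adds uncertainty.
        self.var += self.drift_var
        if views == 0:
            return self.mean
        observed = clicks / views
        # Observation noise: binomial variance of the observed rate, using
        # the current mean as a plug-in and clamping it away from 0 and 1.
        p = min(max(self.mean, 1e-6), 1.0 - 1e-6)
        obs_var = p * (1.0 - p) / views
        # Correct step: standard scalar Kalman gain.
        gain = self.var / (self.var + obs_var)
        self.mean += gain * (observed - self.mean)
        self.var *= 1.0 - gain
        return self.mean


class SegmentedTracker:
    """One CtrTracker per (user segment, article) pair."""

    def __init__(self):
        self.trackers = defaultdict(CtrTracker)

    def update(self, segment, article, views, clicks):
        return self.trackers[(segment, article)].update(views, clicks)
```

With many views in a batch the gain approaches 1 and the estimate follows the observed rate; with few views it stays close to the previous estimate.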
