Wednesday, February 14, 2007

Collabrarive Prediction Using Ensembles of Maximum Margin Matrix Factorization


There are two key techniques in this article. One is Maximum Margin Matrix Factorization(MMMF), which I should consult the article Fast maximum margin matrix factorization for collaborative prediction in ICML 2005, the other is the ensemble method used for later classifications.

The setting of MMMF is similar to that of NMF. It has a huge and sparse matrix, Y, representing the ratings from the users, each column of which stands for an item and each row of which stands for an user. Since it's a rating matrix, the elements should be a discrete value taken from 0--R. We'd like to find matrices U, V and discretization parameters thetair such that the following objective can be minimized:
J( U, V, theta) = lambda * ( |U|F2 + |V|F2 )/2 + sumover r sumover ij h( Tijr * ( thetair - Ui Vj') )
where Tijr is 1 if r >= Yij and -1 if r < Yij. h is a loss function, called Hinge loss, which is differentiable. And hence the optimization can be carried out with gradient-based methods. There is another version of MMMF based on SDP (c.f. Maximum margin matrix factorization in NIPS 2005). MMMF stated here can only be solved with local minima and susceptible.

As for the ensemble part, there are not very new techniques. Simply several voting methods and emsemble with different initial values or bagging.

Notice that in the objective function, there won't be anything for non-rated movies. Here is the setting of weak test of generalization. Held out one column, which is the ratings of a single movie. Compute the MMMF of the remaining matrix so that from the computed U, V and theta, we can predict the ratings for the held-out movie. Another strong test of generation differs. A subset of users are held out first. With the remaining users' ratings, MMMF is done. Then give a held-out user's ratings of all movies except one and ask for the unknown rating. This is done by fixing V and tuning U.

The regularization parameter lambda is chosen with a validation set. The error rate will suggest a proper lambda for later usage.

No comments: