This paper talks about another way of using the rating matrix.
Given features of users and items, there are some correlations between them. The CCA idea is ``unsupervised'' since it works on two kinds of data and finds the maximum correlations without further ``supervised information.'' Then if indeed we have some ``supervised'' information, that is how correlated a user is to some items, the problem becomes a ``supervised learning'' version. Actually, most of the literatures are in this style. Haha, you might have realized my point now: is there any ``semi-supervised'' version? Yeah, I'd like to develop one :-)
OK, let's go back to this paper. So with supervised information, the problem of CCA becomes finding two linear mapping of user's feature and item's feature into a common subspace, so that their inner product approximates the given one. This might be problematic, since obviously the ratings might be non-consistent with their relative face values. But anyway we have this basic idea: it's like what we get in PCA.
The second step is to make some probabilistic model out of it (PCA to PPCA) so that everything gets some interpretation and a disciplined way to train the model and making inferences. The paper shows us an interpretation: there is a common subspace that each user/item is drawn from (zero mean, some variance?), which can be obtain from the original feature space (via a linear transformation). The observed rating can be seen as the inner product mixed with a noise. With this graphical model, we may do the learning via EM. Here the authors chose the MCEM (MC to compute the expectation).
Let's try some of the ideas.