We first take a look at their logic: for semi-supervised learning, a generative model is usually preferred, since unlabeled data help estimate the marginal distribution $\Pr(x)$. In a Bayesian MAP formulation, we are actually solving
$$\min_\alpha \; -\sum_i \log \Pr(y_i \mid \alpha, x_i) - \log \big[\Pr(x_u \mid \alpha)\,\Pr(\alpha)\big],$$
which is a little different from a direct generative model. Here the first term is actually a discriminative term and the second term is a penalty coming from the unlabeled part (so it is closer to a ``supervised loss + penalty'' model). This paper does talk about models of the latter kind, using auxiliary problems.
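To make the decomposition concrete, here is a minimal numerical sketch of this objective under a toy instantiation of my own choosing (a 1-D, two-class Gaussian model with unit variances and equal class priors, with $\alpha = (\mu_0, \mu_1)$); it is not the paper's model, and only shows how the labeled discriminative term, the unlabeled marginal-likelihood penalty, and the prior combine.

```python
import numpy as np

# Toy instantiation (my own, not the paper's model): 1-D, two classes,
# unit-variance Gaussians with equal class priors, alpha = (mu_0, mu_1).

def log_gauss(x, mu):
    """log N(x; mu, 1)."""
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

def map_objective(alpha, x_lab, y_lab, x_unlab, prior_scale=10.0):
    mu = np.asarray(alpha, dtype=float)              # mu[y] = mean of class y
    # discriminative term: -sum_i log Pr(y_i | alpha, x_i)
    log_joint = log_gauss(x_lab[:, None], mu[None, :]) + np.log(0.5)
    log_post = log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    nll_lab = -log_post[np.arange(len(y_lab)), y_lab].sum()
    # unlabeled penalty: -sum_u log Pr(x_u | alpha), marginalizing over y
    log_marg = np.logaddexp.reduce(
        log_gauss(x_unlab[:, None], mu[None, :]) + np.log(0.5), axis=1)
    nll_unlab = -log_marg.sum()
    # prior: -log Pr(alpha), here a broad Gaussian prior on the class means
    neg_log_prior = 0.5 * np.sum(mu ** 2) / prior_scale ** 2
    return nll_lab + nll_unlab + neg_log_prior

# example call with two labeled and three unlabeled points
print(map_objective([-1.0, 1.0],
                    np.array([-1.2, 0.9]), np.array([0, 1]),
                    np.array([0.1, -0.4, 1.3])))
```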
The two-view model means that, analogous to co-training, we have two views of the feature vector $x$, namely $z_1(x)$ and $z_2(x)$, which are independent conditioned on the label. What is different about this model is that in order to solve for $\Pr(y \mid z_1, z_2)$, we need $\Pr(y \mid z_1)$ and $\Pr(y \mid z_2)$. Now consider only $\Pr(y \mid z_1)$. One possibility is to make a low-rank decomposition $\Pr(z_2 \mid z_1) = \sum_y \Pr(z_2 \mid y)\,\Pr(y \mid z_1)$, but the LHS is sometimes impossible to compute. An approximation is to encode $z_2$ with a set of binary labels $t_1^k(z_2)$. Then $\Pr(t_1^k \mid z_1) = \sum_y \Pr(t_1^k \mid y)\,\Pr(y \mid z_1)$ can be computed. By increasing the number of related binary labels $t_1^k$, we may obtain a good estimate of $\Pr(y \mid z_1)$.
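As an illustration of how the auxiliary problems pin down the posterior, here is a small sketch in my own notation ($q$ for the predicted auxiliary probabilities $\Pr(t_1^k \mid z_1)$, $B$ for the matrix of $\Pr(t_1^k \mid y)$, $p$ for the unknown $\Pr(y \mid z_1)$) that recovers $p$ from the linear relation above via least squares plus a crude simplex projection; this stands in for, and is not identical to, the paper's linear model.

```python
import numpy as np

def estimate_posterior(q, B):
    """Recover p = Pr(y | z_1) from q = B @ p, where B[k, y] = Pr(t^k = 1 | y)."""
    # unconstrained least-squares solution of q ≈ B p
    p, *_ = np.linalg.lstsq(B, q, rcond=None)
    # crude projection back onto the probability simplex
    p = np.clip(p, 0.0, None)
    return p / p.sum() if p.sum() > 0 else np.full_like(p, 1.0 / len(p))

# toy example with K = 3 auxiliary labels and 2 classes
B = np.array([[0.9, 0.2],
              [0.1, 0.8],
              [0.7, 0.3]])          # Pr(t^k = 1 | y), one row per k
p_true = np.array([0.3, 0.7])       # the posterior we hope to recover
q = B @ p_true                      # what the auxiliary predictors would output
print(estimate_posterior(q, B))     # ≈ [0.3, 0.7] when B is well-conditioned
```

With more (sufficiently diverse) auxiliary labels the system becomes better conditioned, which is the intuition behind increasing the number of $t_1^k$.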
They proposed two models, one linear and the other log-linear, which resemble linear regression and logistic regression respectively. The linear version coincides with the SVD-ASO model in their JMLR paper; the log-linear model is solved via an EM-like algorithm.
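For flavor, below is a generic EM-style sketch for a conditional log-linear model $\Pr(y \mid z) \propto \exp(w_y^\top z)$ trained on labeled plus unlabeled data: the E-step fills in soft labels for the unlabeled points, the M-step takes a gradient step on the expected log-likelihood. It only mirrors the alternating structure mentioned above; it is not the paper's actual update, and all names are mine.

```python
import numpy as np

def softmax(scores):
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def em_log_linear(Z_lab, y_lab, Z_unlab, n_classes, n_iters=20, lr=0.1):
    d = Z_lab.shape[1]
    W = np.zeros((n_classes, d))
    Y_lab = np.eye(n_classes)[y_lab]              # one-hot targets for labeled data
    for _ in range(n_iters):
        # E-step: soft posteriors for unlabeled points under the current model
        Q_unlab = softmax(Z_unlab @ W.T)
        # M-step (one gradient step): maximize the expected log-likelihood
        Z_all = np.vstack([Z_lab, Z_unlab])
        T_all = np.vstack([Y_lab, Q_unlab])       # hard + soft targets
        P_all = softmax(Z_all @ W.T)
        W += lr * (T_all - P_all).T @ Z_all / len(Z_all)
    return W
```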
The open question is: what kind of binary auxiliary functions would be essential to our semi-supervised problems? This might be a key to understanding their JMLR paper on multi-task learning.