Thursday, December 25, 2008

Multi-Instance Multi-Label Learning with Application to Scene Classification


To solve MIML (multi-instance multi-label) problems, the authors proposed two solutions: one via multi-instance learning (MIL) and the other via multi-label learning (MLL).

The first alternative, solving MIML via MIL, is quite simple since we are actually working on several independent MIL problems, so any MIL algorithm could be applied here. This paper cites an algorithm called MIBoosting. Though the paper writes it in a unified form, I think we are actually training an independent classifier for each label; only the boosting weights are adjusted simultaneously in the loop. This work comes from others (c.f. [1]), who derived logistic regression and boosting models for MIL. [1] is very simple. For logistic regression, they proposed two flavors: one predicts with the mean of the instance probabilities, the other with the mean of the instance log-odds (please note how the training differs between the two). For boosting (boosting an MI learner is trivial), the difference between the original boosting and MIBoosting is that MIBoosting adjusts the weights on bags, and a weak classifier is accepted as long as it yields more correct labels than incorrect ones within a bag.
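Assuming instance-level probabilities are already available from some base model, the two bag-level prediction flavors from [1] can be sketched as follows (the function names are mine, not from the paper):

```python
import numpy as np

def bag_prob_mean(instance_probs):
    """Flavor 1: bag probability = arithmetic mean of instance probabilities."""
    return float(np.mean(instance_probs))

def bag_prob_logit_mean(instance_probs, eps=1e-12):
    """Flavor 2: average the instance log-odds, then map back through the sigmoid."""
    p = np.clip(np.asarray(instance_probs, dtype=float), eps, 1 - eps)
    mean_logit = np.mean(np.log(p / (1 - p)))
    return float(1.0 / (1.0 + np.exp(-mean_logit)))

# A toy bag of three instances with their predicted positive-class probabilities.
bag = [0.9, 0.8, 0.1]
print(bag_prob_mean(bag))        # plain average
print(bag_prob_logit_mean(bag))  # log-odds average; less dominated by the low outlier
```

The two aggregations differ exactly where it matters for MIL: averaging log-odds lets a few confident instances pull the bag score more strongly than a plain mean does.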

The second alternative, solving MIML via MLL, takes into account the interactions between labels. We often notice that certain labels are highly correlated, and exploiting this helps build a better classifier. The problem then becomes how to convert an MI representation into a single-instance one. The standard method is something like a Lipschitz embedding (if the term is correct): select several medoids from the data, and represent each bag as a fixed-length feature vector whose features are the distances (here, the Hausdorff distance) between the bag and each medoid. Then we can apply an MLL algorithm, such as MLSVM [2], to the problem.

In fact, MLL is comparatively more difficult to evaluate (there might be missing labels or incorrect labels). [2] compares several strategies for training SVMs on MLL problems and several criteria for assigning labels. The strategies: MODEL-s labels each sample with its single most significant label; MODEL-i ignores samples with multiple labels; MODEL-n adds a new class for each combination of labels (so identical label sets form one class); MODEL-x uses a multi-labeled sample multiple times, once as a positive sample for each of its labels. The criteria for assigning labels: the P-criterion assigns a sample every label whose SVM gives a positive value; the T-criterion acts the same except when all SVMs yield negative values (where the P-criterion leaves the sample unlabeled), in which case it assigns the label whose SVM score is least negative; the C-criterion assigns all labels whose SVM outputs are close enough to the top score. [2] finds that MODEL-x with the C-criterion works better than the other combinations.
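The medoid-based bag-to-vector step above can be sketched as follows. This is my own minimal version, assuming the classical max-min Hausdorff distance and bags stored as NumPy arrays of instances; in practice the medoids would come from k-medoids clustering rather than being the bags themselves:

```python
import numpy as np

def hausdorff(A, B):
    """Classical (max-min) Hausdorff distance between two bags of instances.

    A and B are (n_instances, n_features) arrays.
    """
    # Pairwise Euclidean distances between every instance of A and of B.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # Directed distances in both directions, then take the larger one.
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def bags_to_vectors(bags, medoids):
    """Represent each bag by its Hausdorff distances to a fixed set of medoid bags."""
    return np.array([[hausdorff(b, m) for m in medoids] for b in bags])

# Toy data: two bags in a 2-D feature space.
bags = [np.array([[0.0, 0.0], [1.0, 0.0]]),
        np.array([[5.0, 5.0]])]
medoids = bags  # placeholder; k-medoids clustering would normally pick these
X = bags_to_vectors(bags, medoids)
print(X)  # each bag has distance 0 to itself, so the diagonal is zero
```

After this transform, `X` is an ordinary fixed-dimensional design matrix, so any single-instance multi-label learner (such as MLSVM) can be trained on it directly.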

References:
