Thursday, July 12, 2012

Click Shaping to Optimize Multiple Objectives


Some properties have business requirements quite different from pure CTR; portals, for example, may serve mainly to direct traffic to other properties. Therefore we usually have several metrics in mind. In this paper, three objectives are discussed: total clicks, total time spent, and average time spent. The first may be seen as the traditional objective, optimizing the current page, while the other two are metrics for downstream pages.

The paper considers several ways of combining them:
  • a linear combination of the objectives, which may be seen as a linear scalarization of the Pareto-optimal solutions;
  • goal programming, which maximizes the downstream metrics while putting a linear constraint on CTR (no worse than a given value, obtained a priori); see the sketch after this list;
  • a localized multi-objective program, which maximizes the minimum per-property ratio of the downstream metric to its value under the CTR-maximizing scheme, subject to CTR constraints. This is convex but not differentiable; with some tricks it can be turned into a quasi-convex optimization problem and solved via a bisection algorithm;
  • a relaxed version of the localized multi-objective program, which maximizes the downstream metric subject to the CTR constraint and a downstream constraint (no less than the original).
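A minimal sketch of the goal-programming variant, assuming per-article CTR estimates and downstream time-spent estimates are given (the numbers and the 5% slack below are made up):

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([0.10, 0.08, 0.05])  # hypothetical CTR estimate per article
    t = np.array([20.0, 45.0, 60.0])  # hypothetical downstream time spent per click

    # Step 1: the CTR-maximizing serving plan (probabilities summing to 1).
    ctr_max = -linprog(-c, A_eq=[np.ones_like(c)], b_eq=[1.0],
                       bounds=[(0, 1)] * len(c)).fun

    # Step 2: maximize downstream time spent, subject to CTR being no worse
    # than (1 - eps) of the optimum.
    eps = 0.05
    res = linprog(-t, A_ub=[-c], b_ub=[-(1 - eps) * ctr_max],
                  A_eq=[np.ones_like(c)], b_eq=[1.0],
                  bounds=[(0, 1)] * len(c))
    print(res.x)  # serving probabilities trading a little CTR for time spent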
The experiments can only be carried out on segments of users, and some of the downstream metrics must be estimated a priori. The results show that a slight sacrifice in CTR may lead to a comparatively larger increase in the downstream metrics.

Wednesday, June 27, 2012

Integer Programming for Multi-class Active Learning


This paper proposes an integer-programming-based method for active learning in multi-class discriminative tasks. The idea is to select enough samples for each one-vs-rest SVM based on margin-distance uncertainty, with samples that may contribute to more than one classifier especially encouraged. That is, each sample gets a binary indicator saying whether it is selected for labeling. The constraints require that each classifier receives enough selected samples (according to the uncertainty ranking, slightly more candidates than required are taken so that the constraints can be satisfied). The program is solved with the feasibility pump (as implemented in lpsolve).
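A hypothetical sketch of the selection program, using scipy's generic MILP solver instead of the paper's feasibility pump, and a uniform cost where the paper ranks samples by uncertainty:

    import numpy as np
    from scipy.optimize import milp, LinearConstraint, Bounds

    n_samples, n_classifiers, n_per_clf = 100, 5, 10
    rng = np.random.default_rng(0)
    # cand[k]: indices of classifier k's most uncertain candidate samples
    cand = [rng.choice(n_samples, size=15, replace=False)
            for _ in range(n_classifiers)]

    # Minimizing the number of labeled samples naturally encourages
    # samples that appear in several candidate sets.
    cost = np.ones(n_samples)
    A = np.zeros((n_classifiers, n_samples))
    for k, idx in enumerate(cand):
        A[k, idx] = 1.0
    res = milp(cost, integrality=np.ones(n_samples), bounds=Bounds(0, 1),
               constraints=LinearConstraint(A, lb=n_per_clf))
    selected = np.flatnonzero(res.x > 0.5)  # indices chosen for labeling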

Saturday, April 21, 2012

Online Models for Content Optimization


This paper gives a very detailed account of how Yahoo!'s Today module is optimized for overall CTR. Internally the project was first named COKE but was later rebranded as CORE (Content Optimization and Relevance Engine).

Here I only include some key points from the paper. Overall, the paper is written in a very succinct way and is worth your time.
  • mini-batch learning in five-minute segments;
  • online models to track CTR, including EMP, SS and OLR;
  • an explore-and-exploit (E&E) setting with a random bucket.
Let's take a look at the modelling part. Oddly enough, the OLR part is the one I am clearest about:
  • it is a Bayesian version of logistic regression, with a Gaussian prior on the parameters;
  • approximate inference is done with a Laplace approximation;
  • it may be simplified with an assumption of uncorrelated parameters (a diagonal covariance), but performance will suffer.
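A minimal sketch of this style of model, assuming a diagonal Gaussian posterior and a per-example Laplace-style update (this follows the general technique, not necessarily CORE's exact equations):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class OnlineLR:
        def __init__(self, dim, prior_var=1.0):
            self.mu = np.zeros(dim)             # posterior mean of the weights
            self.var = np.full(dim, prior_var)  # diagonal posterior variance

        def update(self, x, y):
            """One (feature vector, click in {0, 1}) observation."""
            p = sigmoid(x @ self.mu)
            # approximate MAP step for the new posterior mean ...
            self.mu += self.var * x * (y - p)
            # ... and a Laplace update of the diagonal precision:
            # new precision = old precision + curvature of the log-likelihood
            self.var = 1.0 / (1.0 / self.var + p * (1 - p) * x * x)

        def predict(self, x):
            return sigmoid(x @ self.mu)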
As for EMP and SS: SS is simply a user-segmented EMP, i.e. one EMP model per user segment. EMP looks like a Kalman filter (an LDS), which I will take a detailed look at soon.
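In the meantime, here is a hypothetical scalar Kalman-style tracker in the spirit of that description: the latent CTR follows a random walk between batches, and each five-minute batch of (clicks, views) gives a noisy observation of it.

    class CTRTracker:
        def __init__(self, mean=0.05, var=0.01, drift=1e-4):
            self.mean, self.var, self.drift = mean, var, drift

        def update(self, clicks, views):
            # predict: the latent CTR drifts between batches
            self.var += self.drift
            # observe: batch CTR with (approximately) binomial noise
            obs = clicks / views
            obs_var = max(self.mean * (1 - self.mean) / views, 1e-9)
            gain = self.var / (self.var + obs_var)
            self.mean += gain * (obs - self.mean)
            self.var *= 1 - gain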

Friday, April 20, 2012

Image Saliency: From Intrinsic to Extrinsic Context


The method contains three steps:
  • by kNN, the given image's saliency is computed from images in the dictionary (globally or on a patch-by-patch basis);
  • a warping algorithm helps align the kNN results to the original image/patch;
  • the saliency is the reconstruction error.
The idea is to find those parts of the image that cannot be explained by their neighbors (i.e., parts that might be salient enough to catch your attention).

We may use an external dictionary, or we may simply split the image itself into patches.
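A rough patch-based sketch of the reconstruction-error idea, assuming a grayscale image and a dictionary of flattened context patches; the paper's warping step is omitted here:

    import numpy as np

    def patch_saliency(image, dictionary, patch=8, k=5):
        # dictionary: (N, patch*patch) array of flattened context patches
        h, w = image.shape
        saliency = np.zeros((h // patch, w // patch))
        for i in range(h // patch):
            for j in range(w // patch):
                p = image[i*patch:(i+1)*patch, j*patch:(j+1)*patch].ravel()
                # k nearest dictionary patches by Euclidean distance
                d = np.linalg.norm(dictionary - p, axis=1)
                nn = dictionary[np.argsort(d)[:k]]
                # least-squares reconstruction from the k neighbors;
                # a large residual means the patch is poorly explained
                coef, *_ = np.linalg.lstsq(nn.T, p, rcond=None)
                saliency[i, j] = np.linalg.norm(p - nn.T @ coef)
        return saliency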

Wednesday, April 18, 2012

Enhancing by Saliency-guided Decolorization

by Codruta Orniana Ancuti, Cosmin Ancuti and Philippe Bekaert

This paper describes a decolorization method (maybe useful for desaturating photos?). The basic idea is quite simple: the final luminance is the original luminance multiplied by a coefficient defined by hue and saturation, with some extra tuning for highlighted regions. I tried a quick experiment with OpenCV, but it doesn't work very well...
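A minimal OpenCV sketch of the basic idea; the weighting function here is a hypothetical stand-in, not the paper's exact formula:

    import cv2
    import numpy as np

    def decolorize(bgr):
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
        h, s, v = cv2.split(hsv)
        # hypothetical hue/saturation-dependent coefficient on the luminance
        coef = 1.0 + 0.3 * (s / 255.0) * np.cos(np.deg2rad(2 * h))
        return np.clip(v * coef, 0, 255).astype(np.uint8)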

Thursday, February 2, 2012

Frequency-tuned Salient Region Detection


This is a very simple paper. The idea is to blur the image a little (with a Gaussian) and take each pixel's distance to the mean image color as its saliency. With this saliency map, a DoG can be applied to extract edges.
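A minimal sketch: the per-pixel Lab-space distance between a slightly blurred image and the mean Lab color (the 5x5 kernel size is my choice):

    import cv2
    import numpy as np

    def ft_saliency(bgr):
        lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
        blurred = cv2.GaussianBlur(lab, (5, 5), 0)
        mean = lab.reshape(-1, 3).mean(axis=0)
        sal = np.linalg.norm(blurred - mean, axis=2)
        return cv2.normalize(sal, None, 0, 1, cv2.NORM_MINMAX)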

This method will fail if the object has many colors or the background is complex.

Saturday, December 31, 2011

AppJoy: Personalized Mobile Application Discovery


This paper introduces the authors' app AppJoy, which collects users' app usage on mobile devices, derives similarity scores between apps based on that usage, and recommends apps based on the similarity scores. The idea is to utilize the temporal and spatial information along with the apps. In contrast to other recommendation algorithms based on ratings, app usage may serve as a much better feature. I guess this shows that a simple model with strong features might serve better than a complicated model with weak features.
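A generic item-based sketch of that pipeline, assuming a usage matrix U (users x apps) of, say, total time spent per app; AppJoy's actual similarity measure may differ:

    import numpy as np

    def recommend(U, user, top_n=5):
        # cosine similarity between the apps' usage columns
        norms = np.linalg.norm(U, axis=0) + 1e-9
        S = (U.T @ U) / np.outer(norms, norms)
        scores = S @ U[user]           # weight similar apps by this user's usage
        scores[U[user] > 0] = -np.inf  # do not re-recommend installed apps
        return np.argsort(scores)[::-1][:top_n]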

It seems that, if we decide to move on to applications on mobile devices, we'd better get to know the details of the mobile platforms. For example, iOS might not allow you to collect such usage information, so the experiments could only be done on Android.