by Art B. Owen
This paper studies binary classification when we have a fixed set of samples from the positive class but infinitely many samples from the negative class. Its main result is formulated in the following theorem:
Let $n \ge 1$ and $x_i \in \mathbb{R}^d$ be fixed, and suppose that $F_0$ satisfies the tail condition
$$\int e^{x^\top \beta}\,(1 + \| x\|) \,\mathrm{d} F_0(x) < \infty, \qquad \forall \beta \in \mathbb{R}^d,$$
and surrounds $\bar{x} = \frac{1}{n} \sum_{i = 1}^n x_i$.
Then the maximizer $\beta_N$ of the centered log-likelihood satisfies
$$\lim_{N \to\infty} \frac{\int e^{x^\top \beta_N}\, x \,\mathrm{d} F_0(x)}{\int e^{x^\top \beta_N}\,\mathrm{d} F_0(x)} = \bar{x}.$$
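To make the statement concrete, here is a minimal numerical sketch of mine (not from the paper): with F0 = N(0, I) the exponentially tilted mean in the limit equation is simply β, so the fitted coefficient should approach x̄ as the number of negatives N grows. The positive sample, dimension, and sample sizes below are made up for illustration.

```python
# Hedged sketch: check that the logistic-regression coefficient approaches
# xbar when F_0 = N(0, I), since the tilted mean of N(0, I) at beta is beta.
# All data below (positive points, sample sizes) are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 2
x_pos = rng.normal(loc=2.0, scale=1.0, size=(5, d))   # fixed positive sample, n = 5
xbar = x_pos.mean(axis=0)

for N in [10**3, 10**4, 10**5, 10**6]:
    x_neg = rng.normal(size=(N, d))                    # negatives drawn from F_0 = N(0, I)
    X = np.vstack([x_pos, x_neg])
    y = np.concatenate([np.ones(len(x_pos)), np.zeros(N)])
    # large C makes this (almost) unpenalized maximum likelihood
    clf = LogisticRegression(C=1e6, max_iter=10_000).fit(X, y)
    beta_N = clf.coef_.ravel()
    print(N, beta_N, "target:", xbar)
```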
This theorem tells us several important things:
- Two conditions must hold for β not to diverge: the tail condition and the requirement that F0 surrounds the positive-sample mean. Violating either one yields a counterexample in which β diverges.
- The limiting β depends on the positive samples only through their mean. Therefore it does NOT really matter how they are distributed.
- The author suggests this is useful for understanding the behavior of logistic regression under extreme imbalance: removing an outlier among the positive samples, or moving it towards the mean, can give a better model.
- If F0 is a Gaussian or a mixture of Gaussians, β can be calculated in closed form, as in LDA (the generative counterpart); see the sketch after this list.
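On that last point, a hedged sketch of the Gaussian case: for F0 = N(μ0, Σ) the tilted mean is μ0 + Σβ, so the limiting equation gives β = Σ^{-1}(x̄ − μ0), the familiar LDA direction. The μ0, Σ, and x̄ values below are invented, and the Monte Carlo root-finding route is just one generic way to solve the limiting equation for an F0 we can sample from.

```python
# Hedged sketch of the Gaussian case (mu0, Sigma, xbar are made-up values):
# solve the tilted-mean equation both in closed form and by Monte Carlo.
import numpy as np
from scipy import optimize

rng = np.random.default_rng(1)
d = 2
mu0 = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
xbar = np.array([1.5, -0.5])                     # mean of the fixed positive sample

# Closed form from the Gaussian tilt: beta = Sigma^{-1} (xbar - mu0)
beta_closed = np.linalg.solve(Sigma, xbar - mu0)

# Generic route: Monte Carlo estimate of the tilted mean of F_0, then root-find.
z = rng.multivariate_normal(mu0, Sigma, size=200_000)

def tilted_mean_gap(beta):
    s = z @ beta
    w = np.exp(s - s.max())                      # stabilized exponential weights
    return (w[:, None] * z).sum(axis=0) / w.sum() - xbar

beta_mc = optimize.root(tilted_mean_gap, x0=np.zeros(d)).x
print("closed form:", beta_closed)
print("Monte Carlo:", beta_mc)
```

The two answers should agree up to Monte Carlo error, and the numerical route still applies when F0 has no convenient closed-form tilt.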
This paper is actually written by a statistician and involves more derivation than typical pure machine learning papers, which is quite interesting. I'd like to explore theoretical properties of simple models myself, but so far I haven't found a good entry point.