Friday, May 25, 2007
ICML 2007 has released its online proceedings. Each year there are about 150 papers waiting to be read, and I still haven't finished ICML 2006; perhaps 70+ remain. So I have to speed up to catch up with the papers!
Wednesday, May 16, 2007
Optimal Dimensionality of Metric Space for Classification
by Wei Zhang, Xiangyang Xue, Zichen Sun, Yue-Fei Guo and Hong Lu
This is an ICML 2007 paper. After reading it, I found nothing really astonishing, but it does suggest another direction of research. The idea is based on one of the papers I read long ago, LPP. LPP is built on the Laplacian Eigenmap: it finds a linear transform that minimizes an objective function similar to that of the Laplacian Eigenmap. However, both are unsupervised learning techniques.
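To recall the connection (a sketch of the standard formulations from memory, not taken from this paper; W is the affinity matrix and X stacks the samples as columns):

```latex
% Laplacian Eigenmap: embed the samples directly as y_i
\min_{Y}\ \tfrac{1}{2} \sum_{i,j} \|y_i - y_j\|^2 W_{ij}
  \;=\; \operatorname{tr}(Y^\top L Y)
\quad \text{s.t.}\ Y^\top D Y = I,
\qquad D_{ii} = \textstyle\sum_j W_{ij},\quad L = D - W

% LPP: restrict the embedding to a linear transform y_i = A^\top x_i
\min_{A}\ \operatorname{tr}(A^\top X L X^\top A)
\quad \text{s.t.}\ A^\top X D X^\top A = I
```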
This paper, however, incorporates supervised information when the Laplacian matrix is constructed: pairs of samples from the same class receive a positive weight, pairs that are too far away receive 0, and pairs from different classes receive a negative weight.
The so-called optimal projection then uses the eigenvectors corresponding to negative eigenvalues. Though the idea is simple and direct, it works, and the experiments and analysis are detailed and convincing.
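A minimal numpy sketch of how I read the construction; the weights alpha/beta, the neighborhood size, and the exact matrix being eigendecomposed are my own placeholders rather than the paper's choices:

```python
import numpy as np

def supervised_projection(X, y, alpha=1.0, beta=1.0, n_neighbors=5):
    """X: (n, d) samples as rows, y: (n,) class labels.
    Build a class-aware weight matrix (positive for same-class neighbors,
    negative for different-class neighbors, 0 for far-away pairs), form the
    Laplacian L = D - W, and keep the eigenvectors of X^T L X that have
    negative eigenvalues as the projection directions."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[1:n_neighbors + 1]:    # nearest neighbors of i
            W[i, j] = W[j, i] = alpha if y[i] == y[j] else -beta
    L = np.diag(W.sum(axis=1)) - W                        # graph Laplacian (may be indefinite)
    vals, vecs = np.linalg.eigh(X.T @ L @ X)
    return vecs[:, vals < 0]                              # columns = projection directions

# usage: A = supervised_projection(X, y); Z = X @ A   # Z is the low-dimensional representation
```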
Tuesday, May 15, 2007
Ranking on Graph Data
The ranking problem asks for a function that gives us an ordering of samples. This requirement shows up in search engines, which present the user a sequence of search results in the user's preferred order. I have no earlier experience with ranking problems, so I was simply curious about this article. After scanning it, it turns out to be highly related to SVM theory.
There is a ranking risk matrix that describes the cost of ranking errors.
So the empirical ranking risk can be formulated as a functional of the ranking function, which is what we are looking for. On a graph, the function degenerates to a vector. As in SVM, a regularizer is chosen to smooth the function by penalizing the divergence of values on nearby vertices. Thinking about it, the Laplacian regularizer is a commonly used one (just think about the Laplacian Eigenmap and the quantity it optimizes).
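To make this concrete, one common pairwise formulation (my paraphrase, not necessarily the paper's exact notation) is:

```latex
% empirical ranking risk of a ranking function f: fraction of misordered pairs
\hat{R}(f) \;=\; \frac{1}{|\{(i,j):\, y_i > y_j\}|}
  \sum_{(i,j):\, y_i > y_j} \mathbf{1}\{\, f(x_i) \le f(x_j) \,\}

% on a graph, f is a vector over the vertices; the Laplacian regularizer is
f^\top L f \;=\; \tfrac{1}{2} \sum_{u,v} w_{uv}\, (f_u - f_v)^2,
\qquad L = D - W
```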
Directed graphs can be treated likewise. The paper later suggests a connection to RKHS and an application in bioinformatics. Indeed, multiple alignment requires a ranking result.
Since the loss function relaxed in this paper is still the hinge loss, the idea of Trading Convexity for Scalability should be applicable to it.
Graph Model Selection using Maximum Likelihood
This might be the first paper that I have read on random graphs. Although I have learned a bit of graph theory so far, the lack of theoretical preparation still impedes a quick understanding. There are several pieces of common sense in random graph theory:
- Power-law degree distribution---In some graphs, the degree distribution is heavy-tailed: the fraction of vertices of degree k decays roughly like a power of k (rather than the exponential/geometric tail of a purely random graph).
- Small world---The average shortest path length between two vertices is not as long as we might imagine.
This paper examines how well several random-graph generating strategies can describe a real-world network (graph). To compare them, the authors suggest comparing their likelihood values. Each generating strategy has several parameters controlling the procedure, and these parameters are estimated via MLE. Finally, by comparing the likelihood values, the authors state how suitable each model is.
The four strategies exemplified in the paper are:
- PA model: The graph grows from a single vertex. New vertices are added by connecting to existing ones, with a preference for those of large degree/many neighbors (see the sketch after this list).
- PRG model: First generate the in-degree/out-degree of each node and then match them up.
- SW model: The nodes are placed on a grid that already has edges connecting the neighbors. More edges are added according to the Manhattan distance of every pair of vertices, which gives the final model.
- ER model: Each potential edge exists independently with probability p.
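Just to illustrate the PA growth rule, here is a generic preferential-attachment sketch (not necessarily the exact variant whose parameters the paper fits):

```python
import random

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph to n vertices; each new vertex attaches to m existing
    vertices chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]          # seed graph: a single edge
    targets = [0, 1]          # multiset of endpoints, weighted by degree
    for v in range(2, n):
        chosen = set()
        while len(chosen) < min(m, v):
            chosen.add(rng.choice(targets))   # degree-proportional sampling
        for u in chosen:
            edges.append((v, u))
            targets += [v, u]                 # both endpoints gain one degree
    return edges

# e.g. edges = preferential_attachment(1000); the degree sequence comes out heavy-tailed
```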
It can be seen that the MLE of the ER model can be obtained easily (a small sketch follows below). For the other models it is somewhat difficult, and MCMC is adopted to overcome this. I have decided to examine the technical details later.
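For the ER model the claim is easy to check: the MLE of p is just the observed edge density, and the maximized log-likelihood follows directly. A small sketch under that assumption:

```python
from math import comb, log

def er_log_likelihood(n, m):
    """ER model G(n, p): the MLE of p is the observed edge density
    m / C(n, 2); plug it back into the likelihood of the observed graph."""
    pairs = comb(n, 2)
    p = m / pairs
    return m * log(p) + (pairs - m) * log(1 - p)

# e.g. er_log_likelihood(1000, 5000) gives the log-likelihood under the fitted ER model
```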
A possible problem with this paper is that it doesn't take into account the complexity of the generating strategies. It is natural that the more complex a model is, the better it fits the real-world data. However, a better fit won't necessarily yield a better explanation, since an even more complicated procedure might fit better than any of the four models. So the conclusion of this paper is dubious.