Tuesday, May 15, 2007

Graph Model Selection using Maximum Likelihood


This might be the first paper that I read on random graphs. Although I have learnt graph theory a bit so far, theoretical preparation still impedes a quick understanding. There are several comman sense in random graph theory:
  • Power-law degree distr---In some graph, the degree distr looks like a exponential/geometry distr.
  • Small world---The avaraged shorest length between two vertices is not as long as what we might imagine.

This paper shows the ability to describe a real world network (graph) that several random graph generating strategies might have. In order to compare them, the authors suggest comparing their likelihood value. Each generating strategy has several parameters that control the procedure and the parameters are estimated via MLE. At last, by comparison of likelihood values, the authors states the suitability of each model.

The four strategy exemplified in the paper are:
  • PA model: The graph is obtained by growing from a single vertex. New vertices are added by connect to existent ones, with a preferable probability to those of large degree/lots of neighbors.
  • PRG Model: First generate the ingree/outgree of each node and match them.
  • SW model: The nodes are placed on a grid, which has already edges connecting the neighbors. By adding more edges according to the Manhattan distance of every pair of vertices, the final model is obtained.
  • ER model: The existence of each potential edge has the probability p.

It can be seen that, MLE of ER model can be obtained easily. For others, it is somehow difficult and MCMC is adopted to overcome it. However, I decide to examine the technical details later.

A possible problem of this paper is that it doesn't take into account the complexity of the generating strategies. It is natural that the more complex a model is, it better fits the real world data. However a better fitting won't necessary yield a good explanation, since a more complicated procedure might have better fitting than one of the four models. So the conclution of this paper is dubious.

No comments: