Monday, June 6, 2011

Near-Optimal Hashing Algorithms For Approximate Nearest Neighbor in High Dimensions


This is a survey paper on locality-sensitive hashing (LSH). LSH is quite useful for approximately retrieving similar items according to a given distance. The fact is, once we have enough data, those approximate nearest neighbors would perform as well as extact ones. So why bother computing the exact version?

The idea is based on stochastic algorithms in common literatures. The idea is to select a random projection and combine the binning to the final hash table. To understand intuitively how this works, given a point, the random projection into one bin will cut down the probability the retrieved points are not around the query by some extent. And when more projections are made, we get closer. E.g. for Hamming distance, the projection can be further simplified as a subset of randomly chosen coordinate indices.

There are some software implementing this LSH idea. I would like to try them in the following days.

No comments: