Mining the web for points of interest

From WikiPapers
Jump to: navigation, search

Mining the web for points of interest is a 2012 conference paper written in English by Rae A., Murdock V., Popescu A., Bouchard H. and published in SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval.

[edit] Abstract

A point of interest (POI) is a focused geographic entity such as a landmark, a school, an historical building, or a business. Points of interest are the basis for most of the data supporting location-based applications. In this paper we propose to curate POIs from online sources by bootstrapping training data from Web snippets, seeded by POIs gathered from social media. This large corpus is used to train a sequential tagger to recognize mentions of POIs in text. Using Wikipedia data as the training data, we can identify POIs in free text with an accuracy that is 116% better than the state of the art POI identifier in terms of precision, and 50% better in terms of recall. We show that using Foursquare and Gowalla checkins as seeds to bootstrap training data from Web snippets, we can improve precision between 16% and 52%, and recall between 48% and 187% over the state-of-the-art. The name of a POI is not sufficient, as the POI must also be associated with a set of geographic coordinates. Our method increases the number of POIs that can be localized nearly three-fold, from 134 to 395 in a sample of 400, with a median localization accuracy of less than one kilometer.

[edit] References

This section requires expansion. Please, help!

Cited by

Probably, this publication is cited by others, but there are no articles available for them in WikiPapers. Cited 3 time(s)