Geographical classification of documents using evidence from Wikipedia is a 2010 conference paper written in English by Odon De Alencar R., Davis Jr. C.A., and Goncalves M.A., published in the Proceedings of the 6th Workshop on Geographic Information Retrieval (GIR'10).

Obtaining or approximating a geographic location for search results often motivates users to include place names and other geography-related terms in their queries. Previous work shows that queries that include geography-related terms correspond to a significant share of the users' demand. Therefore, it is important to recognize the association of documents to places in order to adequately respond to such queries. This paper describes strategies for text classification into geography-related categories, using evidence extracted from Wikipedia. We use terms that correspond to entry titles and the connections between entries in Wikipedia's graph to establish a semantic network from which classification features are generated. Results of experiments using a news data-set, classified over Brazilian states, show that such terms constitute valid evidence for the geographical classification of documents, and demonstrate the potential of this technique for text classification.
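
The general idea in the abstract, using entry titles found in a text as evidence for a place, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the title-to-state table, the function names `state_features` and `classify`, and the simple counting rule are all assumptions made for illustration, and the paper's actual features and classifier may differ substantially.

```python
from collections import Counter

# Hypothetical toy data: entry titles mapped to the Brazilian state entries
# they are assumed to link to. A real system would derive such links from
# Wikipedia's graph; these few pairs are hand-picked for illustration only.
TITLE_TO_STATES = {
    "belo horizonte": ["Minas Gerais"],
    "ouro preto": ["Minas Gerais"],
    "copacabana": ["Rio de Janeiro"],
    "salvador": ["Bahia"],
}


def state_features(text):
    """Count, per state, how often a known entry title occurs in the text."""
    lowered = text.lower()
    scores = Counter()
    for title, states in TITLE_TO_STATES.items():
        hits = lowered.count(title)  # naive substring match, an assumption
        for state in states:
            scores[state] += hits
    return scores


def classify(text):
    """Return the highest-scoring state, or None when no title matches."""
    scores = state_features(text)
    if not scores or max(scores.values()) == 0:
        return None
    return scores.most_common(1)[0][0]
```

Under these assumptions, `classify("Protest in Belo Horizonte and Ouro Preto")` would pick Minas Gerais because two linked titles occur in the text, while a text with no known titles yields `None`. Plain substring counting is the simplest possible choice here; it ignores ambiguity between places that share a name.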

