An open-source toolkit for mining Wikipedia


An open-source toolkit for mining Wikipedia is a 2013 journal article written in English by D. Milne and I.H. Witten and published in Artificial Intelligence.

Abstract

The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques. © 2012 Elsevier B.V. All rights reserved.
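
The abstract's description of articles, categories and redirects as classes that can be searched and iterated over can be illustrated with a small, self-contained Java sketch. This is not the Wikipedia Miner API: every name here (WikiSketch, Page, Article, Category, Redirect, Index, resolve, demoIndex) is hypothetical, the in-memory map merely stands in for the toolkit's summarized databases, and the sample titles are toy data chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names come from Wikipedia Miner.
public class WikiSketch {

    /** Common base for anything that has a title. */
    static abstract class Page {
        final String title;
        Page(String title) { this.title = title; }
    }

    /** An article knows the categories it belongs to. */
    static final class Article extends Page {
        final List<Category> categories = new ArrayList<>();
        Article(String title) { super(title); }
    }

    /** A category knows its member articles. */
    static final class Category extends Page {
        final List<Article> members = new ArrayList<>();
        Category(String title) { super(title); }
    }

    /** A redirect is an alternative title pointing at one article. */
    static final class Redirect extends Page {
        final Article target;
        Redirect(String title, Article target) { super(title); this.target = target; }
    }

    /** In-memory stand-in (an assumption) for a summarized database. */
    static final class Index {
        private final Map<String, Page> byTitle = new HashMap<>();

        void add(Page p) { byTitle.put(p.title, p); }

        /** Records membership in both directions so either side can be browsed. */
        void link(Article a, Category c) {
            a.categories.add(c);
            c.members.add(a);
        }

        /** Finds an article by title, following a redirect; null if there is none. */
        Article resolve(String title) {
            Page p = byTitle.get(title);
            if (p instanceof Redirect) return ((Redirect) p).target;
            return (p instanceof Article) ? (Article) p : null;
        }
    }

    /** Builds a tiny toy index: one article, one category, one redirect. */
    static Index demoIndex() {
        Index idx = new Index();
        Article nz = new Article("New Zealand");
        Category countries = new Category("Countries");
        idx.add(nz);
        idx.add(countries);
        idx.add(new Redirect("NZ", nz));
        idx.link(nz, countries);
        return idx;
    }

    public static void main(String[] args) {
        Index idx = demoIndex();
        Article a = idx.resolve("NZ");           // search via a redirect title
        System.out.println(a.title);
        for (Category c : a.categories) {        // browse from article to categories
            for (Article m : c.members) {        // iterate over category members
                System.out.println(c.title + " contains " + m.title);
            }
        }
    }
}
```

Modelling a redirect as its own class rather than as a second key for the article keeps the alternative title visible as data, which is one plausible reading of the abstract's statement that redirects are represented as classes.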

Cited by

This publication has 1 citation. Only publications available in WikiPapers are shown here:

Cited 16 time(s)