Mining Domain-Specific Thesauri from Wikipedia: A case study
|Author(s)||David Milne, Olena Medelyan, Ian H. Witten|
|Published in||ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI'06)|
|Keyword(s)||datamining information-retrieval semantic text-mining wikipedia|
Mining Domain-Specific Thesauri from Wikipedia: A case study is a 2006 conference paper by David Milne, Olena Medelyan, and Ian H. Witten, published in the ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI'06).
Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia, a vast, open encyclopedia. In a comparison with a professional thesaurus for agriculture (Agrovoc) we find that Wikipedia contains a substantial proportion of its domain-specific concepts and semantic relations; furthermore it has impressive coverage of a collection of contemporary documents in the domain. Thesauri derived using these techniques are attractive because they capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts.
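The abstract's comparison, namely how much of a professional thesaurus is covered by Wikipedia, can be illustrated with a small sketch. This is not the authors' implementation: the function names `normalize` and `term_coverage`, the matching rule (case-insensitive comparison against article titles and redirect titles), and the toy data are all assumptions made for illustration only.

```python
def normalize(term: str) -> str:
    """Case-fold and collapse underscores/whitespace so surface forms can be compared."""
    return " ".join(term.replace("_", " ").casefold().split())


def term_coverage(thesaurus_terms, wiki_titles, wiki_redirects):
    """Return (fraction of thesaurus terms found in Wikipedia, matched terms).

    A term counts as covered if it matches an article title or a redirect
    title after normalization. This matching rule is an assumption.
    """
    wiki_vocab = {normalize(t) for t in wiki_titles}
    wiki_vocab |= {normalize(r) for r in wiki_redirects}  # dict keys = redirect titles
    thesaurus = {normalize(t) for t in thesaurus_terms}
    matched = thesaurus & wiki_vocab
    fraction = len(matched) / len(thesaurus) if thesaurus else 0.0
    return fraction, matched


# Toy, made-up data; not taken from Agrovoc or from the paper.
thesaurus_terms = ["Soil erosion", "Crop rotation", "Agroforestry", "Zero tillage"]
wiki_titles = ["Soil_erosion", "Crop rotation", "No-till farming"]
wiki_redirects = {"Zero tillage": "No-till farming", "Soil loss": "Soil erosion"}

fraction, matched = term_coverage(thesaurus_terms, wiki_titles, wiki_redirects)
print(f"covered {fraction:.0%} of thesaurus terms: {sorted(matched)}")
```

In this toy setup the redirect "Zero tillage" lets a thesaurus term match even though the article itself is titled differently, which is why redirects are included in the vocabulary alongside titles.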