Ian H. Witten
|Co-authors||Anna Huang, Catherine Legg, David M. Nichols, David N. Milne, Eibe Frank, Olena Medelyan|
|Authorship||Publications (11), datasets (0), tools (0)|
|Citations||Total (9), average (0.82), median (0), max (4), min (0)|
Ian H. Witten is an author.
Publications
Only those publications related to wikis are shown here.
|Title||Keyword(s)||Published in||Language||Date||Abstract||R||C|
|A link-based visual search engine for Wikipedia||Exploratory search
|Exploring Wikipedia with HMpara||Exploratory search
|Semantic document processing using Wikipedia as a knowledge base||INEX||English||2010||0||0|
|Wikipedia and how to use it for semantic document representation||RSKT||English||2010||0||0|
|Clustering Documents Using a Wikipedia-Based Concept Representation||English||2009||This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. We also developed a similarity measure that evaluates the semantic relatedness between concept sets for two documents. We test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that although further optimizations could be performed, our approach already improves upon related techniques.||0||0|
|Mining meaning from Wikipedia||Information extraction, Natural Language Processing||Int. J. Hum.-Comput. Stud.||English||2009||0||4|
|Clustering Documents with Active Learning Using Wikipedia||Data Mining, IEEE International Conference on||English||2008||Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated to a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding clustering towards the direction indicated by the constraints. We test our approach on three standard text document datasets. Empirical results show that our basic document representation strategy yields comparable performance to previous attempts; and adding constraints improves clustering performance further by up to 20%.||0||0|
|Learning to link with Wikipedia||English||2008||This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify significant terms within unstructured text, and enrich it with links to the appropriate Wikipedia articles. The resulting link detector and disambiguator performs very well, with recall and precision of almost 75%. This performance is constant whether the system is evaluated on Wikipedia articles or "real world" documents.
This work has implications far beyond enriching documents with explanatory links. It can provide structured knowledge about any unstructured fragment of text. Any task that is currently addressed with bags of words (indexing, clustering, retrieval, and summarization, to name a few) could use the techniques described here to draw on a vast network of concepts and semantics.
|Topic Indexing with Wikipedia||WikiAI||English||2008||0||2|
|A knowledge-based search engine powered by Wikipedia||English||2007||This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide the user as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offers significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it; making their entry more efficient, improving the relevance of the documents they return, and narrowing the gap between expert and novice seekers.||0||1|
|Mining Domain-Specific Thesauri from Wikipedia: A Case Study||English||2006||0||0|