Densification: Semantic document analysis using Wikipedia
Abstract This paper proposes a new method for semantic document analysis: densification, which identifies and ranks Wikipedia pages relevant to a given document. Although there are similarities with established tasks such as wikification and entity linking, the method does not aim for strict disambiguation of named entity mentions. Instead, densification uses existing links to rank additional articles that are relevant to the document, a form of explicit semantic indexing that enables higher-level semantic retrieval procedures that can be beneficial for a wide range of NLP applications. Because a gold standard for densification evaluation does not exist, a study is carried out to investigate the level of agreement achievable by humans, which questions the feasibility of creating an annotated data set. As a result, a semi-supervised approach is employed to develop a two-stage densification system: filtering unlikely candidate links and then ranking the remaining links. In a first evaluation experiment, Wikipedia articles are used to automatically estimate the performance in terms of recall. Results show that the proposed densification approach outperforms several wikification systems. A second experiment measures the impact of integrating the links predicted by the densification system into a semantic question answering (QA) system that relies on Wikipedia links to answer complex questions. Densification enables the QA system to find twice as many additional answers as when using a state-of-the-art wikification system.
Bibtextype article  +
Doi 10.1017/S1351324913000296  +
Has author Iustin Dornescu + , Orasan C. +
Has extra keyword Experiments + , Natural language processing systems + , Semantics + , World Wide Web + , Complex questions + , Document analysis + , Evaluation experiments + , Explicit semantics + , Question answering systems + , Semantic retrieval + , Semi-supervised + , Wikipedia articles + , Densification +
Issn 13513249  +
Issue 4  +
Language English +
Number of citations by publication 0  +
Number of references by publication 0  +
Pages 469–500  +
Published in Natural Language Engineering +
Title Densification: Semantic document analysis using Wikipedia +
Type journal article  +
Volume 20  +
Year 2014 +
Creation date 6 November 2014 13:08:38  +
Categories Publications without keywords parameter  + , Publications without license parameter  + , Publications without remote mirror parameter  + , Publications without archive mirror parameter  + , Publications without paywall mirror parameter  + , Journal articles  + , Publications without references parameter  + , Publications  +
Modification date 6 November 2014 13:08:38  +
Date 2014  +