Keyword extraction and headline generation using novel word features
Abstract: We introduce several novel word features for keyword extraction and headline generation. These new word features are derived from the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. From the Wikipedia search result article set, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications of individual words' importance in the input document. They serve as useful complements to the traditional word features derivable from the explicit information of a document. In addition, we introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation by experiments and have obtained very encouraging results. Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
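The abstract does not spell out how the inlink, outlink, category and infobox information is turned into word features. The Python sketch below shows one hypothetical way to do it, assuming each retrieved article has already been parsed into those four fields: for every word of the input document, it computes the fraction of retrieved articles whose inlink titles, outlink titles, categories or infobox entries mention that word. The names WikiArticle, tokenize, field_tokens and background_word_features, the tokenizer, and the fraction-of-articles scoring are illustrative assumptions, not the authors' definitions; query generation, the Wikipedia search itself and the word-document fitness feature are omitted.

```python
"""Hypothetical sketch: Wikipedia-derived background word features.

Not the authors' method; the feature definition (fraction of retrieved
articles whose field mentions the word) is an illustrative assumption.
"""
import re
from dataclasses import dataclass, field

# The four kinds of article information named in the abstract.
FIELDS = ("inlinks", "outlinks", "categories", "infobox")


@dataclass
class WikiArticle:
    """Hypothetical container for one retrieved Wikipedia article."""
    title: str
    inlinks: list = field(default_factory=list)     # titles of pages linking here
    outlinks: list = field(default_factory=list)    # titles this article links to
    categories: list = field(default_factory=list)  # category names
    infobox: dict = field(default_factory=dict)     # infobox attribute -> value


def tokenize(text):
    """Lower-case alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def field_tokens(article, name):
    """All tokens appearing in one field of an article."""
    value = getattr(article, name)
    # For the infobox, look at both attribute names and their values.
    strings = list(value.keys()) + list(value.values()) if isinstance(value, dict) else value
    tokens = set()
    for s in strings:
        tokens |= tokenize(s)
    return tokens


def background_word_features(doc_words, articles):
    """Map each document word to {field: fraction of articles mentioning it}."""
    n = len(articles)
    # Tokenize each (article, field) pair once rather than once per word.
    token_sets = [{name: field_tokens(a, name) for name in FIELDS} for a in articles]
    features = {}
    for word in doc_words:
        w = word.lower()
        features[word] = {
            name: (sum(w in ts[name] for ts in token_sets) / n if n else 0.0)
            for name in FIELDS
        }
    return features


if __name__ == "__main__":
    # Toy, hand-made article records (not real Wikipedia data).
    articles = [
        WikiArticle("Keyword extraction",
                    inlinks=["Information retrieval"],
                    outlinks=["Text mining"],
                    categories=["Natural language processing"]),
        WikiArticle("Wikipedia",
                    inlinks=["Online encyclopedia"],
                    outlinks=["Wikimedia Foundation"],
                    categories=["Wikipedia"],
                    infobox={"type": "Online encyclopedia"}),
    ]
    for word, feats in background_word_features(["Wikipedia", "text"], articles).items():
        print(word, feats)
```

On these toy records, "Wikipedia" scores 0.5 for categories and 0.0 for the other three fields, while "text" scores 0.5 only for outlinks. A per-field score like this could sit alongside conventional features such as term frequency in a keyword ranker; how the paper actually combines its features is not stated here.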
Bibtextype: inproceedings
Has author: Xu S., Yang S., Lau F.C.M.
Has extra keyword: Background knowledge, Explicit information, Headline generation, Keyword extraction, Search results, Wikipedia, Artificial intelligence, Feature extraction
Isbn: 9781577354666
Language: English
Number of citations by publication: 0
Number of references by publication: 0
Pages: 1461–1466
Published in: Proceedings of the National Conference on Artificial Intelligence
Title: Keyword extraction and headline generation using novel word features
Type: conference paper
Volume: 3
Year: 2010
Creation date: 8 November 2014 01:40:11
Categories: Publications without keywords parameter, Publications without license parameter, Publications without DOI parameter, Publications without remote mirror parameter, Publications without archive mirror parameter, Publications without paywall mirror parameter, Conference papers, Publications without references parameter, Publications
Modification date: 8 November 2014 01:40:11
Date: 2010