Question answering

From WikiPapers

Question answering is included as a keyword or extra keyword in 0 datasets, 0 tools, and 61 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
Open domain question answering using Wikipedia-based knowledge model Ryu P.-M.
Jang M.-G.
Kim H.-K.
Information Processing and Management English 2014 This paper describes the use of Wikipedia as a rich knowledge source for a question answering (QA) system. We suggest multiple answer matching modules based on different types of semi-structured knowledge sources of Wikipedia, including article content, infoboxes, article structure, category structure, and definitions. These semi-structured knowledge sources each have their unique strengths in finding answers for specific question types, such as infoboxes for factoid questions, category structure for list questions, and definitions for descriptive questions. The answers extracted from multiple modules are merged using an answer merging strategy that reflects the specialized nature of the answer matching modules. Through an experiment, our system showed promising results, with a precision of 87.1%, a recall of 52.7%, and an F-measure of 65.6%, all of which are much higher than the results of a simple text analysis based system. © 2014 Elsevier Ltd. All rights reserved. 0 0
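The F-measure reported above is the harmonic mean of precision and recall; a quick arithmetic check of the quoted figures (values taken from the abstract) in Python:

```python
# Harmonic mean of precision and recall, using the figures quoted in the abstract above.
precision = 0.871
recall = 0.527

f_measure = 2 * precision * recall / (precision + recall)
print(f"F-measure: {f_measure:.3f}")  # ~0.657, consistent with the reported 65.6% once rounding of P and R is accounted for
```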
Semantic question answering using Wikipedia categories clustering Stratogiannis G.
Georgios Siolas
Andreas Stafylopatis
International Journal on Artificial Intelligence Tools English 2014 We describe a system that performs semantic Question Answering based on the combination of classic Information Retrieval methods with semantic ones. First, we use a search engine to gather web pages and then apply a noun phrase extractor to extract all the candidate answer entities from them. Candidate entities are ranked using a linear combination of two IR measures to pick the most relevant ones. For each one of the top ranked candidate entities we find the corresponding Wikipedia page. We then propose a novel way to exploit Semantic Information contained in the structure of Wikipedia. A vector is built for every entity from Wikipedia category names by splitting and lemmatizing the words that form them. These vectors maintain Semantic Information in the sense that we are given the ability to measure semantic closeness between the entities. Based on this, we apply an intelligent clustering method to the candidate entities and show that candidate entities in the biggest cluster are the most semantically related to the ideal answers to the query. Results on the topics of the TREC 2009 Related Entity Finding task dataset show promising performance. 0 0
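The abstract above describes representing each candidate entity as a vector built from the words of its Wikipedia category names and measuring semantic closeness between such vectors. A minimal sketch of that idea (the category lists are hypothetical, and a plain whitespace tokenizer stands in for the splitting and lemmatization step; the paper's actual weighting and clustering are not reproduced):

```python
from collections import Counter
from math import sqrt

def category_vector(category_names):
    """Bag-of-words vector built from the words of an entity's Wikipedia category names."""
    words = []
    for name in category_names:
        words.extend(name.lower().replace("_", " ").split())
    return Counter(words)

def cosine(v1, v2):
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = sqrt(sum(c * c for c in v1.values())) * sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

# Hypothetical category names for two candidate entities.
camry = category_vector(["Toyota vehicles", "Mid-size cars"])
civic = category_vector(["Honda automobiles", "Compact cars"])
print(cosine(camry, civic))  # > 0 via the shared word "cars", even though no literal category name is shared
```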
A virtual player for "who Wants to Be a Millionaire?" based on Question Answering Molino P.
Pierpaolo Basile
Santoro C.
Pasquale Lops
De Gemmis M.
Giovanni Semeraro
Lecture Notes in Computer Science English 2013 This work presents a virtual player for the quiz game "Who Wants to Be a Millionaire?". The virtual player demands linguistic and common sense knowledge and adopts state-of-the-art Natural Language Processing and Question Answering technologies to answer the questions. Wikipedia articles and DBpedia triples are used as knowledge sources and the answers are ranked according to several lexical, syntactic and semantic criteria. Preliminary experiments carried out on the Italian version of the board game prove that the virtual player is able to challenge human players. 0 0
Filling the gaps among DBpedia multilingual chapters for question answering Cojan J.
Cabrio E.
Fabien Gandon
Proceedings of the 3rd Annual ACM Web Science Conference, WebSci 2013 English 2013 To publish information extracted from multilingual pages of Wikipedia in a structured way, the Semantic Web community has started an effort of internationalization of DBpedia. Multilingual chapters of DBpedia can in fact contain different information with respect to the English version, in particular they provide more specificity on certain topics, or fill information gaps. DBpedia multilingual chapters are well connected through instance interlinking, extracted from Wikipedia. An alignment between properties is also carried out by DBpedia contributors as a mapping from the terms used in Wikipedia to a common ontology, enabling the exploitation of information coming from the multilingual chapters of DBpedia. However, the mapping process is currently incomplete, it is time consuming since it is manually performed, and may lead to the introduction of redundant terms in the ontology, as it becomes difficult to navigate through the existing vocabulary. In this paper we propose an approach to automatically extend the existing alignments, and we integrate it in a question answering system over linked data. We report on experiments carried out applying the QAKiS (Question Answering wiKiframework-based) system on the English and French DBpedia chapters, and we show that the use of such approach broadens its coverage. Copyright 2013 ACM. 0 0
Related entity finding using semantic clustering based on wikipedia categories Stratogiannis G.
Georgios Siolas
Andreas Stafylopatis
Lecture Notes in Computer Science English 2013 We present a system that performs Related Entity Finding, that is, Question Answering that exploits Semantic Information from the WWW and returns URIs as answers. Our system uses a search engine to gather all candidate answer entities and then a linear combination of Information Retrieval measures to choose the most relevant. For each one we look up its Wikipedia page and construct a novel vector representation based on the tokenization of the Wikipedia category names. This novel representation gives our system the ability to compute a measure of semantic relatedness between entities, even if the entities do not share any common category. We use this property to perform a semantic clustering of the candidate entities and show that the biggest cluster contains entities that are closely related semantically and can be considered as answers to the query. Performance measured on 20 topics from the 2009 TREC Related Entity Finding task shows competitive results. 0 0
A graph-based summarization system at QA@INEX track 2011 Laureano-Cruces A.L.
Ramirez-Rodriguez J.
Lecture Notes in Computer Science English 2012 In this paper we use REG, a graph-based system, to study a fundamental problem of Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph, to obtain weighted sentences. We applied this approach to the INEX@QA 2011 task (question-answering). We have extracted the title and some key or related words according to two people from the queries, in order to recover 50 documents from the English Wikipedia. Using this strategy, REG obtained good results with the automatic evaluation system FRESA. 0 0
A hybrid QA system with focused IR and automatic summarization for INEX 2011 Bhaskar P.
Somnath Banerjee
Neogi S.
Bandyopadhyay S.
Lecture Notes in Computer Science English 2012 The article presents the experiments carried out as part of the participation in the QA track of INEX 2011. We have submitted two runs. The INEX QA task has two main subtasks, Focused IR and Automatic Summarization. In the Focused IR system, we first preprocess the Wikipedia documents and then index them using Nutch. Stop words are removed from each query tweet and all the remaining tweet words are stemmed using Porter stemmer. The stemmed tweet words form the query for retrieving the most relevant document using the index. The automatic summarization system takes as input the query tweet along with the tweet's text and the title from the most relevant text document. Most relevant sentences are retrieved from the associated document based on the TF-IDF of the matching query tweet, tweet's text and title words. Each retrieved sentence is assigned a ranking score in the Automatic Summarization system. The answer passage includes the top ranked retrieved sentences with a limit of 500 words. The two unique runs differ in the way in which the relevant sentences are retrieved from the associated document. Our first run obtained the highest score of 432.2 in the Relaxed metric of the Readability evaluation among all the participants. 0 0
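A rough sketch of the TF-IDF-based sentence ranking step described above, using scikit-learn (the query is treated as a single bag of words; the system's separate weighting of tweet text and title words, and its 500-word limit, are omitted):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(query, sentences, top_k=3):
    """Score each candidate sentence against the query by TF-IDF cosine similarity."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(sentences + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return sorted(zip(scores, sentences), reverse=True)[:top_k]

# Toy example: a query tweet and candidate sentences from a retrieved Wikipedia document.
query = "When was Wikipedia launched and by whom?"
sentences = [
    "Wikipedia was launched in 2001 by Jimmy Wales and Larry Sanger.",
    "The encyclopedia is edited collaboratively by volunteers.",
    "Its servers are operated by the Wikimedia Foundation.",
]
for score, sentence in rank_sentences(query, sentences):
    print(f"{score:.2f}  {sentence}")
```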
Building a large scale knowledge base from Chinese Wiki Encyclopedia Zhe Wang
Jing-Woei Li
Pan J.Z.
Lecture Notes in Computer Science English 2012 DBpedia has proved to be a successful structured knowledge base, and large-scale Semantic Web data has been built by using DBpedia as the central interlinking hub of the Web of Data in English. In Chinese, however, due to the heavy imbalance in size (no more than one tenth) between the English and Chinese Wikipedia, little Chinese linked data has been published and linked to DBpedia, which hinders structured knowledge sharing both within Chinese resources and across languages. This paper aims at building a large-scale Chinese structured knowledge base from Hudong, one of the largest Chinese wiki encyclopedia websites. First, an upper-level ontology schema in Chinese is learned from the category system and infobox information in Hudong. In total, 19,542 concepts are inferred and organized in a hierarchy of at most 20 levels, and 2,381 properties with domain and range information are learned from the attributes in the Hudong infoboxes. Then, 802,593 instances are extracted and described using the concepts and properties in the learned ontology. These instances cover a wide range of things, including persons, organizations, places and so on; 62,679 of them are linked to identical instances in DBpedia. Moreover, the paper provides an RDF dump and SPARQL access to the established Chinese knowledge base. The general upper-level ontology and wide coverage make the knowledge base a valuable Chinese semantic resource. It can not only be used in building Chinese linked data, the fundamental work for building a multilingual knowledge base across heterogeneous resources in different languages, but can also facilitate many applications of a large-scale knowledge base, such as knowledge question answering and semantic search. 0 0
Design and Evaluation of an IR-Benchmark for SPARQL Queries with Fulltext Conditions Mishra A.
Gurajada S.
Martin Theobald
International Conference on Information and Knowledge Management, Proceedings English 2012 In this paper, we describe our goals in introducing a new, annotated benchmark collection, with which we aim to bridge the gap between the fundamentally different aspects that are involved in querying both structured and unstructured data. This semantically rich collection, captured in a unified XML format, combines components (unstructured text, semistructured infoboxes, and category structure) from 3.1 million Wikipedia articles with highly structured RDF properties from both DBpedia and YAGO2. The new collection serves as the basis of the INEX 2012 Ad-hoc, Faceted Search, and Jeopardy retrieval tasks. With a focus on the new Jeopardy task, we particularly motivate the usage of the collection for question-answering (QA) style retrieval settings, which we also exemplify by introducing a set of 90 QA-style benchmark queries that ship in a SPARQL-based query format extended with fulltext filter conditions. 0 0
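The benchmark's extended query format itself is not shown in the abstract; as a loose illustration of what a QA-style SPARQL query with a fulltext condition can look like, here is plain SPARQL 1.1 string filtering issued against the public DBpedia endpoint via SPARQLWrapper (endpoint, classes and filter syntax are illustrative and not the INEX 2012 format):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Plain SPARQL 1.1 with a string filter, standing in for the benchmark's extended fulltext conditions.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?person ?abstract WHERE {
      ?person a dbo:Scientist ;
              dbo:abstract ?abstract .
      FILTER (lang(?abstract) = "en" && CONTAINS(LCASE(?abstract), "question answering"))
    }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"])
```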
Entity based translation language model Amit Singh WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion English 2012 Bridging the lexical gap between the user's question and the question-answer pairs in Q&A archives has been a major challenge for Q&A retrieval. State-of-the-art approaches address this issue by implicitly expanding the queries with additional words using statistical translation models. In this work we extend the lexical word-based translation model to incorporate semantic concepts. We explore strategies to learn the translation probabilities between words and the concepts using the Q&A archives and Wikipedia. Experiments conducted on large-scale real data from Yahoo! Answers show that the proposed techniques are promising and need further investigation. Copyright is held by the author/owner(s). 0 0
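The word-based translation language model that this work extends is usually written as follows (a sketch of the standard formulation from the Q&A-retrieval literature, not necessarily the exact variant used in the paper): a query q is scored against a candidate question-answer document D by

```latex
P(q \mid D) \;=\; \prod_{w \in q} \Big[ (1-\lambda)\, P_{\mathrm{ml}}(w \mid C) \;+\; \lambda \sum_{t \in D} P(w \mid t)\, P_{\mathrm{ml}}(t \mid D) \Big]
```

where P(w|t) are learned word-to-word translation probabilities, P_ml denotes maximum-likelihood estimates, C is the whole collection, and λ is a smoothing parameter. The entity-based extension described above additionally learns translation probabilities between words and semantic concepts derived from Wikipedia.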
Exploiting the Wikipedia structure in local and global classification of taxonomic relations Do Q.X.
Dan Roth
Natural Language Engineering English 2012 Determining whether two terms have an ancestor relation (e.g. Toyota Camry and car) or a sibling relation (e.g. Toyota and Honda) is an essential component of textual inference in Natural Language Processing applications such as Question Answering, Summarization, and Textual Entailment. Significant work has been done on developing knowledge sources that could support these tasks, but these resources usually suffer from low coverage, noise, and are inflexible when dealing with ambiguous and general terms that may not appear in any stationary resource, making their use as general purpose background knowledge resources difficult. In this paper, rather than building a hierarchical structure of concepts and relations, we describe an algorithmic approach that, given two terms, determines the taxonomic relation between them using a machine learning-based approach that makes use of existing resources. Moreover, we develop a global constraint-based inference process that leverages an existing knowledge base to enforce relational constraints among terms and thus improves the classifier predictions. Our experimental evaluation shows that our approach significantly outperforms other systems built upon the existing well-known knowledge sources. 0 0
Exploring the existing category hierarchy to automatically label the newly-arising topics in cQA Guangyou Zhou
Li Cai
Kang Liu
Jun Zhao
ACM International Conference Proceeding Series English 2012 This work investigates selecting concise labels for the newly-arising topics in community question answering. Previous methods of generating labels do not take the information of the existing category hierarchy into consideration. The main motivation of our paper is to utilize this information in the label generation process. We propose a general framework to address this problem. Firstly, we map the questions into Wikipedia concept sets, which are more meaningful than terms. Secondly, important concepts are identified to represent the main focus of the newly-arising topics. Thirdly, candidate labels are extracted from the Wikipedia category graph. Finally, candidate labels are filtered and reranked by combining structural information from the existing category hierarchy and the Wikipedia category graph. The experiments show that in our test collections, about 80% of the "correct" labels appear in the top ten labels recommended by our system. 0 0
Mining Wikipedia's snippets graph: First step to build a new knowledge base Wira-Alam A.
Mathiak B.
CEUR Workshop Proceedings English 2012 In this paper, we discuss the aspects of mining links and text snippets from Wikipedia as a new knowledge base. Current knowledge bases, e.g. DBpedia [1], cover mainly the structured part of Wikipedia, but not the content as a whole. Acting as a complement, we focus on extracting information from the text of the articles. We extract a database of the hyperlinks between Wikipedia articles and populate them with the textual context surrounding each hyperlink. This would be useful for network analysis, e.g. to measure the influence of one topic on another, or for question-answering directly (for stating the relationship between two entities). First, we describe the technical parts related to extracting the data from Wikipedia. Second, we specify how to represent the extracted data as an extended triple through a Web service. Finally, we discuss the expected usage possibilities and the challenges. 0 0
No noun phrase left behind: Detecting and typing unlinkable entities Lin T.
Mausam
Etzioni O.
EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference English 2012 Entity linking systems link noun-phrase mentions in text to their corresponding Wikipedia articles. However, NLP applications would gain from the ability to detect and type all entities mentioned in text, including the long tail of entities not prominent enough to have their own Wikipedia articles. In this paper we show that once the Wikipedia entities mentioned in a corpus of textual assertions are linked, this can further enable the detection and fine-grained typing of the unlinkable entities. Our proposed method for detecting un-linkable entities achieves 24% greater accuracy than a Named Entity Recognition baseline, and our method for fine-grained typing is able to propagate over 1,000 types from linked Wikipedia entities to unlinkable entities. Detection and typing of unlinkable entities can increase yield for NLP applications such as typed question answering. 0 0
Overview of the INEX 2011 question answering track (QA@INEX) SanJuan E.
Moriceau V.
Tannier X.
Bellot P.
Mothe J.
Lecture Notes in Computer Science English 2012 The INEX QA track aimed to evaluate complex question-answering tasks where answers are short texts generated from Wikipedia by extraction of relevant short passages and aggregation into a coherent summary. In such a task, question answering, XML/passage retrieval and automatic summarization are combined in order to get closer to real information needs. Based on the groundwork carried out in the 2009-2010 editions to determine the sub-tasks and a novel evaluation methodology, the 2011 edition experimented with contextualizing tweets using a recent cleaned dump of Wikipedia. Participants had to contextualize 132 tweets from the New York Times (NYT). The informativeness of answers has been evaluated, as well as their readability. 13 teams from 6 countries actively participated in this track. This tweet contextualization task will continue in 2012 as part of the CLEF INEX lab with the same methodology and baseline but on a much wider range of tweet types. 0 0
Predicting website correctness from consensus analysis O'Hara S.
Bylander T.
Proceeding of the 2012 ACM Research in Applied Computation Symposium, RACS 2012 English 2012 Websites vary in terms of reliability. One could assume that NASA's website will be very accurate for Astronomy questions. Wikipedia is less accurate but is still more accurate than a generic Google search. In this research we ask a large number of "factoid" questions to several different search engines. We collect those responses and determine the correctness of each candidate answer. The answers are grouped by website source, and are compared to other websites to infer website correctness. Copyright 2012 ACM. 0 0
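A minimal sketch of the kind of consensus analysis described above: candidate answers are grouped by source website and each source's correctness is estimated from how often it agrees with the per-question majority answer (the aggregation rule here is an assumption; the paper's exact inference procedure may differ):

```python
from collections import Counter, defaultdict

# answers[question] maps each source website to its candidate answer (toy data).
answers = {
    "capital of Australia": {"siteA": "Canberra", "siteB": "Sydney", "siteC": "Canberra"},
    "chemical symbol for gold": {"siteA": "Au", "siteB": "Au", "siteC": "Ag"},
}

agreement = defaultdict(lambda: [0, 0])  # website -> [times agreed with consensus, times answered]
for question, by_site in answers.items():
    consensus, _ = Counter(by_site.values()).most_common(1)[0]
    for site, answer in by_site.items():
        agreement[site][1] += 1
        agreement[site][0] += int(answer == consensus)

for site, (agreed, total) in sorted(agreement.items()):
    print(f"{site}: estimated correctness {agreed / total:.2f}")
```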
QA@INEX track 2011: Question expansion and reformulation using the REG summarization system Vivaldi J.
Da Cunha I.
Lecture Notes in Computer Science English 2012 In this paper, our strategy and results for the INEX@QA 2011 question-answering task are presented. In this task, a set of 50 documents is provided by the search engine Indri, using some queries. The initial queries are titles associated with tweets. Reformulation of these queries is carried out using terminological and named entities information. To design the queries, the full process is divided into 2 steps: a) both titles and tweets are POS tagged, and b) queries are expanded or reformulated, using: terms and named entities included in the title, terms and named entities found in the tweet related to those ones, and Wikipedia redirected terms and named entities from those ones included in the title. In our work, the automatic summarization system REG is used to summarize the 50 documents obtained with these queries. The algorithm models a document as a graph to obtain weighted sentences. A single document is generated and it is considered the answer of the query. This strategy, combining summarization and question reformulation, obtains good results regarding informativeness and readability. 0 0
SIGA, a system to manage information retrieval evaluations Costa L.
Mota C.
Diana Santos
Lecture Notes in Computer Science English 2012 This paper provides an overview of the current version of SIGA, a system that supports the organization of information retrieval (IR) evaluations. SIGA was recently used in Págico, an evaluation contest where both automatic and human participants competed to find answers to 150 topics in the Portuguese Wikipedia, and we describe its new capabilities in this context as well as provide preliminary results from Págico. 0 0
A comparative assessment of answer quality on four question answering sites Fichman P. Journal of Information Science English 2011 Question answering (Q&A) sites, where communities of volunteers answer questions, may provide faster, cheaper, and better services than traditional institutions. However, like other Web 2.0 platforms, user-created content raises concerns about information quality. At the same time, Q&A sites may provide answers of different quality because they have different communities and technological platforms. This paper compares answer quality on four Q&A sites: Askville, WikiAnswers, Wikipedia Reference Desk, and Yahoo! Answers. Findings indicate that: (1) similar collaborative processes on these sites result in a wide range of outcomes, and significant differences in answer accuracy, completeness, and verifiability were evident; (2) answer multiplication does not always result in better information; it yields more complete and verifiable answers but does not result in higher accuracy levels; and (3) a Q&A site's popularity does not correlate with its answer quality on all three measures. 0 0
Knowledge Base Population: Successful approaches and challenges Ji H.
Grishman R.
ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies English 2011 In this paper we give an overview of the Knowledge Base Population (KBP) track at the 2010 Text Analysis Conference. The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. This is done through two tasks, Entity Linking - linking names in context to entities in the KB -and Slot Filling - adding information about an entity to the KB. A large source collection of newswire and web documents is provided from which systems are to discover information. Attributes ("slots") derived from Wikipedia infoboxes are used to create the reference KB. In this paper we provide an overview of the techniques which can serve as a basis for a good KBP system, lay out the remaining challenges by comparison with traditional Information Extraction (IE) and Question Answering (QA) tasks, and provide some suggestions to address these challenges. 0 0
Leveraging community-built knowledge for type coercion in question answering Kalyanpur A.
Murdock J.W.
Fan J.
Welty C.
Lecture Notes in Computer Science English 2011 Watson, the winner of the Jeopardy! challenge, is a state-of-the-art open-domain Question Answering system that tackles the fundamental issue of answer typing by using a novel type coercion (TyCor) framework, where candidate answers are initially produced without considering type information, and subsequent stages check whether the candidate can be coerced into the expected answer type. In this paper, we provide a high-level overview of the TyCor framework and discuss how it is integrated in Watson, focusing on and evaluating three TyCor components that leverage the community built semi-structured and structured knowledge resources - DBpedia (in conjunction with the YAGO ontology), Wikipedia Categories and Lists. These resources complement each other well in terms of precision and granularity of type information, and through links to Wikipedia, provide coverage for a large set of instances. 0 0
Leveraging wikipedia characteristics for search and candidate generation in question answering Chu-Carroll J.
Fan J.
Proceedings of the National Conference on Artificial Intelligence English 2011 Most existing Question Answering (QA) systems adopt a type-and-generate approach to candidate generation that relies on a pre-defined domain ontology. This paper describes a type independent search and candidate generation paradigm for QA that leverages Wikipedia characteristics. This approach is particularly useful for adapting QA systems to domains where reliable answer type identification and type-based answer extraction are not available. We present a three-pronged search approach motivated by relations an answer-justifying title-oriented document may have with the question/answer pair. We further show how Wikipedia metadata such as anchor texts and redirects can be utilized to effectively extract candidate answers from search results without a type ontology. Our experimental results show that our strategies obtained high binary recall in both search and candidate generation on TREC questions, a domain that has mature answer type extraction technology, as well as on Jeopardy! questions, a domain without such technology. Our high-recall search and candidate generation approach has also led to high over-all QA performance in Watson, our end-to-end system. Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
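A toy illustration of the metadata-driven candidate extraction mentioned above: Wikipedia redirects and anchor texts map surface forms found in search results to canonical candidate answers (the redirect and anchor tables below are hypothetical stand-ins for data mined from a Wikipedia dump, not the system's actual resources):

```python
# Hypothetical, hand-built stand-ins for redirect and anchor-text tables mined from a Wikipedia dump.
redirects = {"Big Apple": "New York City", "NYC": "New York City"}
anchor_texts = {"the city that never sleeps": "New York City"}

def candidates_from_mentions(mentions):
    """Normalize retrieved titles/mentions to canonical Wikipedia article titles."""
    surface_to_canonical = {**{k.lower(): v for k, v in redirects.items()},
                            **{k.lower(): v for k, v in anchor_texts.items()}}
    return {surface_to_canonical.get(m.lower(), m) for m in mentions}

print(candidates_from_mentions(["NYC", "Big Apple", "Chicago"]))
# {'New York City', 'Chicago'}
```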
LogAnswer in question answering forums Pelzer B.
Glockner I.
Dong T.
ICAART 2011 - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence English 2011 LogAnswer is a question answering (QA) system for the German language. By providing concise answers to the user's questions, LogAnswer provides more natural access to document collections than conventional search engines do. QA forums provide online venues where human users can ask each other questions and give answers. We describe an ongoing adaptation of LogAnswer to QA forums, aiming at creating a virtual forum user who can respond intelligently and efficiently to human questions. This serves not only as a more accurate evaluation method of our system, but also as a real-world use case for automated QA. The basic idea is that the QA system can relieve the human experts from answering routine questions, e.g. questions with a known answer in the forum, or questions that can be answered from Wikipedia. As a result, the users can focus on those questions that really demand human judgement or expertise. In order not to spam users, the QA system needs a good self-assessment of its answer quality. Existing QA techniques, however, are not sufficiently precision-oriented. The need to provide justified answers thus fosters research into logic-oriented QA and novel methods for answer validation. 0 0
Overview of the INEX 2010 question answering track (QA@INEX) SanJuan E.
Bellot P.
Moriceau V.
Tannier X.
Lecture Notes in Computer Science English 2011 The INEX Question Answering track (QA@INEX) aims to evaluate a complex question-answering task using Wikipedia. The set of questions is composed of factoid, precise questions that expect short answers, as well as more complex questions that can be answered by several sentences or by an aggregation of texts from different documents. Long answers have been evaluated based on the Kullback-Leibler (KL) divergence between n-gram distributions. This allowed summarization systems to participate. Most of them generated a readable extract of sentences from documents top-ranked by a state-of-the-art document retrieval engine. Participants also tested several methods of question disambiguation. Evaluation has been carried out on a pool of real questions from OverBlog and Yahoo! Answers. Results tend to show that the baseline-restricted focused IR system minimizes KL divergence but lacks readability, while summarization systems tend to use longer, stand-alone sentences, thus improving readability but increasing KL divergence. 0 0
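A minimal sketch of the evaluation idea mentioned above: the KL divergence between the unigram distribution of a reference text and that of a candidate summary (the smoothing scheme and the restriction to unigrams are assumptions; the track's actual FRESA-style measures are more elaborate):

```python
from collections import Counter
from math import log

def unigram_dist(text, vocab, alpha=0.01):
    """Smoothed unigram distribution of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    return sum(p[w] * log(p[w] / q[w]) for w in p)

reference = "wikipedia is a free online encyclopedia edited by volunteers"
summary = "wikipedia is an encyclopedia written by volunteer editors"
vocab = set(reference.split()) | set(summary.split())
p, q = unigram_dist(reference, vocab), unigram_dist(summary, vocab)
print(f"KL(reference || summary) = {kl_divergence(p, q):.3f}")
```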
Predicting the perceived quality of online mathematics contributions from users' reputations Tausczik Y.R.
Pennebaker J.W.
Conference on Human Factors in Computing Systems - Proceedings English 2011 There are two perspectives on the role of reputation in collaborative online projects such as Wikipedia or Yahoo! Answers. One, user reputation should be minimized in order to increase the number of contributions from a wide user base. Two, user reputation should be used as a heuristic to identify and promote high-quality contributions. The current study examined how offline and online reputations of contributors affect perceived quality in MathOverflow, an online community with 3470 active users. On MathOverflow, users post high-level mathematics questions and answers. Community members also rate the quality of the questions and answers. This study is unique in being able to measure the offline reputation of users. Both offline and online reputations were consistently and independently related to the perceived quality of authors' submissions, and there was only a moderate correlation between established offline and newly developed online reputation. Copyright 2011 ACM. 0 0
Temporal latent semantic analysis for collaboratively generated content: Preliminary results Yafang Wang
Agichtein E.
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2011 Latent semantic analysis (LSA) has been intensively studied because of its wide application to Information Retrieval and Natural Language Processing. Yet, traditional models such as LSA only examine one (current) version of the document. However, due to the recent proliferation of collaboratively generated content such as threads in online forums, Collaborative Question Answering archives, Wikipedia, and other versioned content, the document generation process is now directly observable. In this study, we explore how this additional temporal information about the document evolution could be used to enhance the identification of latent document topics. Specifically, we propose a novel hidden-topic modeling algorithm, temporal Latent Semantic Analysis (tLSA), which elegantly extends LSA to modeling document revision history using tensor decomposition. Our experiments show that tLSA outperforms LSA on word relatedness estimation using benchmark data, and explore applications of tLSA for other tasks. 0 0
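As background for the abstract above, classic LSA word relatedness (which tLSA extends from a single term-document matrix to revision histories via tensor decomposition) can be sketched with a truncated SVD; the toy corpus and the choice of two latent dimensions are illustrative only:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the current versions of collaboratively generated documents.
docs = [
    "wikipedia is a collaborative encyclopedia",
    "editors revise wikipedia articles collaboratively",
    "question answering systems retrieve answers from an encyclopedia",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)                        # documents x terms
svd = TruncatedSVD(n_components=2, random_state=0)
term_vectors = svd.fit(X).components_.T            # terms x latent dimensions

def relatedness(w1, w2):
    i, j = vec.vocabulary_[w1], vec.vocabulary_[w2]
    return cosine_similarity(term_vectors[[i]], term_vectors[[j]])[0, 0]

print(relatedness("wikipedia", "encyclopedia"))
```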
Extraction and approximation of numerical attributes from the Web Davidov D.
Rappoport A.
ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference English 2010 We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement. 0 0
Function-based question classification for general QA Bu F.
Zhu X.
Hao Y.
EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference English 2010 In contrast with the booming increase of Internet data, state-of-the-art QA (question answering) systems have so far been concerned with data from specific domains or resources such as search engine snippets, online forums and Wikipedia in a somewhat isolated way. Users may welcome a more general QA system for its capability to answer questions from various sources, integrating existing specialized sub-QA engines. In this framework, question classification is the primary task. However, current paradigms of question classification have focused on specific types of questions, i.e. factoid questions, which is inappropriate for general QA. In this paper, we propose a new question classification paradigm, which includes a question taxonomy suitable for general QA and a question classifier based on MLN (Markov logic network), where rule-based methods and statistical methods are unified into a single framework in a fuzzy discriminative learning approach. Experiments show that our method outperforms traditional question classification approaches. 0 0
How geographic was GikiCLEF? A GIR-critical review Diana Santos
Nuno Cardoso
Cabral L.M.
Proceedings of the 6th Workshop on Geographic Information Retrieval, GIR'10 English 2010 In this paper we draw a balance of GikiCLEF as far as its appropriateness for the evaluation of GIR systems is concerned. We measure its degree of dealing with geographic matter, and offer GIRA, the final resource, for GIR evaluation purposes. Copyright 2010 ACM. 0 0
Improving Question Answering Based on Query Expansion with Wikipedia Yajie Miao
Xin Su
Chunping Li
ICTAI English 2010 0 0
Learning Word-Class Lattices for definition and hypernym extraction Roberto Navigli
Velardi P.
ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference English 2010 Definition extraction is the task of automatically identifying definitional sentences within texts. The task has proven useful in many research areas including ontology learning, relation extraction and question answering. However, current approaches - mostly focused on lexicosyntactic patterns - suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. In this paper, we propose Word-Class Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. Lattices are learned from a dataset of definitions from Wikipedia. Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature. 0 0
Mining Wikipedia and Yahoo! Answers for question expansion in Opinion QA Yajie Miao
Chenliang Li
Lecture Notes in Computer Science English 2010 Opinion Question Answering (Opinion QA) is still a relatively new area in QA research. Existing methods focus on combining sentiment analysis with traditional Question Answering methods. Few attempts have been made to expand opinion questions with external background information. In this paper, we introduce the broad-mining and deep-mining strategies. Based on these two strategies, we propose four methods to exploit Wikipedia and Yahoo! Answers for enriching the representation of questions in Opinion QA. The experimental results show that the proposed expansion methods are effective at improving existing Opinion QA models. 0 0
Morpheus: A deep web question answering system Grant C.
George C.P.
Gumbs J.-D.
Wilson J.N.
Dobbins P.J.
IiWAS2010 - 12th International Conference on Information Integration and Web-Based Applications and Services English 2010 When users search the deep web, the essence of their search is often found in a previously answered query. The Morpheus question answering system reuses prior searches to answer similar user queries. Queries are represented in a semistructured format that contains query terms and referenced classes within a specific ontology. Morpheus answers questions by using methods from prior successful searches. The system ranks stored methods based on a similarity quasimetric defined on assigned classes of queries. Similarity depends on the class heterarchy in an ontology and its associated text corpora. Morpheus revisits the prior search pathways of the stored searches to construct possible answers. Realm-based ontologies are created using Wikipedia pages, associated categories, and the synset heterarchy of WordNet. This paper describes the entire process with emphasis on the matching of user queries to stored answering methods. Copyright 2010 ACM. 0 0
Overview of ResPubliQA 2009: Question answering evaluation over European legislation Penas A.
Forner P.
Sutcliffe R.
Rodrigo A.
Forascu C.
Iñaki Alegria
Giampiccolo D.
Moreau N.
Osenova P.
Lecture Notes in Computer Science English 2010 This paper describes the first round of ResPubliQA, a Question Answering (QA) evaluation task over European legislation, proposed at the Cross Language Evaluation Forum (CLEF) 2009. The exercise consists of extracting a relevant paragraph of text that completely satisfies the information need expressed by a natural language question. The general goals of this exercise are (i) to study if the current QA technologies tuned for newswire collections and Wikipedia can be adapted to a new domain (law in this case); (ii) to move to a more realistic scenario, considering people close to law as users, and paragraphs as system output; (iii) to compare current QA technologies with pure Information Retrieval (IR) approaches; and (iv) to introduce in QA systems the Answer Validation technologies developed in the past three years. The paper describes the task in more detail, presenting the different types of questions, the methodology for the creation of the test sets and the new evaluation measure, and analyzing the results obtained by systems and the more successful approaches. Eleven groups participated with 28 runs. In addition, we evaluated 16 baseline runs (2 per language) based only on a pure IR approach, for comparison purposes. Considering accuracy, scores were generally higher than in previous QA campaigns. 0 0
Semantic QA for encyclopaedic questions: EQUAL in GikiCLEF Iustin Dornescu Lecture Notes in Computer Science English 2010 This paper presents a new question answering (QA) approach and a prototype system, EQUAL, which relies on structural information from Wikipedia to answer open-list questions. The system achieved the highest score amongst the participants in the GikiCLEF 2009 task. Unlike the standard textual QA approach, EQUAL does not rely on identifying the answer within a text snippet by using keyword retrieval. Instead, it explores the Wikipedia page graph, extracting and aggregating information from multiple documents and enforcing semantic constraints. The challenges for such an approach and an error analysis are also discussed. 0 0
Surface language models for discovering temporally anchored definitions on the web: Producing chronologies as answers to definition questions Alejandro Figueroa WEBIST 2010 - Proceedings of the 6th International Conference on Web Information Systems and Technology English 2010 This work presents a data-driven definition question answering (QA) system that outputs a set of temporally anchored definitions as answers. This system builds surface language models on top of a corpus automatically acquired from Wikipedia abstracts, and ranks answer candidates in agreement with these models afterwards. Additionally, this study deals at greater length with the impact of several surface features in the ranking of temporally anchored answers. 0 0
Top-down and bottom-up: A combined approach to slot filling Zheng Chen
Tamang S.
Lee A.
Li X.
Passantino M.
Ji H.
Lecture Notes in Computer Science English 2010 The Slot Filling task requires a system to automatically distill information from a large document collection and return answers for a query entity with specified attributes ('slots'), and use them to expand the Wikipedia infoboxes. We describe two bottom-up Information Extraction style pipelines and a top-down Question Answering style pipeline to address this task. We propose several novel approaches to enhance these pipelines, including statistical answer re-ranking and Markov Logic Networks based cross-slot reasoning. We demonstrate that our system achieves state-of-the-art performance, with 3.1% higher precision and 2.6% higher recall compared with the best system in the KBP2009 evaluation. 0 0
Are Wikipedia resources useful for discovering answers to list questions within web snippets? Alejandro Figueroa Lecture Notes in Business Information Processing English 2009 This paper presents LiSnQA, a list question answering system that extracts answers to list queries from the short descriptions of websites returned by search engines, called web snippets. LiSnQA mines Wikipedia resources in order to obtain valuable information that assists in the extraction of these answers. The interesting facet of LiSnQA is that, in contrast to current systems, it does not account for lists in Wikipedia, but for its redirections, categories, sandboxes, and first definition sentences. Results show that these resources strengthen the answering process. 0 0
Document re-ranking via Wikipedia articles for definition/biography type questions Liu M.
Fang F.
Ji D.
PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation English 2009 In this paper, we propose a document re-ranking approach based on Wikipedia articles related to the specific questions, which re-orders the initially retrieved documents to improve the precision of the top retrieved documents in a Chinese information retrieval for question answering (IR4QA) system where the questions are of definition or biography type. On the one hand, we compute the similarity between each document in the initially retrieved results and the related Wikipedia article. On the other hand, we cluster the documents with the K-Means method and compute the similarity between each cluster centroid and the Wikipedia article. Then we integrate the two kinds of similarity with the initial ranking score into a final similarity value and re-rank the documents in descending order of this measure. Experimental results demonstrate that this approach can effectively improve the precision of the top relevant documents. 0 0
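A rough sketch of the combination described above, using scikit-learn: each document is scored by its TF-IDF cosine similarity to the related Wikipedia article and by the similarity of its K-Means cluster centroid to that article, and both scores are interpolated with the initial retrieval score (the TF-IDF representation, the interpolation weights and the toy data are assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rerank(docs, initial_scores, wiki_article, n_clusters=2, weights=(0.4, 0.3, 0.3)):
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs + [wiki_article])
    doc_vecs, wiki_vec = X[:-1], X[-1]

    doc_sim = cosine_similarity(doc_vecs, wiki_vec).ravel()       # document vs. Wikipedia article
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(doc_vecs)
    centroid_sim = cosine_similarity(km.cluster_centers_, wiki_vec.toarray()).ravel()
    cluster_sim = centroid_sim[km.labels_]                        # each document inherits its centroid's similarity

    final = (weights[0] * np.asarray(initial_scores)
             + weights[1] * doc_sim
             + weights[2] * cluster_sim)
    return sorted(zip(final, docs), reverse=True)

docs = [
    "Alan Turing was a British mathematician and computer scientist.",
    "The Turing Award is given annually by the ACM.",
    "Turing machines are abstract models of computation.",
]
wiki = "Alan Turing was an English mathematician, logician and a pioneer of computer science."
for score, doc in rerank(docs, [0.9, 0.6, 0.5], wiki):
    print(f"{score:.2f}  {doc}")
```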
Exploiting structure and content of Wikipedia for query expansion in the context of Question Answering Ganesh S.
Vasudeva Varma
International Conference Recent Advances in Natural Language Processing, RANLP English 2009 Retrieving answer-containing passages is a challenging task in Question Answering. In this paper we describe a novel query expansion method which aims to rank the answer-containing passages better. It uses content and structured information (link structure and category information) of Wikipedia to generate a set of terms semantically related to the question. As the Boolean model allows a fine-grained control over query expansion, these semantically related terms are added to the original query to form an expanded Boolean query. We conducted experiments on TREC 2006 QA data. The experimental results show significant improvements of about 24.6%, 11.1% and 12.4% in precision at 1, MRR at 20 and TDRR scores respectively using our query expansion method. 0 0
GikiP at geoCLEF 2008: Joining GIR and QA forces for querying wikipedia Diana Santos
Nuno Cardoso
Paula Carvalho
Iustin Dornescu
Sven Hartrumpf
Johannes Leveling
Yvonne Skalban
Lecture Notes in Computer Science English 2009 This paper reports on the GikiP pilot that took place in 2008 in GeoCLEF. This pilot task requires a combination of methods from geographical information retrieval and question answering to answer queries to the Wikipedia. We start by the task description, providing details on topic choice and evaluation measures. Then we offer a brief motivation from several perspectives, and we present results in detail. A comparison of participants' approaches is then presented, and the paper concludes with improvements for the next edition. 0 0
Minimally supervised question classification and answering based on WordNet and Wikipedia Jian Chang
Yen T.-H.
Tsai R.T.-H.
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing, ROCLING 2009 English 2009 In this paper, we introduce an automatic method for classifying a given question using broad semantic categories in an existing lexical database (i.e., WordNet) as the class tagset. For this, we also constructed a large-scale entity supersense database that maps over 1.5 million entities, taken from Wikipedia entry titles, to the 25 WordNet lexicographer's files (supersenses). To show the usefulness of our work, we implement a simple redundancy-based system that takes advantage of the large-scale semantic database to perform question classification and named entity classification for open domain question answering. Experimental results show that the proposed method outperforms the baseline of not using question classification. 0 0
Named entity network based on wikipedia Maskey S.
Dakka W.
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH English 2009 Named Entities (NEs) play an important role in many natural language and speech processing tasks. A resource that identifies relations between NEs could potentially be very useful. We present such an automatically generated knowledge resource from Wikipedia, the Named Entity Network (NE-NET), which provides a list of related Named Entities (NEs) and the degree of relation for any given NE. Unlike some manually built knowledge resources, NE-NET has a wide coverage, consisting of 1.5 million NEs represented as nodes of a graph with 6.5 million arcs relating them. NE-NET also provides the ranks of the related NEs using a simple ranking function that we propose. In this paper, we present NE-NET and our experiments showing how NE-NET can be used to improve the retrieval of spoken (Broadcast News) and text documents. 0 0
Overview of the clef 2008 multilingual question answering track Forner P.
Penas A.
Eneko Agirre
Iñaki Alegria
Forascu C.
Moreau N.
Osenova P.
Prokopidis P.
Rocha P.
Sacaleanu B.
Sutcliffe R.
Tjong Kim Sang E.
Lecture Notes in Computer Science English 2009 The QA campaign at CLEF 2008 [1] was mainly the same as that proposed last year. The results and the analyses reported by last year's participants suggested that the changes introduced in the previous campaign had led to a drop in systems' performance. So for this year's competition it was decided to practically replicate last year's exercise. Following last year's experience, some QA pairs were grouped in clusters. Every cluster was characterized by a topic (not given to participants). The questions from a cluster contained co-references between one of them and the others. Moreover, as last year, the systems were given the possibility to search for answers in Wikipedia as a document corpus besides the usual newswire collection. In addition to the main task, three additional exercises were offered, namely the Answer Validation Exercise (AVE), the Question Answering on Speech Transcriptions (QAST), which continued last year's successful pilots, together with the new Word Sense Disambiguation for Question Answering (QA-WSD). As a general remark, it must be said that the main task still proved to be very challenging for participating systems. As a shallow comparison with last year's results, the best overall accuracy dropped significantly from 42% to 19% in the multilingual subtasks, but increased a little in the monolingual subtasks, going from 54% to 63%. 0 0
Socializing or knowledge sharing? Characterizing social intent in community question answering Mendes Rodrigues E.
Milic-Frayling N.
International Conference on Information and Knowledge Management, Proceedings English 2009 Knowledge sharing communities, such as Wikipedia or Yahoo! Answers, add greatly to the wealth of information available on the Web. They represent complex social ecosystems that rely on user participation and the quality of users' contributions to prosper. However, quality is harder to achieve when knowledge sharing is facilitated through a high degree of personal interactions. The individuals' objectives may change from knowledge sharing to socializing, with a profound impact on the community and the value it delivers to the broader population of Web users. In this paper we provide new insights into the types of content that are shared through Community Question Answering (CQA) services. We demonstrate an approach that combines in-depth content analysis with social network analysis techniques. We adapted the Undirected Inductive Coding method to analyze samples of user questions and arrive at a comprehensive typology of the user intent. In our analysis we focused on two types of intent, social vs. non-social, and defined measures of social engagement to characterize the users' participation and content contributions. Our approach is applicable to a broad class of online communities and can be used to monitor the dynamics of community ecosystems. Copyright 2009 ACM. 0 0
Using answer retrieval patterns to answer Portuguese questions Costa L.F. Lecture Notes in Computer Science English 2009 Esfinge is a general domain Portuguese question answering system which has been participating in QA@CLEF since 2004. It uses the information available in the "official" document collections used in QA@CLEF (newspaper text and Wikipedia) and information from the Web as an additional resource when searching for answers. As regards the use of external tools, Esfinge uses a syntactic analyzer, a morphological analyzer and a named entity recognizer. This year an alternative approach to retrieve answers was tested: whereas in previous years, search patterns were used to retrieve relevant documents, this year a new type of search patterns was also used to extract the answers themselves. We also evaluated the second and third best answers returned by Esfinge. This evaluation showed that when Esfinge answers a question correctly, it usually does so with its first answer. Furthermore, the experiments revealed that the answer retrieval patterns created for this participation improve the results, but only for definition questions. 0 0
World wide web based question answering system - A relevance feedback framework for automatic answer validation Ray S.K.
Sandesh Singh
Joshi B.P.
2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009 English 2009 An open domain question answering system is one of the emerging information retrieval systems available on the World Wide Web, and it is becoming increasingly popular for obtaining succinct and relevant answers in response to users' questions. The validation of the correctness of the answer is an important issue in the field of question answering. In this paper, we propose a World Wide Web based solution for answer validation where answers returned by open domain Question Answering Systems can be validated using online resources such as Wikipedia and Google. We have applied several heuristics to the answer validation task and tested them against some popular World Wide Web based open domain Question Answering Systems over a collection of 500 questions collected from standard sources such as TREC, the Worldbook, and the Worldfactbook. We found that the proposed method yields promising results for the automatic answer validation task. 0 0
A lexical approach for Spanish question answering Tellez A.
Juarez A.
Hernandez G.
Denicia C.
Villatoro E.
Montes M.
Villasenor L.
Lecture Notes in Computer Science English 2008 This paper discusses our system's results in the Spanish Question Answering task of CLEF 2007. Our system is centered on a fully data-driven approach that combines information retrieval and machine learning techniques. It mainly relies on the use of lexical information and avoids any complex language processing procedure. Evaluation results indicate that this approach is very effective for answering definition questions from Wikipedia. In contrast, they also reveal that it is very difficult to answer factoid questions from this resource based solely on the use of lexical overlaps and redundancy. 0 0
Combining wikipedia and newswire texts for question answering in spanish De Pablo-Sanchez C.
Martinez-Fernandez J.L.
Gonzalez-Ledesma A.
Samy D.
Martinez P.
Moreno-Sandoval A.
Al-Jumaily H.
Lecture Notes in Computer Science English 2008 This paper describes the adaptations of the MIRACLE group QA system in order to participate in the Spanish monolingual question answering task at QA@CLEF 2007. A system, initially developed for the EFE collection, was reused for Wikipedia. Answers from both collections were combined using temporal information extracted from questions and collections. Reusing the EFE subsystem proved not to be feasible, and questions with answers only in Wikipedia obtained low accuracy. In addition, a co-reference module based on heuristics was introduced for processing topic-related questions. This module achieves good coverage in different situations but it is hindered by the moderate accuracy of the base system and the chaining of incorrect answers. 0 0
GikiP: Evaluating geographical answers from wikipedia Diana Santos
Nuno Cardoso
International Conference on Information and Knowledge Management, Proceedings English 2008 This paper describes GikiP, a pilot task that took place in 2008 in CLEF. We present the motivation behind GikiP and the use of Wikipedia as the evaluation collection, detail the task and we list new ideas for its continuation. 0 0
GikiP: evaluating geographical answers from wikipedia Diana Santos
Nuno Cardoso
GIR English 2008 0 0
Grammar-based automatic extraction of definitions Adrian Iftene
Pistol I.
Trandabat D.
Proceedings of the 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2008 English 2008 The paper describes the development and usage of a grammar developed to extract definitions from documents. One of the most important practical usages of the developed grammar is the automatic extraction of definitions from web documents. Three evaluation scenarios were run, the results of these experiments being the main focus of the paper. One scenario uses an e-learning context and previously annotated e-learning documents; the second one involves a large collection of unannotated documents (from Wikipedia) and tries to find answers for definition-type questions. The third scenario performs a similar question-answering task, but this time on the entire web using Google web search and the Google Translation Service. The results are convincing; further development, as well as further integration of the definition extraction system into various related applications, is already under way. 0 0
Mining Wikipedia Resources for Discovering Answers to List Questions in Web Snippets Alejandro Figueroa SKG English 2008 0 0
Mining Wikipedia for Discovering Multilingual Definitions on the Web Alejandro Figueroa SKG English 2008 0 0
Named entity normalization in user generated content Jijkoun V.
Khalid M.A.
Marx M.
Maarten de Rijke
Proceedings of SIGIR 2008 Workshop on Analytics for Noisy Unstructured Text Data, AND'08 English 2008 Named entity recognition is important for semantically oriented retrieval tasks, such as question answering, entity retrieval, biomedical retrieval, trend detection, and event and entity tracking. In many of these tasks it is important to be able to accurately normalize the recognized entities, i.e., to map surface forms to unambiguous references to real world entities. Within the context of structured databases, this task (known as record linkage and data de-duplication) has been a topic of active research for more than five decades. For edited content, such as news articles, the named entity normalization (NEN) task is one that has recently attracted considerable attention. We consider the task in the challenging context of user generated content (UGC), where it forms a key ingredient of tracking and media-analysis systems. A baseline NEN system from the literature (that normalizes surface forms to Wikipedia pages) performs considerably worse on UGC than on edited news: accuracy drops from 80% to 65% for a Dutch language data set and from 94% to 77% for English. We identify several sources of errors: entity recognition errors, multiple ways of referring to the same entity and ambiguous references. To address these issues we propose five improvements to the baseline NEN algorithm, to arrive at a language independent NEN system that achieves overall accuracy scores of 90% on the English data set and 89% on the Dutch data set. We show that each of the improvements contributes to the overall score of our improved NEN algorithm, and conclude with an error analysis on both Dutch and English language UGC. The NEN system is computationally efficient and runs with very modest computational requirements. Copyright 2008 ACM. 0 0
Overview of the CLEF 2007 multilingual question answering track Giampiccolo D.
Forner P.
Herrera J.
Penas A.
Ayache C.
Forascu C.
Jijkoun V.
Osenova P.
Rocha P.
Sacaleanu B.
Sutcliffe R.
Lecture Notes in Computer Science English 2008 The fifth QA campaign at CLEF [1], having its first edition in 2003, offered not only a main task but also an Answer Validation Exercise (AVE) [2], which continued last year's pilot, and a new pilot: the Question Answering on Speech Transcripts (QAST) [3, 15]. The main task was characterized by the focus on cross-linguality, while covering as many European languages as possible. As a novelty, some QA pairs were grouped in clusters. Every cluster was characterized by a topic (not given to participants). The questions from a cluster possibly contain co-references between one of them and the others. Finally, the need for searching answers in web formats was satisfied by introducing Wikipedia as a document corpus. The results and the analyses reported by the participants suggest that the introduction of Wikipedia and the topic-related questions led to a drop in systems' performance. 0 0
Priberam's question answering system in QA@CLEF 2007 Amaral C.
Cassan A.
Figueira H.
Martins A.
Mendes A.
Mendes P.
Pinto C.
Vidal D.
Lecture Notes in Computer Science English 2008 This paper accounts for Priberam's participation in the monolingual question answering (QA) track of CLEF 2007. In previous participations, Priberam's QA system obtained encouraging results both in monolingual and cross-language tasks. This year we endowed the system with syntactical processing, in order to capture the syntactic structure of the question. The main goal was to obtain a more tuned question categorisation and consequently a more precise answer extraction. Besides this, we provided our system with the ability to handle topic-related questions and to use encyclopaedic sources like Wikipedia. The paper provides a description of the improvements made in the system, followed by the discussion of the results obtained in Portuguese and Spanish monolingual runs. 0 0
Question answering with joost at CLEF 2007 Gosse Bouma
Kloosterman G.
Mur J.
Van Noord G.
Van Der Plas L.
Tiedemann J.
Lecture Notes in Computer Science English 2008 We describe our system for the monolingual Dutch and multilingual English to Dutch QA tasks. We describe the preprocessing of Wikipedia, inclusion of query expansion in IR, anaphora resolution in follow-up questions, and a question classification module for the multilingual task. Our best runs achieved 25.5% accuracy for the Dutch monolingual task, and 13.5% accuracy for the multilingual task. 0 0
The university of amsterdam's question answering system at QA@CLEF 2007 Jijkoun V.
Hofmann K.
Ahn D.
Khalid M.A.
Van Rantwijk J.
Maarten de Rijke
Tjong Kim Sang E.
Lecture Notes in Computer Science English 2008 We describe a new version of our question answering system, which was applied to the questions of the 2007 CLEF Question Answering Dutch monolingual task. This year, we made three major modifications to the system: (1) we added the contents of Wikipedia to the document collection and the answer tables; (2) we completely rewrote the module interface code in Java; and (3) we included a new table stream which returned answer candidates based on information which was learned from question-answer pairs. Unfortunately, the changes did not lead to improved performance. Unsolved technical problems at the time of the deadline have led to missing justifications for a large number of answers in our submission. Our single run obtained an accuracy of only 8% with an additional 12% of unsupported answers (compared to 21% in the last year's task). 0 0
What Happened to Esfinge in 2007? Cabral L.M.
Costa L.F.
Diana Santos
Lecture Notes in Computer Science English 2008 Esfinge is a general domain Portuguese question answering system which uses the information available on the Web as an additional resource when searching for answers. Other external resources and tools used are a broad coverage parser, a morphological analyser, a named entity recognizer and a Web-based database of word co-occurrences. In this fourth participation in CLEF, in addition to the new challenges posed by the organization (topics and anaphors in questions and the use of Wikipedia to search and support answers), we experimented with a multiple question and multiple answer approach in QA. 0 0
Question answering with QED at TREC-2005 Ahn K.
Bos J.
Curran J.R.
Kor D.
Nissim M.
Webber B.
NIST Special Publication English 2005 This report describes the system developed by the University of Edinburgh and the University of Sydney for the TREC-2005 question answering evaluation exercise. The backbone of our question-answering platform is QED, a linguistically-principled QA system. We experimented with external sources of knowledge, such as Google and Wikipedia, to enhance the performance of QED, especially for reranking and off-line processing of the corpus. For factoid and list questions we performed significantly above the median accuracy score of all participating systems at TREC 2005. 0 0