Vasudeva Varma

From WikiPapers

Vasudeva Varma is an author.

Publications

Only those publications related to wikis are shown here.
Named entity recognition an aid to improve multilingual entity filling in language-independent approach
Keywords: Language Independent; Named Entity; Named Entity Recognition; NE; NER; Wikipedia
Published in: International Conference on Information and Knowledge Management, Proceedings (English, 2012)
Abstract: This paper details an approach to identifying Named Entities (NEs) in a large non-English corpus and associating them with appropriate tags, requiring minimal human intervention and no linguistic expertise. The focus is on Indian languages such as Telugu, Hindi, Tamil, and Marathi, which are resource-poor compared to English. The inherent structure of Wikipedia is exploited to develop an efficient co-occurrence-frequency-based NE identification algorithm for Indian languages. We describe how English Wikipedia data can be used to bootstrap the identification of NEs in other languages, yielding a list of NEs. The paper then uses this NE list to improve multilingual entity filling, with promising results. On a dataset of 2,622 Marathi Wikipedia articles, with around 10,000 manually tagged NEs, our system achieved an F-measure of 81.25% without drawing on any language expertise. Similarly, an F-measure of 80.42% was achieved on around 12,000 NEs tagged in 2,935 Hindi Wikipedia articles. Copyright 2012 ACM.

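The abstract does not spell out the co-occurrence algorithm, but the core idea can be sketched: treat frequent sentence-level co-occurrence with seed NEs (bootstrapped from English Wikipedia via interlanguage links) as evidence that a candidate term is itself an NE. The seed list, toy English corpus, capitalization filter, and 0.5 threshold below are all illustrative assumptions; the paper works on Indic-script text, where a capitalization cue would not even apply.

```python
from collections import Counter

# Hypothetical seed NEs, standing in for the list bootstrapped from
# English Wikipedia via interlanguage links; the corpus is a toy stand-in.
seed_nes = {"Mumbai", "Tendulkar"}
corpus = [
    "Tendulkar was born in Mumbai",
    "Mumbai hosted the final and Tendulkar scored a century",
    "The festival in Mumbai drew crowds to Marine Drive",
]

# Count, per token, total occurrences and occurrences in sentences that
# also contain a seed NE; a high ratio is weak evidence of NE-hood.
freq, cooc = Counter(), Counter()
for sentence in corpus:
    tokens = set(sentence.split())
    for tok in tokens:
        freq[tok] += 1
        if (tokens & seed_nes) - {tok}:
            cooc[tok] += 1

threshold = 0.5  # illustrative cut-off, not from the paper
candidates = {
    tok
    for tok in freq
    if tok not in seed_nes
    and tok[0].isupper()                      # crude filter; more cleanup needed
    and cooc[tok] / freq[tok] >= threshold
}
print(candidates)  # {'Marine', 'Drive', 'The'} -- 'The' shows why filtering matters
```
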
Using Wikipedia anchor text and weighted clustering coefficient to enhance the traditional multi-document summarization
Keywords: Multi-document summarization; Page rank; Sentence clusters; Weighted clustering coefficient; Wikipedia anchor text
Published in: Lecture Notes in Computer Science (English, 2012)
Abstract: As in the traditional approach, we treat summarization as the selection of top-ranked sentences from ranked sentence clusters. To this end, we rank the sentence clusters using word importance scores computed by running the PageRank algorithm on a reverse directed word graph of sentences. Next, to rank the sentences within each cluster, we introduce the use of the weighted clustering coefficient, computed from the PageRank scores of words. Finally, an important practical issue is the presence of many noisy entries in the text, which degrade the performance of most text mining algorithms. To address this, we introduce a Wikipedia-anchor-text-based phrase mapping scheme. Our experimental results on the DUC-2002 and DUC-2004 datasets show that our system outperforms unsupervised systems and is better than or comparable to recent supervised systems in this area.

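The two ranking steps in this abstract can be sketched with networkx: PageRank over a reverse directed word graph, then a weighted clustering coefficient over a per-cluster sentence graph whose edge weights come from the PageRank scores of shared words. The toy clusters and the overlap-based edge weighting are assumptions; the paper's exact weighting and its Wikipedia-anchor-text noise filtering are not reproduced.

```python
import networkx as nx

# Toy sentence clusters; the paper first clusters sentences drawn from
# the document set (that step is not reproduced here).
clusters = [
    ["the court approved the merger",
     "the merger was approved on appeal",
     "an appeal delayed the court ruling"],
    ["heavy rain flooded the city",
     "the city closed schools after the rain",
     "schools reopened once the rain stopped"],
]

# Step 1: PageRank on a reverse directed word graph -- for adjacent
# words (w1, w2) in a sentence, add the edge w2 -> w1.
wg = nx.DiGraph()
for cluster in clusters:
    for sent in cluster:
        words = sent.split()
        wg.add_edges_from(zip(words[1:], words))
word_rank = nx.pagerank(wg)

def overlap_weight(a, b):
    """Edge weight between sentences: summed PageRank of shared words."""
    return sum(word_rank[w] for w in set(a.split()) & set(b.split()))

# Step 2: within each cluster, rank sentences by the weighted clustering
# coefficient of a sentence graph built from those edge weights.
summary = []
for cluster in clusters:
    sg = nx.Graph()
    for i, a in enumerate(cluster):
        for j, b in enumerate(cluster):
            if i < j and overlap_weight(a, b) > 0:
                sg.add_edge(i, j, weight=overlap_weight(a, b))
    wcc = nx.clustering(sg, weight="weight")
    summary.append(cluster[max(wcc, key=wcc.get)])
print(summary)
```
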
Effectively mining Wikipedia for clustering multilingual documents
Keywords: Document representation; Multilingual document clustering; Wikipedia
Published in: NLDB (English, 2011)

Language independent identification of parallel sentences using Wikipedia
Keywords: Wikipedia; Language independent; Parallel sentences
Published in: World Wide Web (English, 2011)

Language-independent context aware query translation using Wikipedia
Published in: BUCC (English, 2011)

Multilingual document clustering using Wikipedia as external knowledge
Keywords: Document representation; Multilingual document clustering; Wikipedia
Published in: IRFC (English, 2011)

Ranking multilingual documents using minimal language dependent resources
Keywords: Feature Engineering; Levenshtein Edit Distance; Multilingual Document Ranking; Wikipedia
Published in: Lecture Notes in Computer Science (English, 2011)
Abstract: This paper proposes an approach for extracting simple and effective features that enhance multilingual document ranking (MLDR). There is limited prior research on capturing multilingual document similarity for ranking, and the available literature relies heavily on language-specific tools, making those methods hard to reimplement for other languages. Our approach extracts various multilingual and monolingual similarity features using only a basic language resource (a bilingual dictionary). No language-specific tools are used, making the approach extensible to other languages. We used the datasets provided by the Forum for Information Retrieval Evaluation (FIRE) for its 2010 ad hoc cross-lingual document retrieval task on Indian languages. Experiments were performed with different ranking algorithms and their results compared. The results showcase the effectiveness of the proposed features in enhancing multilingual document ranking.

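Since the abstract names only a bilingual dictionary and Levenshtein edit distance as resources, one plausible minimal feature is the fraction of query terms matched in a document either through the dictionary or within a small edit distance (to catch spelling and transliteration variants). The dictionary entries, toy documents, and the max_dist=2 cut-off below are assumptions, not the paper's published settings.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Toy bilingual dictionary -- the only language resource, as in the paper.
dictionary = {"paani": "water", "nadi": "river"}

def match_feature(query_terms, doc_terms, max_dist=2):
    """Fraction of query terms found in the document, either via the
    dictionary or within a small edit distance of some document term."""
    hits = 0
    for q in query_terms:
        t = dictionary.get(q, q)
        if any(d == t or levenshtein(q, d) <= max_dist for d in doc_terms):
            hits += 1
    return hits / len(query_terms)

docs = ["the river water level rose", "stock markets fell sharply"]
query = "paani nadi".split()
print(sorted(docs, key=lambda d: match_feature(query, d.split()), reverse=True))
```
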
Exploiting n-gram importance and Wikipedia based additional knowledge for improvements in GAAC based document clustering
Keywords: Community detection; Document clustering; Group-average agglomerative clustering; N-gram; Similarity measure; Wikipedia based additional knowledge
Published in: KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (English, 2010)
Abstract: This paper addresses the question: how can Wikipedia-based concepts be used in document clustering with less human involvement while still yielding effective improvements in results? In the devised system, we propose a method that exploits the importance of n-grams in a document and uses Wikipedia-based additional knowledge for GAAC-based document clustering. The importance of an n-gram in a document depends on many features, including, but not limited to, its frequency, its position within a sentence, and the position of that sentence within the document. First, we introduce a new similarity measure that takes weighted n-gram importance into account when computing document similarity for clustering; as a result, the chances of topical similarity driving the clusters are improved. Second, we use Wikipedia as an additional knowledge base, both to remove noisy entries from the extracted n-grams and to reduce the information gap between conceptually related n-grams that fail to match owing to differences in writing scheme or strategy. Our experimental results on a publicly available text dataset clearly show that the devised system significantly outperforms bag-of-words-based state-of-the-art systems in this area.

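A compact sketch of the two ideas in this abstract: an n-gram importance weighting combining frequency with position, plugged into a similarity measure, and a plain group-average agglomerative clustering (GAAC) loop. The weighting formula, toy documents, and stopping criterion are assumptions, and the Wikipedia-based noise removal and concept mapping steps are omitted.

```python
from collections import Counter

docs = [
    "machine learning improves search ranking",
    "search ranking with machine learning models",
    "the recipe uses fresh basil and tomato",
    "a sauce with fresh basil and tomato",
]

def ngram_weights(doc, n=2):
    """Weight each n-gram by frequency plus a small boost for early
    position -- one simple instance of the features the paper combines."""
    tokens = doc.split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    weights = Counter()
    for pos, g in enumerate(grams):
        weights[g] += 1.0 + 1.0 / (1 + pos)
    return weights

def sim(a, b):
    """Cosine-style similarity over weighted n-grams."""
    wa, wb = ngram_weights(a), ngram_weights(b)
    num = sum(wa[g] * wb[g] for g in set(wa) & set(wb))
    den = (sum(v * v for v in wa.values()) * sum(v * v for v in wb.values())) ** 0.5
    return num / den if den else 0.0

# GAAC: repeatedly merge the cluster pair with the highest average
# pairwise similarity; stopping at two clusters is arbitrary here.
clusters = [[d] for d in docs]
while len(clusters) > 2:
    i, j = max(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda p: sum(sim(a, b) for a in clusters[p[0]] for b in clusters[p[1]])
        / (len(clusters[p[0]]) * len(clusters[p[1]])),
    )
    clusters[i] += clusters.pop(j)
print(clusters)
```
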
Query processing for enterprise search with Wikipedia link structure
Keywords: Enterprise search; Information retrieval; Query expansion; Thesaurus; Wikipedia link graph
Published in: KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (English, 2010)
Abstract: We present a phrase-based query expansion (QE) technique for enterprise search using a domain-independent concept thesaurus constructed from the Wikipedia link structure. Our approach analyzes article and category link information to derive sets of related concepts for building the thesaurus. In addition, we build a vocabulary set containing natural word order and usage that semantically represents concepts. We extract query-representational concepts from the vocabulary set with a three-layered approach; the concept thesaurus then yields related concepts for expanding a query. Evaluation on TRECENT 2007 data shows a 9 percent increase in recall over fifty queries. We also observed that our implementation improves precision at top-k results by 0.7, 1, 6 and 9 percent for the top 10, 20, 50 and 100 search results respectively, demonstrating the promise a Wikipedia-based thesaurus holds for domain-specific search.

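A rough sketch of how a concept thesaurus mined from Wikipedia's link structure might drive query expansion. The toy link graph and the expansion policy are illustrative assumptions; the paper's three-layered concept extraction from the vocabulary set is not reproduced here.

```python
# Toy link graph standing in for Wikipedia article/category links;
# the paper mines its concept thesaurus from the full link structure.
links = {
    "information retrieval": {"search engine", "query expansion", "tf-idf"},
    "query expansion": {"thesaurus", "information retrieval"},
}

def related_concepts(concept):
    """Concepts linked from or to the article -- a crude stand-in for
    the paper's thesaurus built from article and category links."""
    out = set(links.get(concept, set()))
    out |= {c for c, targets in links.items() if concept in targets}
    return out

def expand(query, max_terms=3):
    """Append thesaurus concepts to the query (phrase extraction is
    skipped in this sketch)."""
    return [query] + sorted(related_concepts(query))[:max_terms]

print(expand("query expansion"))
# -> ['query expansion', 'information retrieval', 'thesaurus']
```
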
Building a semantic virtual museum: From wiki to semantic wiki using named entity recognition
Keywords: Information extraction; Ontology; Semantic wiki
Published in: Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA (English, 2009)
Abstract: In this paper, we describe an approach for creating semantic wiki pages from regular wiki pages in the domain of scientific museums, using information extraction methods in general and named entity recognition in particular. We use a domain-specific ontology, CIDOC-CRM, as the base structure for representing and processing knowledge. We describe the major components of the proposed approach and a three-step process involving named entity recognition, identifying domain classes using the ontology, and establishing properties for the entities in order to generate semantic wiki pages. Our initial evaluation of the prototype shows promising results in terms of enhanced efficiency and time and cost benefits.

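The three-step pipeline the abstract outlines (NER, class identification against CIDOC-CRM, property assignment) could look roughly like this. The NE-type-to-class mapping, the example entities, and the "has type" property name are all hypothetical; only the [[property::value]] annotation syntax is standard Semantic MediaWiki.

```python
# Hypothetical NE-type -> CIDOC-CRM class mapping and NER output; the
# paper's components are far richer than these toy dictionaries.
ne_to_crm = {
    "PERSON": "E21 Person",
    "DATE": "E52 Time-Span",
}
entities = [("Leonardo da Vinci", "PERSON"), ("1503", "DATE")]

def to_semantic_wiki(text, entities):
    """Rewrite plain wiki text into Semantic MediaWiki annotations,
    typing each recognised entity with its CIDOC-CRM class. The
    'has type' property name is invented for illustration."""
    for surface, ne_type in entities:
        crm = ne_to_crm.get(ne_type)
        if crm:
            text = text.replace(surface, f"[[has type::{crm}|{surface}]]")
    return text

print(to_semantic_wiki("Painted by Leonardo da Vinci around 1503.", entities))
```
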
Exploiting structure and content of Wikipedia for query expansion in the context of Question Answering
Published in: International Conference Recent Advances in Natural Language Processing, RANLP (English, 2009)
Abstract: Retrieving answer-containing passages is a challenging task in Question Answering. In this paper we describe a novel query expansion method that aims to rank answer-containing passages higher. It uses the content and structured information (link structure and category information) of Wikipedia to generate a set of terms semantically related to the question. Since the Boolean model allows fine-grained control over query expansion, these semantically related terms are added to the original query to form an expanded Boolean query. We conducted experiments on TREC 2006 QA data. The experimental results show significant improvements of about 24.6%, 11.1% and 12.4% in precision at 1, MRR at 20 and TDRR scores respectively using our query expansion method.

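One way to exercise the fine-grained control over expansion that the abstract attributes to the Boolean model: keep the original question terms required, and attach Wikipedia-derived related terms as optional alternatives. The related_terms table and the AND/OR policy below are illustrative assumptions, not the paper's exact formulation.

```python
# Toy relatedness table standing in for terms mined from Wikipedia
# content, link structure and categories.
related_terms = {
    "telescope": ["astronomy", "observatory", "lens"],
}

def boolean_expand(question_terms, max_expansions=2):
    """Expanded Boolean query: original terms stay required (AND),
    Wikipedia-derived terms become optional alternatives (OR)."""
    clauses = []
    for term in question_terms:
        extras = related_terms.get(term, [])[:max_expansions]
        clauses.append("(" + " OR ".join([term] + extras) + ")" if extras else term)
    return " AND ".join(clauses)

print(boolean_expand(["who", "invented", "telescope"]))
# -> who AND invented AND (telescope OR astronomy OR observatory)
```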