Jugal Kalita

From WikiPapers

Jugal Kalita is an author.

Publications

Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language Date Abstract R C
Extracting and displaying temporal and geospatial entities from articles on historical events Geospatial entity extraction
Information extraction
Natural Language Processing
Temporal extraction
Computer Journal English 2014 This paper discusses a system that extracts and displays temporal and geospatial entities in text. The first task involves identification of all events in a document followed by identification of important events using a classifier. The second task involves identifying named entities associated with the document. In particular, we extract geospatial named entities. We disambiguate the set of geospatial named entities and geocode them to determine the correct coordinates for each place name, often called grounding. We resolve ambiguity based on sentence and article context. Finally, we present a user with the key events and their associated people, places and organizations within a document in terms of a timeline and a map. For purposes of testing, we use Wikipedia articles about historical events, such as those describing wars, battles and invasions. We focus on extracting major events from the articles, although our ideas and tools can be easily used with articles from other sources such as news articles. We use several existing tools such as Evita, Google Maps, publicly available implementations of Support Vector Machines, Hidden Markov Model and Conditional Random Field, and the MIT SIMILE Timeline. 0 0
A comparison of approaches for geospatial entity extraction from Wikipedia Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010 English 2010 We target in this paper the challenge of extracting geospatial data from the article text of the English Wikipedia. We present the results of a Hidden Markov Model (HMM) based approach to identify location-related named entities in our corpus of Wikipedia articles, which are primarily about battles and wars due to their high geospatial content. The HMM NER process drives a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name (often referred to as grounding). We compare our results to a previously developed data structure and algorithm for disambiguating place names that can have multiple coordinates. We demonstrate an overall f-measure of 79.63% identifying and geocoding place names. Finally, we compare the results of the HMM-driven process to earlier work using a Support Vector Machine. 0 0
Extracting Geospatial Entities from Wikipedia Geospatial extraction
Wikipedia extraction
Location extraction
NER
Geospatial entity recognition
ICSC English 2009 0 0
Extracting geospatial entities from Wikipedia ICSC 2009 - 2009 IEEE International Conference on Semantic Computing English 2009 This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recognition. We target for testing a corpus of Wikipedia articles about battles and wars, as these have a high incidence of geospatial content. The SVM recognizes place names in the corpus with a very high recall, close to 100%, with an acceptable precision. The set of geospatial NEs is then fed into a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name. As many place names are ambiguous, and do not immediately geocode to a single location, we present a data structure and algorithm to resolve ambiguity based on sentence and article context, so the correct coordinates can be selected. We achieve an f-measure of 82%, and create a set of geospatial entities for each article, combining the place names, spatial locations, and an assumed point geometry. These entities can enable geospatial search on and geovisualization of Wikipedia. 0 0
Mining wikipedia article clusters for geospatial entities and relationships AAAI Spring Symposium - Technical Report English 2009 We present in this paper a method to extract geospatial entities and relationships from the unstructured text of the English language Wikipedia. Using a novel approach that applies SVMs trained from purely structural features of text strings, we extract candidate geospatial entities and relationships. Using a combination of further techniques, along with an external gazetteer, the candidate entities and relationships are disambiguated and the Wikipedia article pages are modified to include the semantic information provided by the extraction process. We successfully extracted location entities with an F-measure of 81%, and location relations with an F-measure of 54%. 0 0