Last modified on November 9, 2014, at 10:59

Thesaurus

Thesaurus is included as keyword or extra keyword in 0 datasets, 0 tools and 12 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
Monitoring propagations in the blogosphere for viral marketing Meihui Chen
Rubens N.
Anma F.
Okamoto T.
Journal of Emerging Technologies in Web Intelligence English 2012 Even though blog contents vary a lot in quality, the disclosure of personal opinions and the huge blogging population always attracts marketing's attention on blog information. In this paper, we investigate how marketers can identify the information propagation in degree among blog communities. In this way, topic similarity, relatedness, and word repetition between leader and followers' writing products are considered as the propagated information. The contribution of this paper is twofold. The work presented here is to show how blog content can be economically and feasibly analyzed by existing internet sources such as Wikipedia database and the usage of page return from a Japanese search engine. To this extent, this system, which combined in-link algorithms and text mining analyzes, tracing propagation channels and propagateable information allows analyzing the power of influences in viral marketing. We demonstrated the effectiveness of the system by applying blogger identification, topic identification, and the topic propagations. 0 0
Segmentation of review texts by using thesaurus and corpus-based word similarity Yu Suzuki
Fukumoto F.
KEOD 2012 - Proceedings of the International Conference on Knowledge Engineering and Ontology Development English 2012 Recently, we can refer to user reviews in the shopping or hotel reservation sites. However, with the exponential growth of information of the Internet, it is becoming increasingly difficult for a user to read and understand all the materials from a large-scale reviews that is potentially of interest. In this paper, we propose a method for review texts segmentation by guest's criteria, such as service, location and facilities. Our system firstly extracts words which represent criteria from hotel review texts. We focused on topic markers such as "ha" in Japanese to extract guest's criteria. The extracted words are classified into classes with similar words. The classification is proceeded by using Japanese WordNet. Then, for each hotel, each text with all of the guest reviews is segmented into word sequence by using criteria classes. Review text segmentation is difficult because of short text. We thus used Japanese WordNet, extracted similar word pairs, and indexes of Wikipedia. We performed text segmentation of hotel review. The results showed the effectiveness of our method and indicated that it can be used for review summarization by guest's criteria. 0 0
Study of ontology or thesaurus based document clustering and information retrieval Bharathi G.
Venkatesan D.
Journal of Theoretical and Applied Information Technology
Journal of Engineering and Applied Sciences
English 2012 Document clustering generates clusters from the whole document collection automatically and is used in many fields, including data mining and information retrieval. Clustering text data faces a number of new challenges. Among others, the volume of text data, dimensionality, sparsity and complex semantics are the most important ones. These characteristics of text data require clustering techniques to be scalable to large and high dimensional data, and able to handle sparsity and semantics. In the traditional vector space model, the unique words occurring in the document set are used as the features. But because of the synonym problem and the polysemous problem, such a bag of original words cannot represent the content of a document precisely. Most of the existing text clustering methods use clustering techniques which depend only on term strength and document frequency where single terms are used as features for representing the documents and they are treated independently which can be easily applied to non-ontological clustering. To overcome the above issues, this paper makes a survey of recent research done on ontology or thesaurus based document clustering.
Document clustering generates clusters from the whole document collection automatically and is used in many fields including data mining and information retrieval. Clustering text data faces a number of new challenges. Among others, the volume of text data, dimensionality, sparsity and complex semantics are the most important ones. These characteristics of text data require clustering techniques to be scalable to large and high dimensional data and able to handle sparsity and semantics. In the traditional vector space model, the unique words occurring in the document set are used as the features. But because of the synonym problem and the polysemous problem such a bag of original words cannot represent the content of a document precisely. Most of the existing text clustering methods use clustering techniques which depend only on term strength and document frequency where single terms are used as features for representing the documents and they are treated independently which can be easily applied to non-ontological clustering. To overcome these issues, this study makes a survey of recent research done on ontology or thesaurus based document clustering.
0 0
Searching the Web for Peculiar Images based on hand-made concept hierarchies Hattori S. Proceedings of the 2011 7th International Conference on Next Generation Web Services Practices, NWeSP 2011 English 2011 Most researches on Image Retrieval (IR) have aimed at clearing away noisy images and allowing users to search only acceptable images for a target object specified by its object-name. We have become able to get enough acceptable images of a target object just by submitting its object-name to a conventional keyword-based Web image search engine. However, because the search results rarely include its uncommon images, we can often get only its common images and cannot easily get exhaustive knowledge about its appearance (look and feel). As next steps of IR, it is very important to discriminate between "Typical Images" and "Peculiar Images" in the acceptable images, and moreover, to collect many different kinds of peculiar images exhaustively. This paper proposes a method to search the Web for peculiar images by expanding or modifying a target object-name (as an original query) with its hyponyms based on hand-made concept hierarchies such as WordNet and Wikipedia. 0 0
Using thesaurus to improve multiclass text classification Maghsoodi N.
Homayounpour M.M.
Lecture Notes in Computer Science English 2011 With the growing amount of textual information available on the Internet, the importance of automatic text classification has been increasing in the last decade. In this paper, a system was presented for the classification of multi-class Farsi documents which uses Support Vector Machine (SVM) classifier. The new idea proposed in the present paper, is based on extending the feature vector by adding some words extracted from a thesaurus. The goal is to assist classifier when training dataset is not comprehensive for some categories. For corpus preparation, Farsi Wikipedia website and articles of some archived newspapers and magazines are used. As the results indicate, classification efficiency improves by applying this approach. 0.89 micro F-measure were achieved for classification of 10 categories of Farsi texts. 0 0
PoolParty: SKOS thesaurus management utilizing linked data Schandl T.
Blumauer A.
Lecture Notes in Computer Science English 2010 Building and maintaining thesauri are complex and laborious tasks. PoolParty is a Thesaurus Management Tool (TMT) for the Semantic Web, which aims to support the creation and maintenance of thesauri by utilizing Linked Open Data (LOD), text-analysis and easy-to-use GUIs, so thesauri can be managed and utilized by domain experts without needing knowledge about the semantic web. Some aspects of thesaurus management, like the editing of labels, can be done via a wiki-style interface, allowing for lowest possible access barriers to contribution. PoolParty can analyse documents in order to glean new concepts for a thesaurus. Additionally a thesaurus can be enriched by retrieving relevant information from Linked Data sources and thesauri can be imported and updated via LOD URIs from external systems and also can be published as new linked data sources on the semantic web. 0 0
Query processing for enterprise search with Wikipedia link structure Sharma N.
Vasudeva Varma
KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval English 2010 We present a phrase based query expansion (QE) technique for enterprise search using a domain independent concept thesaurus constructed from Wikipedia link structure. Our approach analyzes article and category link information for deriving sets of related concepts for building up the thesaurus. In addition, we build a vocabulary set containing natural word order and usage which semantically represent concepts. We extract query-representational concepts from vocabulary set with a three layered approach. Concept Thesaurus then yields related concepts for expanding a query. Evaluation on TRECENT 2007 data shows an impressive 9 percent increase in recall for fifty queries. In addition, we observed that our implementation improves precision at top k results by 0.7, 1, 6 and 9 percent for top 10, top 20, top 50 and top 100 search results respectively, thus demonstrating the promise that Wikipedia based thesaurus holds in domain specific search. 0 0
Relation extraction between related concepts by combining Wikipedia and web information for Japanese language Masumi Shirakawa
Kotaro Nakayama
Eiji Aramaki
Takahiro Hara
Shojiro Nishio
Lecture Notes in Computer Science English 2010 Construction of a huge scale ontology covering many named entities, domain-specific terms and relations among these concepts is one of the essential technologies in the next generation Web based on semantics. Recently, a number of studies have proposed automated ontology construction methods using the wide coverage of concepts in Wikipedia. However, since they tried to extract formal relations such as is-a and a-part-of relations, generated ontologies have only a narrow coverage of the relations among concepts. In this work, we aim at automated ontology construction with a wide coverage of both concepts and these relations by combining information on the Web with Wikipedia. We propose a relation extraction method which receives pairs of co-related concepts from an association thesaurus extracted from Wikipedia and extracts their relations from the Web. 0 0
WikiPics: Multilingual image search based on wiki-mining Daniel Kinzler WikiSym 2010 English 2010 This demonstration introduces WikiPics, a language-independent image search engine for Wikimedia Commons. Based on the multilingual thesaurus provided by WikiWord, WikiPics allows users to search and navigate Wikimedia Commons in their preferred language, even though images on Commons are annotated in English nearly exclusively. 0 0
WikiPics: multilingual image search based on Wiki-mining Daniel Kinzler WikiSym English 2010 0 0
Using Wikipedia knowledge to improve text classification Pu Wang
Jian Hu
Hua-Jun Zeng
Zheng Chen
Knowl. Inf. Syst. English 2009 Text classification has been widely used to assist users with the discovery of useful information from the Internet. However, traditional classification methods are based on the "Bag of Words" (BOW) representation, which only accounts for term frequency in the documents, and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich text representation by means of manual intervention or automatic document expansion. The achieved improvement is unfortunately very limited, due to the poor coverage capability of the dictionary, and to the ineffectiveness of term expansion. In this paper, we automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework to expand the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches for text classification. Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm. 0 0
Gazetiki: Automatic creation of a geographical gazetteer Adrian Popescu
Gregory Grefenstette
Moellic P.-A.
Proceedings of the ACM International Conference on Digital Libraries English 2008 Geolocalized databases are becoming necessary in a wide variety of application domains. Thus far, the creation of such databases has been a costly, manual process. This drawback has stimulated interest in automating their construction, for example, by mining geographical information from the Web. Here we present and evaluate a new automated technique for creating and enriching a geographical gazetteer, called Gazetiki. Our technique merges disparate information from Wikipedia, Panoramio, and web search engines in order to identify geographical names, categorize these names, find their geographical coordinates and rank them. The information produced in Gazetiki enhances and complements the Geonames database, using a similar domain model. We show that our method provides a richer structure and an improved coverage compared to another known attempt at automatically building a geographic database and, where possible, we compare our Gazetiki to Geonames. Copyright 2008 ACM. 0 0