Yirong Chen

From WikiPapers

Yirong Chen is an author.

Publications

Only those publications related to wikis are shown here.

Title: Building sentiment lexicons for all major languages
Published in: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference
Language: English
Date: 2014
Abstract: Sentiment analysis in a multilingual world remains a challenging problem, because developing language-specific sentiment lexicons is an extremely resource-intensive process. Such lexicons remain a scarce resource for most languages. In this paper, we address this lexicon gap by building high-quality sentiment lexicons for 136 major languages. We integrate a variety of linguistic resources to produce an immense knowledge graph. By appropriately propagating from seed words, we construct sentiment lexicons for each component language of our graph. Our lexicons have a polarity agreement of 95.7% with published lexicons, while achieving an overall coverage of 45.2%. We demonstrate the performance of our lexicons in an extrinsic analysis of 2,000 distinct historical figures' Wikipedia articles in 30 languages. Despite cultural differences and the intended neutrality of Wikipedia articles, our lexicons show an average sentiment correlation of 0.28 across all language pairs.
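
The abstract describes constructing lexicons by propagating polarity from seed words over a multilingual knowledge graph, without specifying the propagation scheme. The sketch below is a minimal, hypothetical illustration of that idea: seed labels spread along graph edges (e.g. synonym or translation links). The function name propagate_polarity, the toy edge list, and the first-label-wins rule are assumptions for illustration, not the paper's actual algorithm.

```python
from collections import deque

def propagate_polarity(edges, positive_seeds, negative_seeds):
    """Spread +1/-1 polarity labels from seed words over an undirected
    word graph (e.g. synonym/translation links). Purely illustrative;
    the paper's actual propagation scheme is not reproduced here."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    polarity = {w: +1 for w in positive_seeds}
    polarity.update({w: -1 for w in negative_seeds})

    queue = deque(polarity)
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, ()):
            if neighbor not in polarity:   # first label wins in this toy version
                polarity[neighbor] = polarity[word]
                queue.append(neighbor)
    return polarity

# Toy example: "good" and "bad" seeds spread along synonym edges.
edges = [("good", "great"), ("great", "excellent"), ("bad", "awful")]
print(propagate_polarity(edges, {"good"}, {"bad"}))
# {'good': 1, 'bad': -1, 'great': 1, 'awful': -1, 'excellent': 1}
```

A real pipeline would weight edges and resolve conflicting labels; this only shows where the seed-propagation step fits.
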
Title: Streaming big data with self-adjusting computation
Keywords: Incremental MapReduce, Self-adjusting computation
Published in: DDFP 2013 - Proceedings of the 2013 ACM SIGPLAN Workshop on Data Driven Functional Programming, Co-located with POPL 2013
Language: English
Date: 2013
Abstract: Many big data computations involve processing data that changes incrementally or dynamically over time. Using existing techniques, such computations quickly become impractical. For example, computing the frequency of words in the first ten thousand paragraphs of a publicly available Wikipedia data set in a streaming fashion using MapReduce can take as much as a full day. In this paper, we propose an approach based on self-adjusting computation that can dramatically improve the efficiency of such computations. As an example, we can perform the aforementioned streaming computation in just a couple of minutes.
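
To make the contrast with batch recomputation concrete, here is a minimal sketch of the incremental idea behind the word-frequency example: keep running counts and update them as each new paragraph arrives, rather than re-running a job over the whole prefix. The class IncrementalWordCount is a toy stand-in invented for illustration; it is not the self-adjusting-computation machinery the paper describes.

```python
from collections import Counter

class IncrementalWordCount:
    """Toy stand-in for the incremental idea: maintain running word counts
    and update them per new paragraph, instead of re-running a batch
    MapReduce job over the whole stream prefix each time."""
    def __init__(self):
        self.counts = Counter()

    def add_paragraph(self, text):
        self.counts.update(text.lower().split())
        return self.counts

stream = ["The quick brown fox", "the lazy dog", "the fox again"]
wc = IncrementalWordCount()
for paragraph in stream:
    wc.add_paragraph(paragraph)
print(wc.counts["the"])  # 3
```
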
Title: Efficient and scalable data evolution with column oriented databases
Keywords: Bitmap index, Column oriented database, Data evolution, Schema
Published in: ACM International Conference Proceeding Series
Language: English
Date: 2011
Abstract: Database evolution is the process of updating the schema of a database or data warehouse (schema evolution) and evolving the data to the updated schema (data evolution). It is often desired or necessary when the data or the query workload changes, when the initial schema was not carefully designed, or when enough knowledge of the database has been gained to devise a better schema. The Wikipedia database, for example, has had more than 170 versions in the past 5 years [8]. Unfortunately, although much research has been done on schema evolution, data evolution has long been a prohibitively expensive process, which essentially evolves the data by executing SQL queries and re-constructing indexes. This prevents databases from being changed flexibly and frequently as needed and forces schema designers, who cannot afford mistakes, to be highly cautious. Techniques that enable efficient data evolution will undoubtedly make life much easier. In this paper, we study the efficiency of data evolution and discuss techniques for data evolution on column oriented databases, which store each attribute, rather than each tuple, contiguously. We show that column oriented databases have a better potential than traditional row oriented databases for supporting data evolution, and we propose a novel data-level data evolution framework on column oriented databases. Our approach, as suggested by experimental evaluations on real and synthetic data, is much more efficient than query-level data evolution on both row and column oriented databases, which involves unnecessary access to irrelevant data, materializing intermediate results, and re-constructing indexes.
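
The abstract argues that column stores suit data evolution because each attribute is stored contiguously, so a change to one attribute need not touch the others. The sketch below is a rough, hypothetical illustration of that point, not the paper's framework: the dict-of-lists layout and the evolve_split_name example are made up.

```python
# Hypothetical column-store layout: each attribute lives in its own list,
# so evolving one attribute rewrites only that column, not every row.
table_columns = {
    "name": ["Alice Smith", "Bob Jones"],
    "year": [2009, 2011],
}

def evolve_split_name(columns):
    """Evolve the schema 'name' -> ('first_name', 'last_name') by rewriting
    a single column; the 'year' column is left untouched."""
    first, last = [], []
    for full in columns.pop("name"):
        f, l = full.split(" ", 1)
        first.append(f)
        last.append(l)
    columns["first_name"] = first
    columns["last_name"] = last
    return columns

print(evolve_split_name(table_columns))
# {'year': [2009, 2011], 'first_name': ['Alice', 'Bob'], 'last_name': ['Smith', 'Jones']}
```
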
Title: ITEM: Extract and integrate entities from tabular data to RDF knowledge base
Keywords: Entity Extraction, RDF Knowledge Base, Schema Mapping
Published in: Lecture Notes in Computer Science
Language: English
Date: 2011
Abstract: Many RDF knowledge bases are created and enlarged by mining and extracting web data. Hence their data sources are limited to social tagging networks, such as Wikipedia, WordNet, IMDB, etc., and their precision is not guaranteed. In this paper, we propose a new system, ITEM, for extracting entities from tabular data and integrating them into an RDF knowledge base. ITEM can efficiently compute the schema mapping between a table and a KB and inject novel entities into the KB. ITEM can therefore enlarge and improve an RDF KB by employing tabular data, which is assumed to be of high quality. ITEM detects the schema mapping between a table and the RDF KB using only tuples, rather than the table's schema information. Experimental results show that our system has high precision and good performance.
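
The abstract states that ITEM computes the table-to-KB schema mapping from tuples alone, without using the table's schema. The sketch below shows one simple way such value-based matching could look; the toy facts, the table rows, and the overlap-counting heuristic are illustrative assumptions, not ITEM's actual algorithm.

```python
# Map each table column to the KB predicate whose object values overlap it most.
kb_facts = [
    ("Casablanca", "directedBy", "Michael Curtiz"),
    ("Casablanca", "releaseYear", "1942"),
    ("Vertigo",    "directedBy", "Alfred Hitchcock"),
    ("Vertigo",    "releaseYear", "1958"),
]

table_rows = [
    ("Casablanca", "Michael Curtiz", "1942"),
    ("Vertigo",    "Alfred Hitchcock", "1958"),
]

def map_columns_to_predicates(rows, facts):
    predicate_values = {}
    for _, pred, obj in facts:
        predicate_values.setdefault(pred, set()).add(obj)

    mapping = {}
    for col in range(1, len(rows[0])):   # column 0 assumed to hold the entity
        col_values = {row[col] for row in rows}
        best = max(predicate_values,
                   key=lambda p: len(col_values & predicate_values[p]))
        mapping[col] = best
    return mapping

print(map_columns_to_predicates(table_rows, kb_facts))
# {1: 'directedBy', 2: 'releaseYear'}
```
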
Title: Leveraging social networks to detect anomalous insider actions in collaborative environments
Published in: Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011
Language: English
Date: 2011
Abstract: Collaborative information systems (CIS) enable users to coordinate efficiently over shared tasks. They are often deployed in complex dynamic systems that provide users with broad access privileges, but also leave the system vulnerable to various attacks. Techniques to detect threats originating from beyond the system are relatively mature, but methods to detect insider threats are still evolving. A promising class of insider threat detection models for CIS focuses on the communities that manifest between users based on the usage of common subjects in the system. However, current methods detect only when a user's aggregate behavior is intruding, not when specific actions have deviated from expectation. In this paper, we introduce a method called specialized network anomaly detection (SNAD) to detect such events. SNAD assembles the community of users that access a particular subject and assesses whether the similarities of the community with and without a certain user are sufficiently different. We present a theoretical basis and perform an extensive empirical evaluation with the access logs of two distinct environments: those of a large electronic health record system (6,015 users, 130,457 patients and 1,327,500 accesses) and the editing logs of Wikipedia (2,388,955 revisors, 55,200 articles and 6,482,780 revisions). We compare SNAD with several competing methods and demonstrate that it is significantly more effective: on average it achieves 20-30% greater area under an ROC curve.
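
SNAD's core test, as the abstract describes it, is whether the community of users who accessed a subject looks more cohesive without a given user than with them. The sketch below is a simplified, hypothetical version of that with/without comparison using mean pairwise Jaccard similarity over accessed subjects; the similarity measure, the scoring, and the toy access logs are assumptions, not the paper's exact formulation.

```python
from itertools import combinations

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def mean_pairwise_similarity(users, access_sets):
    pairs = list(combinations(users, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(access_sets[u], access_sets[v]) for u, v in pairs) / len(pairs)

def anomaly_score(user, community, access_sets):
    """Toy with/without comparison: how much does removing `user`
    raise the community's cohesion?"""
    with_user = mean_pairwise_similarity(community, access_sets)
    without_user = mean_pairwise_similarity(
        [u for u in community if u != user], access_sets)
    return without_user - with_user

# Made-up access logs: user -> subjects they touched.
access_sets = {
    "u1": {"s1", "s2", "s3"},
    "u2": {"s1", "s2"},
    "u3": {"s9"},            # outlier: accesses unrelated subjects
}
community = ["u1", "u2", "u3"]
print(round(anomaly_score("u3", community, access_sets), 3))
# 0.444 -> removing u3 makes the community noticeably more cohesive
```

A positive score means the community becomes more similar once the user is removed, which is the kind of deviation the abstract describes flagging.
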
Title: Computing semantic relatedness between named entities using Wikipedia
Keywords: Semantic relatedness, Web mining
Published in: Proceedings - International Conference on Artificial Intelligence and Computational Intelligence, AICI 2010
Language: English
Date: 2010
Abstract: In this paper the authors suggest a novel approach that uses Wikipedia to measure the semantic relatedness between Chinese named entities, such as names of persons, books, software, etc. The relatedness is measured through articles in Wikipedia that are related to the named entities. The authors select a set of "definition words", which are hyperlinks from these articles, and then compute the relatedness between two named entities as the relatedness between two sets of definition words. The authors propose two ways to measure the relatedness between two definition words: by Wiki-articles related to the words or by categories of the words. The proposed approaches are compared with several baseline models through experiments. The experimental results show that the method yields satisfactory results.
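
The abstract reduces entity relatedness to the relatedness between two sets of definition words, measured either via related articles or via categories. The sketch below is a minimal illustration assuming the category-based variant and a simple best-match aggregation; the category data and the aggregation rule are invented for the example, not taken from the paper.

```python
def word_relatedness(w1, w2, categories):
    """Stand-in for the word-level measure (shared categories here; the
    paper also proposes using related Wiki articles)."""
    c1, c2 = categories.get(w1, set()), categories.get(w2, set())
    return len(c1 & c2) / len(c1 | c2) if c1 | c2 else 0.0

def entity_relatedness(def_words_a, def_words_b, categories):
    """Aggregate word-level scores over the two definition-word sets by
    averaging each word's best match in the other set (one simple choice)."""
    def best(w, others):
        return max(word_relatedness(w, o, categories) for o in others)
    scores = [best(w, def_words_b) for w in def_words_a] + \
             [best(w, def_words_a) for w in def_words_b]
    return sum(scores) / len(scores)

# Made-up definition words and category memberships.
categories = {
    "novel":  {"Literature"},
    "writer": {"Literature", "People"},
    "album":  {"Music"},
}
print(entity_relatedness({"novel"}, {"writer"}, categories))  # 0.5
print(entity_relatedness({"novel"}, {"album"}, categories))   # 0.0
```
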
Title: Automatic acquisition of attributes for ontology construction
Keywords: Attribute acquisition, Ontology construction, Wikipedia as resource source
Published in: Lecture Notes in Computer Science
Language: English
Date: 2009
Abstract: An ontology can be seen as an organized structure of concepts according to their relations. A concept is associated with a set of attributes that are themselves also concepts in the ontology. Consequently, ontology construction is the acquisition of concepts and their associated attributes through relations. Manual ontology construction is time-consuming and difficult to maintain. Corpus-based ontology construction methods must be able to distinguish concepts themselves from concept instances. In this paper, a novel and simple method is proposed for automatically identifying concept attributes through the use of Wikipedia as the corpus. The built-in Template:Infobox in Wikipedia is used to acquire concept attributes and identify the semantic types of the attributes. Two simple induction rules are applied to improve the performance. Experimental results show precisions of 92.5% for attribute acquisition and 80% for attribute type identification. This is a very promising result for automatic ontology construction.
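
The abstract's key resource is the Template:Infobox markup in Wikipedia pages, whose fields double as attribute names. As a rough sketch of where those attributes come from, the snippet below pulls field/value pairs out of a raw infobox with a regular expression; real infoboxes nest templates and need a proper wikitext parser, and the induction rules mentioned in the abstract are not modeled here.

```python
import re

def infobox_attributes(wikitext):
    """Very rough sketch: pull `| field = value` lines out of an infobox in
    raw wikitext. Only illustrates where attribute names come from; it does
    not handle nested templates or multi-line values."""
    match = re.search(r"\{\{Infobox[^}]*\}\}", wikitext, re.DOTALL)
    if not match:
        return {}
    attributes = {}
    for line in match.group(0).splitlines():
        m = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*)", line)
        if m:
            attributes[m.group(1)] = m.group(2)
    return attributes

sample = """{{Infobox company
| name     = Example Corp
| founded  = 1999
| industry = Software
}}"""
print(infobox_attributes(sample))
# {'name': 'Example Corp', 'founded': '1999', 'industry': 'Software'}
```
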
Title: Corpus Exploitation from Wikipedia for Ontology Construction
Language: English
Date: 2009

Title: Mining concepts from Wikipedia for ontology construction
Keywords: Concept, Ontology construction, Wikipedia
Published in: Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009
Language: English
Date: 2009
Abstract: An ontology is a structured knowledge base of concepts organized by the relations among them. However, concepts are usually mixed with their instances in the corpora used for knowledge extraction. Concepts and their corresponding instances share similar features and are difficult to distinguish. In this paper, a novel approach is proposed to comprehensively obtain concepts with the help of definition sentences and Category Labels in Wikipedia pages. N-gram statistics and other NLP knowledge are used to help extract appropriate concepts. The proposed method identified nearly 50,000 concepts from about 700,000 Wiki pages. With a precision of 78.5%, it is an effective approach to mining concepts from Wikipedia for ontology construction.
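
The abstract obtains concepts from definition sentences and Category Labels, filtered with n-gram statistics. The sketch below is a toy approximation of the definition-sentence part: it pulls the head phrase from "X is a/an ..." sentences and keeps phrases that recur across pages. The regular expression, the recurrence threshold, and the omission of category labels are all simplifications, not the paper's method.

```python
import re
from collections import Counter

def candidate_concept(definition_sentence):
    """Toy extraction of the head phrase from an 'X is a/an ...' definition
    sentence; the paper's pipeline also uses Category Labels and richer NLP,
    which are omitted here."""
    m = re.search(r"\bis an? ([a-z][a-z ]+?)(?:\.|,| that| which)",
                  definition_sentence.lower())
    return m.group(1).strip() if m else None

def mine_concepts(definition_sentences, min_count=2):
    """Keep candidates that recur across pages, a stand-in for the n-gram
    frequency filtering described in the abstract."""
    counts = Counter(filter(None, map(candidate_concept, definition_sentences)))
    return {c for c, n in counts.items() if n >= min_count}

sentences = [
    "Python is a programming language that supports multiple paradigms.",
    "Haskell is a programming language, purely functional in style.",
    "Mercury is a planet.",
]
print(mine_concepts(sentences))  # {'programming language'}
```
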