YanChun Zhang

From WikiPapers

YanChun Zhang is an author.

Publications

Only those publications related to wikis are shown here.
Each entry lists: Title, Keywords, Published in, Language, Date, Abstract, and the R and C counts.
Title: Continuous temporal Top-K query over versioned documents
Published in: Lecture Notes in Computer Science
Language: English
Date: 2014
Abstract: The management of versioned documents has attracted researchers' attention in recent years. Based on the observation that decision-makers are often interested in finding the set of objects that show continuous behavior over time, we study the problem of continuous temporal top-k queries. Given a query, a continuous temporal top-k search finds the documents that frequently rank in the top-k during a time period, taking the weights of different time intervals into account. Existing work on querying versioned documents has focused on adding time constraints but has not considered the continuous ranking of objects or the weights of time intervals. We propose a new interval window-based method to address this problem. Our method retrieves the continuous temporal top-k results while using interval windows to support time and weight constraints simultaneously. We use data from Wikipedia to evaluate our method.
R: 0, C: 0
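The interval-window index itself is not described here in enough detail to reproduce, but the query semantics can be illustrated with a small brute-force Python sketch: a document qualifies when the total weight of the intervals in which it ranks in the top-k is high enough. The rankings, interval weights, k and the qualifying threshold below are hypothetical.

```python
from collections import defaultdict

def continuous_temporal_topk(interval_rankings, interval_weights, k, min_weight):
    """Brute-force continuous temporal top-k.

    interval_rankings: list of ranked document-id lists, one per time interval
                       (best document first).
    interval_weights:  weight assigned to each interval (same length).
    Returns documents whose accumulated weight of top-k appearances is at
    least min_weight, sorted by that weight.
    """
    weight_in_topk = defaultdict(float)
    for ranking, w in zip(interval_rankings, interval_weights):
        for doc_id in ranking[:k]:          # documents ranked in the top-k of this interval
            weight_in_topk[doc_id] += w
    result = [(doc, w) for doc, w in weight_in_topk.items() if w >= min_weight]
    return sorted(result, key=lambda x: x[1], reverse=True)

# Hypothetical example: three intervals of a versioned archive,
# later intervals weighted more heavily
rankings = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d2", "d3", "d1"]]
weights = [0.2, 0.3, 0.5]
print(continuous_temporal_topk(rankings, weights, k=2, min_weight=0.5))
```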
Title: Invasion biology and the success of social collaboration networks, with application to wikipedia
Keywords: Invasion biology; Social collaboration networks; Stochastic population theory; Wikipedia
Published in: Israel Journal of Ecology and Evolution
Language: English
Date: 2013
Abstract: We adapt methods from the stochastic theory of invasions - for which a key question is whether a propagule will grow to an established population or fail - to show how monitoring early participation in a social collaboration network allows prediction of success. Social collaboration networks have become ubiquitous and can now be found in widely diverse situations. However, there are currently no methods to predict whether a social collaboration network will succeed or not, where success is defined as growing to a specified number of active participants before falling to zero active participants. We illustrate a suitable methodology with Wikipedia. In general, wikis are web-based software that allow collaborative efforts in which all viewers of a page can edit its contents online, thus encouraging cooperative work on text and hypertext. The English-language Wikipedia is one of the most spectacular successes, but not all wikis succeed, and there have been some major failures. Using these new methods, we derive detailed predictions for the English-language Wikipedia and summary predictions for more than 250 other language Wikipedias. We thus show how ideas from population biology can inform aspects of technology in new and insightful ways.
R: 0, C: 0
Title: Position-wise contextual advertising: Placing relevant ads at appropriate positions of a web page
Keywords: Contextual advertising; Similarity; Wikipedia knowledge
Published in: Neurocomputing
Language: English
Date: 2013
Abstract: Web advertising, a form of online advertising that uses the Internet as a medium to post product or service information and attract customers, has become one of the most important marketing channels. As one prevalent type of web advertising, contextual advertising refers to the placement of the most relevant ads at appropriate positions of a web page, so as to provide a better user experience and increase the user's ad-click rate. However, most existing contextual advertising techniques only consider how to select ads that are as relevant as possible to a given page, without considering the positional effect of the ad placement on the page, resulting in unsatisfactory performance in local context relevance. In this paper, we address the novel problem of position-wise contextual advertising, i.e., how to select and place relevant ads properly for a target web page. In our proposed approach, the relevant ads are selected based not only on global context relevance but also on local context relevance, so that the embedded ads are contextually relevant both to the whole target page and to the insertion positions where the ads are placed. In addition, to improve the accuracy of the global and local context relevance measures, rich Wikipedia knowledge is used to enhance the semantic feature representation of pages and ad candidates. Finally, we evaluate our approach using a set of ads and pages downloaded from the Internet, and demonstrate its effectiveness. © 2013 Elsevier B.V.
R: 0, C: 0
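A minimal sketch of the position-wise idea under simplifying assumptions: plain bag-of-words cosine similarity stands in for the Wikipedia-enhanced semantic representation, and a single parameter alpha blends global (whole page) and local (insertion slot) relevance. The texts, slot names and alpha value are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def score_ad(ad_text, page_text, slot_text, alpha=0.5):
    """Blend global (whole-page) and local (insertion-position) relevance."""
    ad, page, slot = (Counter(t.lower().split()) for t in (ad_text, page_text, slot_text))
    return alpha * cosine(ad, page) + (1 - alpha) * cosine(ad, slot)

# Hypothetical page with two candidate insertion positions
page = "travel guide for hiking boots and mountain trails in the alps"
slots = {"intro": "hiking boots buying guide", "footer": "site map and contact details"}
ad = "lightweight hiking boots on sale"
for name, slot in slots.items():
    print(name, round(score_ad(ad, page, slot), 3))
```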
Title: A dual hashtables algorithm for durable top-k search
Keywords: Document Archives; Durable top-k; Hashtable; Multi-version
Published in: Proceedings - 9th Web Information Systems and Applications Conference, WISA 2012
Language: English
Date: 2012
Abstract: We propose a dual hash tables algorithm that realizes durable top-k search. Two hash tables are constructed to keep the core information, such as score and time, in the inverted lists. We use the key-value relationships between the two hash tables to calculate the scores that measure the correlations between a keyword and documents, and to search for the versioned objects that are consistently in the top-k results throughout a given query interval. Finally, we use data from Wikipedia to demonstrate the efficiency and performance of our algorithm.
R: 0, C: 0
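A minimal sketch of the durable top-k check, assuming two in-memory dictionaries stand in for the paper's hash tables: one keeps per-interval scores for a keyword, the other keeps the time span of each interval; a document is reported only if it stays in the top-k for every interval overlapping the query range. All identifiers and the example data are hypothetical.

```python
from collections import defaultdict

# Hash table 1: keyword -> interval id -> {doc_id: score}
score_table = defaultdict(lambda: defaultdict(dict))
# Hash table 2: interval id -> (start, end) timestamps
time_table = {}

def index_posting(keyword, interval_id, start, end, doc_id, score):
    """Store one posting in both tables."""
    score_table[keyword][interval_id][doc_id] = score
    time_table[interval_id] = (start, end)

def durable_topk(keyword, query_start, query_end, k):
    """Documents that are in the top-k for every interval overlapping the query range."""
    durable = None
    for interval_id, (start, end) in time_table.items():
        if end < query_start or start > query_end:   # interval outside the query range
            continue
        scores = score_table[keyword].get(interval_id, {})
        topk = {d for d, _ in sorted(scores.items(), key=lambda x: x[1], reverse=True)[:k]}
        durable = topk if durable is None else durable & topk
    return durable or set()

# Hypothetical versioned collection with two consecutive time intervals
index_posting("wiki", 0, 0, 10, "d1", 0.9)
index_posting("wiki", 0, 0, 10, "d2", 0.8)
index_posting("wiki", 1, 10, 20, "d1", 0.7)
index_posting("wiki", 1, 10, 20, "d3", 0.6)
print(durable_topk("wiki", 0, 20, k=1))   # -> {'d1'}
```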
Title: An Improved Contextual Advertising Matching Approach based on Wikipedia Knowledge
Published in: Comput. J.
Language: English
Date: 2012
R: 0, C: 0
Title: Text classification using Wikipedia knowledge
Keywords: Semi-supervised learning; Text classification; Wikipedia
Published in: ICIC Express Letters, Part B: Applications
Language: English
Date: 2012
Abstract: In the real world, there are large amounts of unlabeled text documents, but traditional approaches usually require many labeled documents, which are expensive to obtain. In this paper we propose an approach that uses Wikipedia for text classification. We first extract the related wiki documents with the given keywords, then label the documents with the representative features selected from the related wiki documents, and finally build an SVM text classifier. Experimental results on the 20-Newsgroup dataset show that the proposed method performs well and stably.
R: 0, C: 0
Title: Twitter user modeling and tweets recommendation based on wikipedia concept graph
Published in: AAAI Workshop - Technical Report
Language: English
Date: 2012
Abstract: As a microblogging service, Twitter is playing an increasingly important role in our lives. Users follow various accounts, such as friends or celebrities, to get the most recent information. However, as one follows more and more people, he or she may be overwhelmed by the huge number of status updates. Twitter messages are displayed only by time recency, which means that if one cannot read all messages, he or she may miss some important or interesting tweets. In this paper, we propose to re-rank tweets in a user's timeline by constructing a user profile based on the user's previous tweets and measuring the relevance between a tweet and the user's interests. The user interest profile is represented as concepts from Wikipedia, which is a large and inter-linked online knowledge base. We use the Explicit Semantic Analysis algorithm to extract related concepts from tweets, and then expand the user's profile by a random walk on the Wikipedia concept graph, exploiting the inter-links between Wikipedia articles. Our experiments show that our model is effective and efficient in recommending tweets to users. Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
R: 0, C: 0
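A minimal sketch of the profile-expansion and re-ranking steps, assuming the Wikipedia concept graph is given as an adjacency list and the seed concepts have already been extracted from the user's tweets (e.g. by ESA). The graph, seed weights, walk parameters and tweet-to-concept mappings are hypothetical; the sketch runs a short random walk with restart and then scores tweets by the weight of profile concepts they mention.

```python
from collections import defaultdict

def expand_profile(graph, seeds, restart=0.3, steps=10):
    """Random walk with restart over a concept graph.

    graph: concept -> list of linked concepts (inter-article links).
    seeds: concept -> initial weight taken from the user's tweets.
    Returns a concept -> weight dictionary (the expanded user profile).
    """
    total = sum(seeds.values())
    restart_dist = {c: w / total for c, w in seeds.items()}
    prob = dict(restart_dist)
    for _ in range(steps):
        nxt = defaultdict(float)
        for concept, p in prob.items():
            neighbours = graph.get(concept, [])
            if not neighbours:
                nxt[concept] += (1 - restart) * p        # keep mass on dead ends
            else:
                share = (1 - restart) * p / len(neighbours)
                for n in neighbours:
                    nxt[n] += share
        for concept, w in restart_dist.items():
            nxt[concept] += restart * w                  # restart at the seed concepts
        prob = dict(nxt)
    return prob

def rank_tweets(tweets, profile):
    """Order tweets by the summed weight of profile concepts they mention."""
    def score(concepts):
        return sum(profile.get(c, 0.0) for c in concepts)
    return sorted(tweets.items(), key=lambda kv: score(kv[1]), reverse=True)

# Hypothetical concept graph and seed interests
graph = {"Machine learning": ["Artificial intelligence", "Statistics"],
         "Artificial intelligence": ["Machine learning", "Robotics"],
         "Statistics": ["Probability"]}
seeds = {"Machine learning": 2.0, "Statistics": 1.0}
profile = expand_profile(graph, seeds)
tweets = {"t1": ["Robotics"], "t2": ["Probability", "Statistics"], "t3": ["Cooking"]}
print(rank_tweets(tweets, profile))
```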
Title: Aging-kb: A knowledge base for the study of the aging process
Keywords: Aging database; Knowledge base; Wiki
Published in: Mechanisms of Ageing and Development
Language: English
Date: 2011
Abstract: As the science of the aging process moves forward, a recurring challenge is the integration of multiple types of data and information with classical aging theory while disseminating that information to the scientific community. Here we present Aging-kb, a public knowledge base with the goal of conceptualizing and presenting fundamental aspects of the study of the aging process. Aging-kb has two interconnected parts, the Aging-kb tree and the Aging Wiki. The Aging-kb tree is a simple, intuitive, dynamic tree hierarchy of terms describing the field of aging from the general to the specific. This enables the user to see relationships between areas of aging research in a logical comparative fashion. The second part is a specialized Aging Wiki which allows expert definition, description, supporting information, and documentation of each aging keyword term found in the Aging-kb tree. The Aging Wiki allows community participation in describing and defining concepts and terms in the wiki format. This aging knowledge base provides a simple intuitive interface to the complexities of aging.
R: 0, C: 0
Title: Hybrid and interactive domain-specific translation for multilingual access to digital libraries
Published in: Lecture Notes in Computer Science
Language: English
Date: 2011
Abstract: Accurate high-coverage translation is a vital component of reliable cross-language information retrieval (CLIR) systems. This is particularly true for retrieval from archives such as digital libraries, which are often specific to certain domains. While general machine translation (MT) has been shown to be effective for CLIR tasks in laboratory information retrieval evaluations, it is generally not well suited to specialized situations where domain-specific translations are required. We demonstrate that effective query translation in the domain of cultural heritage (CH) can be achieved using a hybrid translation method which augments a standard MT system with domain-specific phrase dictionaries automatically mined from Wikipedia. We further describe the use of these components in a domain-specific interactive query translation service. The interactive system selects the hybrid translation by default, with other possible translations offered to the user interactively so that they can select alternative or additional translations. The objective of this interactive service is to provide user control over translation while maximising translation accuracy and minimising the translation effort of the user. Experiments using our hybrid translation system with sample query logs from users of CH websites demonstrate a large improvement in the accuracy of domain-specific phrase detection and translation.
R: 0, C: 0
Title: Ontology enhancement and concept granularity learning: Keeping yourself current and adaptive
Keywords: Ontology; Tailor-made concept representation learning; WorkiNet
Published in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Language: English
Date: 2011
Abstract: As a well-known semantic repository, WordNet is widely used in many applications. However, due to costly editing and maintenance, WordNet's capability of keeping up with the emergence of new concepts is poor compared with online encyclopedias such as Wikipedia. To keep WordNet current with folk wisdom, we propose a method to enhance WordNet automatically by merging Wikipedia entities into WordNet, and construct an enriched ontology, named WorkiNet. WorkiNet keeps the desirable structure of WordNet. At the same time, it captures abundant information from Wikipedia. We also propose a learning approach which is able to generate a tailor-made semantic concept collection for a given document collection. The learning process takes the characteristics of the given document collection into consideration, and the semantic concepts in the tailor-made collection can be used as new features for document representation. The experimental results show that the adaptively generated feature space can significantly outperform a static one in text mining tasks, and WorkiNet dominates WordNet most of the time due to its high coverage. Copyright 2011 ACM.
R: 1, C: 0
Title: Parser evaluation over local and non-local deep dependencies in a large corpus
Published in: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Language: English
Date: 2011
Abstract: In order to obtain a fine-grained evaluation of parser accuracy over naturally occurring text, we study 100 examples each of ten reasonably frequent linguistic phenomena, randomly selected from a parsed version of the English Wikipedia. We construct a corresponding set of gold-standard target dependencies for these 1000 sentences, operationalize mappings to these targets from seven state-of-the-art parsers, and evaluate the parsers against this data to measure their level of success in identifying these dependencies.
R: 0, C: 0
Title: Wiki-induced cognitive elaboration in project teams: An empirical study
Keywords: Critical norm; Knowledge integration; Process accountability; Task involvement; Task reflexivity; Time pressure; Wiki-induced cognitive elaboration
Published in: International Conference on Information Systems 2011, ICIS 2011
Language: English
Date: 2011
Abstract: Researchers have exerted increasing effort to understand how wikis can be used to improve team performance. Previous studies have mainly focused on the effect of the quantity of wiki use on performance in wiki-based communities; however, only inconclusive results have been obtained. Our study focuses on the quality of wiki use in a team context. We develop a construct of wiki-induced cognitive elaboration and explore its nomological network in the team context. Integrating the literatures on wikis and distributed cognition, we propose that wiki-induced cognitive elaboration influences team performance through knowledge integration among team members. We also identify its team-based antecedents, including task involvement, critical norm, task reflexivity, time pressure and process accountability, by drawing on the motivated information processing literature. The research model is empirically tested using multiple-source survey data collected from 46 wiki-based student project teams. The theoretical and practical implications of our findings are also discussed.
R: 0, C: 0
Title: Chart pruning for fast lexicalised-grammar parsing
Published in: Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference
Language: English
Date: 2010
Abstract: Given the increasing need to process massive amounts of textual data, the efficiency of NLP tools is becoming a pressing concern. Parsers based on lexicalised grammar formalisms, such as TAG and CCG, can be made more efficient using supertagging, which for CCG is so effective that every derivation consistent with the supertagger output can be stored in a packed chart. However, wide-coverage CCG parsers still produce a very large number of derivations for typical newspaper or Wikipedia sentences. In this paper we investigate two forms of chart pruning, and develop a novel method for pruning complete cells in a parse chart. The result is a wide-coverage CCG parser that can process almost 100 sentences per second, with little or no loss in accuracy over the baseline with no pruning.
R: 0, C: 0
Title: Do wikipedians follow domain experts?: A domain-specific study on wikipedia knowledge building
Keywords: Contributing behavior; Knowledge building; Wikipedia
Published in: Proceedings of the ACM International Conference on Digital Libraries
Language: English
Date: 2010
Abstract: Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest in a better understanding of the collaborative knowledge building process. In this paper, we performed a domain-specific (terrorism) case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records and studied them from three aspects: creation, revision, and link evolution. We found that the knowledge building in Wikipedia had largely been independent and did not follow the TKB, despite the open and online availability of the latter and the awareness of at least some Wikipedia contributors of the TKB source. In an attempt to identify possible reasons, we conducted a detailed analysis of the contribution behavior demonstrated by Wikipedians. It was found that most Wikipedians contribute to a relatively small set of articles each, and their contribution is biased towards one or very few articles. At the same time, each article's contributions are often championed by very few active contributors, including the article's creator. We finally arrive at the conjecture that contributions in Wikipedia aim more at covering knowledge at the article level than at the domain level.
R: 0, C: 1
Title: Tag transformer
Keywords: Online user study; Structural web video recommendation; Tag cleaning; Tag transformer; Wikipedia category tree
Published in: MM'10 - Proceedings of the ACM Multimedia 2010 International Conference
Language: English
Date: 2010
Abstract: Human annotations (titles and tags) of web videos facilitate most web video applications. However, the raw tags are noisy, sparse and structureless, which limits their effectiveness. In this paper, we propose a tag transformer scheme to solve these problems. We first eliminate imprecise and meaningless tags with Wikipedia, and then transform the remaining tags into the Wikipedia category set to obtain a precise, complete and structural description of the tags. Our experimental results on web video categorization demonstrate the superiority of the transformed space. We also apply the tag transformer in the first study of using the Wikipedia category system to structurally recommend related videos. An online user study of the demo system suggests that our method could bring a fantastic experience to web users.
R: 0, C: 0
Title: A 'uses and gratifications' approach to understanding the role of wiki technology in enhancing teaching and learning outcomes
Keywords: Constructivist learning; Motivation; Technology-mediated learning (TML); Uses and gratifications approach (U&G); Wiki technology
Published in: 17th European Conference on Information Systems, ECIS 2009
Language: English
Date: 2009
Abstract: The use of wikis in both post-graduate and undergraduate teaching is rapidly increasing in popularity. Much of the research into the use of this technology has focused on the practical aspects of how the technology can be used, and has yet to address why it is used or in what way it enhances teaching and learning outcomes. A comparison of the key characteristics of the constructivist learning approach and wikis suggests that wikis could provide considerable support for this approach; however, research into the motivations for using the technology is required so that good teaching practices can be applied when wikis are used in the higher-education context. This study articulates a research design grounded in the Technology-Mediated Learning (TML) paradigm that could be used to explore teachers' and students' motivations for using wiki technology to enhance teaching and learning outcomes. Using the 'uses and gratifications' approach, a popular technique for understanding user motivation in technology adoption, a two-stage research design is set out. Finally, the paper concludes with a discussion of the implications for both information systems researchers and higher education.
R: 0, C: 0
Title: Building a text classifier by a keyword and Wikipedia knowledge
Keywords: Keyword; Text classification; Unlabeled document; Wikipedia
Published in: Lecture Notes in Computer Science
Language: English
Date: 2009
Abstract: Traditional approaches to building text classifiers usually require a lot of labeled documents, which are expensive to obtain. In this paper, we propose a new text classification approach based on a keyword and Wikipedia knowledge, so as to avoid labeling documents manually. First, we retrieve a set of related documents about the keyword from Wikipedia. Then, with the help of the related Wikipedia pages, more positive documents are extracted from the unlabeled documents. Finally, we train a text classifier with these positive documents and the unlabeled documents. The experimental results on the 20Newsgroup dataset show that the proposed approach performs very competitively compared with NB-SVM, a PU learner, and NB, a supervised learner.
R: 0, C: 0
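A minimal sketch of this pipeline under heavy simplification: the related Wikipedia pages are passed in as plain strings, pseudo-positive documents are picked from the unlabeled pool by TF-IDF cosine similarity to those pages, and a linear SVM is trained with scikit-learn. Treating the remaining pool as pseudo-negatives is an illustrative shortcut, not the paper's PU procedure; all data and parameters are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import LinearSVC

def build_classifier(wiki_pages, unlabeled_docs, n_positive=2):
    """Train a text classifier from a keyword's related Wikipedia pages plus unlabeled documents."""
    vectorizer = TfidfVectorizer(stop_words="english")
    vecs = vectorizer.fit_transform(unlabeled_docs + wiki_pages)
    unlabeled_vecs = vecs[:len(unlabeled_docs)]
    wiki_vecs = vecs[len(unlabeled_docs):]

    # Pseudo-label: the documents closest to the Wikipedia pages become positives,
    # the rest negatives (an illustrative simplification).
    sims = cosine_similarity(unlabeled_vecs, wiki_vecs).max(axis=1)
    labels = np.zeros(len(unlabeled_docs), dtype=int)
    labels[np.argsort(sims)[::-1][:n_positive]] = 1

    clf = LinearSVC().fit(unlabeled_vecs, labels)
    return vectorizer, clf

# Hypothetical data for the keyword "space"
wiki = ["spaceflight uses rockets to reach orbit and explore planets"]
pool = ["nasa launched a rocket into orbit around the moon",
        "the recipe uses flour butter and sugar",
        "astronauts aboard the station study orbit decay",
        "the football match ended in a draw"]
vec, clf = build_classifier(wiki, pool)
print(clf.predict(vec.transform(["a rocket carried the probe into orbit"])))
```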
Title: MagicCube: Choosing the best snippet for each aspect of an entity
Keywords: Entity; MagicCube; Snippet; Wiki
Published in: International Conference on Information and Knowledge Management, Proceedings
Language: English
Date: 2009
Abstract: Wikis are currently used in business to provide knowledge management systems, especially for individual organizations. However, building wikis manually is laborious and time-consuming work. To assist in founding wikis, we propose a methodology in this paper to automatically select the best snippets for entities as their initial explanations. Our method consists of two steps. First, we focus on extracting snippets from a given set of web pages for each entity. Starting from a seed sentence, a snippet grows by adding the most relevant neighboring sentences into itself. The sentences are chosen by the Snippet Growth Model, which employs a distance function and an influence function to make decisions. Second, we pick out the best snippet for each aspect of an entity. The combination of all the selected snippets serves as the primary description of the entity. We present three progressively enhanced methods to handle the selection process. Experimental results based on a real data set show that our proposed method works effectively in producing primary descriptions for entities such as employee names. Copyright 2009 ACM.
R: 0, C: 0
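A minimal sketch of the snippet-growth step, with a plain cosine similarity as the relevance measure and a simple distance decay standing in for the paper's distance and influence functions. Function names, the stopping threshold and the example page are hypothetical.

```python
import math
from collections import Counter

def similarity(a, b):
    """Cosine similarity between two sentences (bag-of-words)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    shared = set(va) & set(vb)
    num = sum(va[t] * vb[t] for t in shared)
    den = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return num / den if den else 0.0

def grow_snippet(sentences, seed_index, min_gain=0.15):
    """Grow a snippet around a seed sentence.

    At each step, the neighbouring sentence just before or after the current
    snippet with the highest distance-discounted similarity to the seed is
    added, until no neighbour clears the min_gain threshold.
    """
    seed = sentences[seed_index]
    left, right = seed_index, seed_index            # snippet is sentences[left:right+1]
    while True:
        candidates = []
        if left > 0:
            candidates.append(left - 1)
        if right < len(sentences) - 1:
            candidates.append(right + 1)
        scored = [(similarity(seed, sentences[i]) / (1 + abs(i - seed_index)), i)
                  for i in candidates]              # influence decays with distance from the seed
        if not scored:
            break
        best_score, best_i = max(scored)
        if best_score < min_gain:
            break
        left, right = min(left, best_i), max(right, best_i)
    return " ".join(sentences[left:right + 1])

# Hypothetical page about an employee
page = ["Alice Smith joined the company in 2005.",
        "Alice Smith leads the data platform team.",
        "The cafeteria menu changes every week.",
        "Alice Smith previously worked on search infrastructure."]
print(grow_snippet(page, seed_index=1))
```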
Title: Towards design principles for effective context- and perspective-based web mining
Published in: Proceedings of the 4th International Conference on Design Science Research in Information Systems and Technology, DESRIST '09
Language: English
Date: 2009
Abstract: A practical and scalable web mining solution is needed that can assist the user in processing existing web-based resources to discover specific, relevant information content. This is especially important for researcher communities where data deployed on the World Wide Web are characterized by autonomous, dynamically evolving, and conceptually diverse information sources. The paper describes a systematic design research study that is based on prototyping/evaluation and abstraction, using existing and new techniques incorporated as plug-and-play components into a research workbench. The study investigates an approach, DISCOVERY, for using (1) context/perspective information and (2) social networks such as ODP or Wikipedia for designing practical and scalable human-web systems for finding web pages that are relevant and meet the needs and requirements of a user or a group of users. The paper also describes the current implementation of DISCOVERY and its initial use in finding web pages in a targeted web domain. The resulting system arguably meets the common needs and requirements of a group of people based on the information provided by the group in the form of a set of context web pages. The system is evaluated for a scenario in which the assistance of the system is sought by a group of faculty members in finding NSF research grant opportunities that they should collaboratively respond to, utilizing the context provided by their recent publications. Copyright 2009 ACM.
R: 0, C: 0
Title: Dublin City University at CLEF 2007: Cross-language speech retrieval experiments
Published in: Lecture Notes in Computer Science
Language: English
Date: 2008
Abstract: The Dublin City University participation in the CLEF 2007 CL-SR English task concentrated primarily on issues of topic translation. Our retrieval system used the BM25F model and pseudo relevance feedback. Topics were translated into English using the Yahoo! BabelFish free online service combined with domain-specific translation lexicons gathered automatically from Wikipedia. We explored alternative topic translation methods using these resources. Our results indicate that extending machine translation tools using automatically generated domain-specific translation lexicons can provide improved CLIR effectiveness for this task.
R: 0, C: 0