Jing-Woei Li

From WikiPapers
Jump to: navigation, search

Jing-Woei Li is an author.


Only those publications related to wikis are shown here.
Title Keyword(s) Published in Language DateThis property is a special property in this wiki. Abstract R C
Boosting cross-lingual knowledge linking via concept annotation IJCAI International Joint Conference on Artificial Intelligence English 2013 Automatically discovering cross-lingual links (CLs) between wikis can largely enrich the cross-lingual knowledge and facilitate knowledge sharing across different languages. In most existing approaches for cross-lingual knowledge linking, the seed CLs and the inner link structures are two important factors for finding new CLs. When there are insufficient seed CLs and inner links, discovering new CLs becomes a challenging problem. In this paper, we propose an approach that boosts cross-lingual knowledge linking by concept annotation. Given a small number of seed CLs and inner links, our approach first enriches the inner links in wikis by using concept annotation method, and then predicts new CLs with a regression-based learning model. These two steps mutually reinforce each other, and are executed iteratively to find as many CLs as possible. Experimental results on the English and Chinese Wikipedia data show that the concept annotation can effectively improve the quantity and quality of predicted CLs. With 50,000 seed CLs and 30% of the original inner links in Wikipedia, our approach discovered 171,393 more CLs in four runs when using concept annotation. 0 0
Discovering missing semantic relations between entities in Wikipedia Infobox
Linked data
Lecture Notes in Computer Science English 2013 Wikipedia's infoboxes contain rich structured information of various entities, which have been explored by the DBpedia project to generate large scale Linked Data sets. Among all the infobox attributes, those attributes having hyperlinks in its values identify semantic relations between entities, which are important for creating RDF links between DBpedia's instances. However, quite a few hyperlinks have not been anotated by editors in infoboxes, which causes lots of relations between entities being missing in Wikipedia. In this paper, we propose an approach for automatically discovering the missing entity links in Wikipedia's infoboxes, so that the missing semantic relations between entities can be established. Our approach first identifies entity mentions in the given infoboxes, and then computes several features to estimate the possibilities that a given attribute value might link to a candidate entity. A learning model is used to obtain the weights of different features, and predict the destination entity for each attribute value. We evaluated our approach on the English Wikipedia data, the experimental results show that our approach can effectively find the missing relations between entities, and it significantly outperforms the baseline methods in terms of both precision and recall. 0 0
Evaluating article quality and editor reputation in Wikipedia Editor reputation
Factor graph
Quality evaluation
Communications in Computer and Information Science English 2013 We study a novel problem of quality and reputation evaluation for Wikipedia articles. We propose a difficult and interesting question: How to generate reasonable article quality score and editor reputation in a framework at the same time? In this paper, We propose a dual wing factor graph(DWFG) model, which utilizes the mutual reinforcement between articles and editors to generate article quality and editor reputation. To learn the proposed factor graph model, we further design an efficient algorithm. We conduct experiments to validate the effectiveness of the proposed model. By leveraging the belief propagation between articles and editors, our approach obtains significant improvement over several alternative methods(SVM, LR, PR, CRF). 0 0
Ontology-enriched multi-document summarization in disaster management using submodular function Multi-document summarization
Information Sciences English 2013 In disaster management, a myriad of news and reports relevant to the disaster may be recorded in the form of text document. A challenging problem is how to provide concise and informative reports from a large collection of documents, to help domain experts analyze the trend of the disaster. In this paper, we explore the feasibility of using a domain-specific ontology to facilitate the summarization task, and propose TELESUM, an ontology-enriched multi-document summarization approach, where the submodularity hidden in among ontological concepts is investigated. Empirical experiments on the collection of press releases by Miami-Dade County Department of Emergency Management during Hurricane Wilma in 2005 demonstrate the efficacy and effectiveness of TELESUM in disaster management. Further, our proposed framework can be extended to summarizing general documents by employing public ontologies, e.g.; Wikipedia. Extensive evaluation on the generalized framework is conducted on DUC04-05 datasets, and shows that our method is competitive with other approaches. © 2012 Elsevier Inc. All rights reserved. 0 0
Building a large scale knowledge base from Chinese Wiki Encyclopedia Knowledge base
Linked data
Semantic web
Lecture Notes in Computer Science English 2012 DBpedia has been proved to be a successful structured knowledge base, and large scale Semantic Web data has been built by using DBpedia as the central interlinking-hubs of the Web of Data in English. But in Chinese, due to the heavily imbalance in size (no more than one tenth) between English and Chinese in Wikipedia, there are few Chinese linked data are published and linked to DBpedia, which hinders the structured knowledge sharing both within Chinese resources and cross-lingual resources. This paper aims at building large scale Chinese structured knowledge base from Hudong, which is one of the largest Chinese Wiki Encyclopedia websites. In this paper, an upper-level ontology schema in Chinese is first learned based on the category system and Infobox information in Hudong. Totally, there are 19542 concepts are inferred, which are organized in hierarchy with maximally 20 levels. 2381 properties with domain and range information are learned according to the attributes in the Hudong Infoboxes. Then, 802593 instances are extracted and described using the concepts and properties in the learned ontology. These extracted instances cover a wide range of things, including persons, organizations, places and so on. Among all the instances, 62679 of them are linked to identical instances in DBpedia. Moreover, the paper provides RDF dump or SPARQL to access the established Chinese knowledge base. The general upper-level ontology and wide coverage makes the knowledge base a valuable Chinese semantic resource. It not only can be used in Chinese linked data building, the fundamental work for building multi lingual knowledge base across heterogeneous resources of different languages, but also can largely facilitate many useful applications of large-scale knowledge base such as knowledge question-answering and semantic search. 0 0
Cross-lingual knowledge linking across wiki knowledge bases Cross-language
Knowledge linking
Knowledge sharing
Wiki knowledge base
WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web English 2012 Wikipedia becomes one of the largest knowledge bases on the Web. It has attracted 513 million page views per day in January 2012. However, one critical issue for Wikipedia is that articles in different language are very unbalanced. For example, the number of articles on Wikipedia in English has reached 3.8 million, while the number of Chinese articles is still less than half million and there are only 217 thousand cross-lingual links between articles of the two languages. On the other hand, there are more than 3.9 million Chinese Wiki articles on Baidu Baike and Hudong.com, two popular encyclopedias in Chinese. One important question is how to link the knowledge entries distributed in different knowledge bases. This will immensely enrich the information in the online knowledge bases and benefit many applications. In this paper, we study the problem of cross-lingual knowledge linking and present a linkage factor graph model. Features are defined according to some interesting observations. Experiments on the Wikipedia data set show that our approach can achieve a high precision of 85.8% with a recall of 88.1%. The approach found 202,141 new cross-lingual links between English Wikipedia and Baidu Baike. 0 0
Exploration and visualization of administrator network in wikipedia Human factors
Social network analysis
Lecture Notes in Computer Science English 2012 Wikipedia has become one of the most widely used knowledge systems on the Web. It contains the resources and information with different qualities contributed by different set of authors. A special group of authors named administrators plays an important role for content quality in Wikipedia. Understanding the behaviors of administrators in Wikipedia can facilitate the management of Wikipedia system, and empower some applications such as article recommendation and expertise administrator finding for given articles. This paper addresses the work of the exploration and visualization of the administrator network in Wikipedia. Administrator network is firstly constructed by using co-editing relationship and six characteristics for administrators are proposed to describe the behaviors of administrators in Wikipedia from different perspectives. Quantified calculation of these characteristics is then put forwarded by using social network analysis techniques. Topic model is used to relate content of Wikipedia to the interest diversity of administrators. Based on the media wiki history records from the January 2010 to January 2011, we develop an administrator exploration prototype system which can rank the selected characteristics for administrators and can be used as a decision support system. Furthermore, some meaningful observations are found to show that the administrator network is a healthy small world community and a strong centralization of the network around some hubs/stars is obtained to mean a considerable nucleus of very active administrators that seems to be omnipresent. These top ranked administrators ranking is found to be consistent with the number of barn stars awarded to them. 0 0
Exploring simultaneous keyword and key sentence extraction: Improve graph-based ranking using Wikipedia Graph
Markov chain
ACM International Conference Proceeding Series English 2012 Summarization and Keyword Selection are two important tasks in NLP community. Although both aim to summarize the source articles, they are usually treated separately by using sentences or words. In this paper, we propose a two-level graph based ranking algorithm to generate summarization and extract keywords at the same time. Previous works have reached a consensus that important sentence is composed by important keywords. In this paper, we further study the mutual impact between them through context analysis. We use Wikipedia to build a two-level concept-based graph, instead of traditional term-based graph, to express their homogenous relationship and heterogeneous relationship. We run PageRank and HITS rank on the graph to adjust both homogenous and heterogeneous relationships. A more reasonable relatedness value will be got for key sentence selection and keyword selection. We evaluate our algorithm on TAC 2011 data set. Traditional term-based approach achieves a score of 0.255 in ROUGE-1 and a score of 0.037 and ROUGE-2 and our approach can improve them to 0.323 and 0.048 separately. 0 0
The SEQanswers wiki: a wiki database of tools for high-throughput sequencing analysis English 2012 Recent advances in sequencing technology have created unprecedented opportunities for biological research. However, the increasing throughput of these technologies has created many challenges for data management and analysis. As the demand for sophisticated analyses increases, the development time of software and algorithms is outpacing the speed of traditional publication. As technologies continue to be developed, methods change rapidly, making publications less relevant for users. The SEQanswers wiki (SEQwiki) is a wiki database that is actively edited and updated by the members of the SEQanswers community (http://SEQanswers.com/). The wiki provides an extensive catalogue of tools, technologies and tutorials for high-throughput sequencing (HTS), including information about HTS service providers. It has been implemented in MediaWiki with the Semantic MediaWiki and Semantic Forms extensions to collect structured data, providing powerful navigation and reporting features. Within 2 years, the community has created pages for over 500 tools, with approximately 400 literature references and 600 web links. This collaborative effort has made SEQwiki the most comprehensive database of HTS tools anywhere on the web. The wiki includes task-focused mini-reviews of commonly used tools, and a growing collection of more than 100 HTS service providers. SEQwiki is available at: http://wiki.SEQanswers.com/. 0 0
A slang open book: An exploration of Wiki for ESL learners CALL
Social Constructivism
Ubiquitous Learning English 2011 This paper describes the Online Slang Open Book project (www.wikislang.org) we developed based on MediaWiki (www.MediaWiki.org). The project was designed to help ESL learners to study English slang in an authentic context and help them appreciate the cultural differences between English and their mother languages. This study's results suggest that a Wiki is an effective tool to improve the ESL learning process and learners tend to collaborative well in Wikis. The implication of the project and the future work are discussed at the end. 0 0
Finding hierarchy in directed online social networks Hierarchy
Social network
Proceedings of the 20th International Conference on World Wide Web, WWW 2011 English 2011 Social hierarchy and stratification among humans is a well studied concept in sociology. The popularity of online social networks presents an opportunity to study social hierarchy for different types of networks and at different scales. We adopt the premise that people form connections in a social network based on their perceived social hierarchy; as a result, the edge directions in directed social networks can be leveraged to infer hierarchy. In this paper, we define a measure of hierarchy in a directed online social network, and present an efficient algorithm to compute this measure. We validate our measure using ground truth including Wikipedia notability score. We use this measure to study hierarchy in several directed online social networks including Twitter, Delicious, YouTube, Flickr, LiveJournal, and curated lists of several categories of people based on different occupations, and different organizations. Our experiments on different online social networks show how hierarchy emerges as we increase the size of the network. This is in contrast to random graphs, where the hierarchy decreases as the network size increases. Further, we show that the degree of stratification in a network increases very slowly as we increase the size of the graph. Copyright © 2011 by the Association for Computing Machinery, Inc. (ACM). 0 0
Tag transformer Online user study
Structural web video recommendation
Tag cleaning
Tag transformer
Wikipedia category tree
MM'10 - Proceedings of the ACM Multimedia 2010 International Conference English 2010 Human annotations (titles and tags) of web videos facilitate most web video applications. However, the raw tags are noisy, sparse and structureless, which limit the effectiveness of tags. In this paper, we propose a tag transformer schema to solve these problems. We first eliminate those imprecise and meaningless tags with Wikipedia, and then transform the remaining tags to the Wikipedia category set to gather a precise, complete and structural description of the tags. Our experimental results on web video categorization demonstrate the superiority of the transformed space. We also apply tag transformer into the first study of using Wikipedia category system to structurally recommend the related videos. The online user study of the demo system suggests that our method could bring fantastic experience to the web users. 0 0