Semantic search

Results 251 – 500
Title | Author(s) | Published in | Language | Date | Abstract | R | C
Combining lexical and semantic features for short text classification
Yang L.
Chenliang Li
Ding Q.
Li L.
Procedia Computer Science English 2013 In this paper, we propose a novel approach to classify short texts by combining both their lexical and semantic features. We present an improved measurement method for lexical feature selection and furthermore obtain the semantic features with the background knowledge repository which covers target category domains. The combination of lexical and semantic features is achieved by mapping words to topics with different weights. In this way, the dimensionality of feature space is reduced to the number of topics. We here use Wikipedia as background knowledge and employ Support Vector Machine (SVM) as classifier. The experiment results show that our approach has better effectiveness compared with existing methods for classifying short texts. 0 0
Community detection from signed networks
Sugihara T.
Xiaojiang Liu
Murata T.
Transactions of the Japanese Society for Artificial Intelligence English 2013 Many real-world complex systems can be modeled as networks, and most of them exhibit community structures. Community detection from networks is one of the important topics in link mining. In order to evaluate the goodness of detected communities, Newman modularity is widely used. In the real world, however, many complex systems can be modeled as signed networks composed of positive and negative edges. Community detection from signed networks is not an easy task, because the conventional detection methods for normal networks cannot be applied directly. In this paper, we extend Newman modularity for signed networks. We also propose a method for optimizing our modularity, which is an efficient hierarchical agglomeration algorithm for detecting communities from signed networks. Our method enables us to detect communities from large-scale real-world signed networks which represent relationships between users on websites such as Wikipedia, Slashdot and Epinions. 0 0
Complementary information for Wikipedia by comparing multilingual articles
Fujiwara Y.
Yu Suzuki
Konishi Y.
Akiyo Nadamoto
Lecture Notes in Computer Science English 2013 Information of many articles is lacking in Wikipedia because users can create and edit the information freely. We specifically examined the multilinguality of Wikipedia and proposed a method to complement information of articles which lack information based on comparing different language articles that have similar contents. However, much non-complementary information is unrelated to a user's browsing article in the results. Herein, we propose improvement of the comparison area based on the classified complementary target. 0 0
Computing semantic relatedness from human navigational paths on wikipedia
Singer P.
Niebler T.
Strohmaier M.
Hotho A.
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 This paper presents a novel approach for computing semantic relatedness between concepts on Wikipedia by using human navigational paths for this task. Our results suggest that human navigational paths provide a viable source for calculating semantic relatedness between concepts on Wikipedia. We also show that we can improve accuracy by intelligent selection of path corpora based on path characteristics, indicating that not all paths are equally useful. Our work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and for considering the utility of human navigational paths for this task. 0 0
Computing semantic relatedness using Wikipedia features
Hadj Taieb M.A.
Ben Aouicha M.
Ben Hamadou A.
Knowledge-Based Systems English 2013 Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguistics, cognitive science and artificial intelligence. In this paper, we propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource that has shown good performance. Therefore, we utilized the Wikipedia features (articles, categories, Wikipedia category graph and redirection) in a system combining this Wikipedia semantic information in its different components. The approach is preceded by a pre-processing step to provide for each category pertaining to the Wikipedia category graph a semantic description vector including the weights of stems extracted from articles assigned to the target category. Next, for each candidate word, we collect its categories set using an algorithm for categories extraction from the Wikipedia category graph. Then, we compute the semantic relatedness degree using existing vector similarity metrics (Dice, Overlap and Cosine) and a newly proposed metric that performed as well as the cosine formula. The basic system is followed by a set of modules in order to exploit Wikipedia features to quantify the semantic relatedness between words as well as possible. We evaluate our measure based on two tasks: comparison with human judgments using five datasets and a specific application, "solving choice problem". The resulting system shows good performance and sometimes outperforms ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches. © 2013 Elsevier B.V. All rights reserved. 0 0
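The three vector similarity metrics named in the abstract above (Dice, Overlap, Cosine) are standard formulas; the short Python sketch below only illustrates those formulas over weighted stem vectors stored as dictionaries, it is not the authors' system, and the example vectors are made up.

    # Illustrative sketch of standard similarity metrics over weighted stem
    # vectors (dict: stem -> weight); not the system described in the paper.
    import math

    def cosine(a, b):
        # dot product over shared stems divided by the product of vector norms
        dot = sum(w * b[s] for s, w in a.items() if s in b)
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def dice(a, b):
        # set-based Dice coefficient on the stems present in each vector
        shared = len(set(a) & set(b))
        return 2 * shared / (len(a) + len(b)) if a or b else 0.0

    def overlap(a, b):
        # overlap coefficient: shared stems over the smaller vector
        shared = len(set(a) & set(b))
        return shared / min(len(a), len(b)) if a and b else 0.0

    v1 = {"music": 0.8, "sing": 0.5, "danc": 0.2}   # toy description vectors
    v2 = {"music": 0.6, "danc": 0.4, "film": 0.3}
    print(cosine(v1, v2), dice(v1, v2), overlap(v1, v2))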
Constructing a focused taxonomy from a document collection
Olena Medelyan
Manion S.
Broekstra J.
Divoli A.
Huang A.-L.
Witten I.H.
Lecture Notes in Computer Science English 2013 We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% and 90% for concepts and relations respectively; recall was 75% with respect to a manually generated taxonomy for the same domain. 0 0
Construction of a Japanese gazetteers for Japanese local toponym disambiguation
Yoshioka M.
Fujiwara T.
Proceedings of the 7th Workshop on Geographic Information Retrieval, GIR 2013 English 2013 When processing toponym information in natural language text, it is crucial to have a good gazetteer. There are several well-organized gazetteers for English text, but they do not cover Japanese local toponyms. In this paper, we introduce a Japanese gazetteer based on Open Data (e.g., the Toponym database distributed by Japanese ministries, Wikipedia, and GeoNames) and propose a toponym disambiguation framework that uses the constructed gazetteer. We also evaluate our approach based on a blog corpus that contains place names with high ambiguity. 0 0
Contributor profiles, their dynamics, and their importance in five Q&A sites
Furtado A.
Andrade N.
Oliveira N.
Brasileiro F.
English 2013 Q&A sites currently enable large numbers of contributors to collectively build valuable knowledge bases. Naturally, these sites are the product of contributors acting in different ways - creating questions, answers or comments and voting in these -, contributing in diverse amounts, and creating content of varying quality. This paper advances present knowledge about Q&A sites using a multifaceted view of contributors that accounts for diversity of behavior, motivation and expertise to characterize their profiles in five sites. This characterization resulted in the definition of ten behavioral profiles that group users according to the quality and quantity of their contributions. Using these profiles, we find that the five sites have remarkably similar distributions of contributor profiles. We also conduct a longitudinal study of contributor profiles in one of the sites, identifying common profile transitions, and finding that although users change profiles with some frequency, the site composition is mostly stable over time. Copyright 2013 ACM. 0 0
Could someone please translate this? - Activity analysis of wikipedia article translation by non-experts
Ari Hautasaari English 2013 Wikipedia translation activities aim to improve the quality of the multilingual Wikipedia through article translation. We performed an activity analysis of the translation work done by individual English to Chinese non-expert translators, who translated linguistically complex Wikipedia articles in a laboratory setting. From the analysis, which was based on Activity Theory, and which examined both information search and translation activities, we derived three translation strategies that were used to inform the design of a support system for human translation activities in Wikipedia. Copyright 2013 ACM. 0 0
Crawling deep web entity pages
He Y.
Xin D.
Ganti V.
Rajaraman S.
Shah N.
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 Deep-web crawl is concerned with the problem of surfacing hidden content behind search interfaces on the Web. While many deep-web sites maintain document-oriented textual content (e.g., Wikipedia, PubMed, Twitter, etc.), which has traditionally been the focus of the deep-web literature, we observe that a significant portion of deep-web sites, including almost all online shopping sites, curate structured entities as opposed to text documents. Although crawling such entity-oriented content is clearly useful for a variety of purposes, existing crawling techniques optimized for document oriented content are not best suited for entity-oriented sites. In this work, we describe a prototype system we have built that specializes in crawling entity-oriented deep-web sites. We propose techniques tailored to tackle important subproblems including query generation, empty page filtering and URL deduplication in the specific context of entity oriented deep-web sites. These techniques are experimentally evaluated and shown to be effective. 0 0
Cross language prediction of vandalism on wikipedia using article views and revisions
Tran K.-N.
Christen P.
Lecture Notes in Computer Science English 2013 Vandalism is a major issue on Wikipedia, accounting for about 2% (350,000+) of edits in the first 5 months of 2012. The majority of vandalism is caused by humans, who can leave traces of their malicious behaviour through access and edit logs. We propose detecting vandalism using a range of classifiers in a monolingual setting, and evaluate their performance when using them across languages on two data sets: the relatively unexplored hourly count of views of each Wikipedia article, and the commonly used edit history of articles. Within the same language (English and German), these classifiers achieve up to 87% precision, 87% recall, and an F1-score of 87%. Applying these classifiers across languages achieves similarly high results of up to 83% precision, recall, and F1-score. These results show that characteristic vandal traits can be learned from view and edit patterns, and that models built in one language can be applied to other languages. 0 0
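As a rough illustration of the kind of supervised setup such a study implies (not the authors' code or data), the sketch below trains a generic scikit-learn classifier on made-up per-edit features; the feature columns (mean hourly views, view spike ratio, edit size, comment length) and the random labels are hypothetical placeholders.

    # Hedged sketch: a generic vandalism classifier over hypothetical per-edit
    # features; the data here is random placeholder, not the paper's datasets.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_recall_fscore_support
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((1000, 4))            # mean views, spike ratio, edit size, comment length
    y = rng.integers(0, 2, size=1000)    # 1 = vandalism, 0 = regular edit (placeholder labels)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    p, r, f1, _ = precision_recall_fscore_support(y_te, clf.predict(X_te), average="binary")
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")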
Cross lingual entity linking with bilingual topic model
Zhang T.
Kang Liu
Jun Zhao
IJCAI International Joint Conference on Artificial Intelligence English 2013 Cross lingual entity linking means linking an entity mention in a background source document in one language with the corresponding real world entity in a knowledge base written in the other language. The key problem is to measure the similarity score between the context of the entity mention and the document of the candidate entity. This paper presents a general framework for cross lingual entity linking by leveraging a large scale bilingual knowledge base, Wikipedia. We introduce a bilingual topic model that mines bilingual topics from this knowledge base under the assumption that Wikipedia concept documents about the same concept in two different languages share the same semantic topic distribution. The extracted topics have two types of representation, with each type corresponding to one language. Thus both the context of the entity mention and the document of the candidate entity can be represented in a space using the same semantic topics. We use these topics to do cross lingual entity linking. Experimental results show that the proposed approach obtains competitive results compared with the state-of-the-art approach. 0 0
Cross-media topic mining on wikipedia
Xiaolong Wang
Yuanyuan Liu
Dingquan Wang
Fei Wu
MM 2013 - Proceedings of the 2013 ACM Multimedia Conference English 2013 As a collaborative wiki-based encyclopedia, Wikipedia provides a huge amount of articles of various categories. In addition to their text corpus, Wikipedia also contains plenty of images which make the articles more intuitive for readers to understand. To better organize these visual and textual data, one promising area of research is to jointly model the embedding topics across multi-modal data (i.e., cross-media) from Wikipedia. In this work, we propose to learn the projection matrices that map the data from heterogeneous feature spaces into a unified latent topic space. Different from previous approaches, by imposing ℓ1 regularizers on the projection matrices, only a small number of relevant visual/textual words are associated with each topic, which makes our model more interpretable and robust. Furthermore, the correlations of Wikipedia data in different modalities are explicitly considered in our model. The effectiveness of the proposed topic extraction algorithm is verified by several experiments conducted on real Wikipedia datasets. 0 0
DFT-extractor: A system to extract domain-specific faceted taxonomies from wikipedia
Wei B.
Liu J.
Jun Ma
Zheng Q.
Weinan Zhang
Feng B.
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 Extracting faceted taxonomies from the Web has received increasing attention in recent years from the web mining community. We demonstrate in this study a novel system called DFT-Extractor, which automatically constructs domain-specific faceted taxonomies from Wikipedia in three steps: 1) It crawls domain terms from Wikipedia by using a modified topical crawler. 2) Then it exploits a classification model to extract hyponym relations with the use of motif-based features. 3) Finally, it constructs a faceted taxonomy by applying a community detection algorithm and a group of heuristic rules. DFT-Extractor also provides a graphical user interface to visualize the learned hyponym relations and the tree structure of taxonomies. 0 0
Decentering Design: Wikipedia and Indigenous Knowledge
Maja van der Velden International Journal of Human-Computer Interaction English 2013 This article is a reflection on the case of Wikipedia, the largest online reference site with 23 million articles, with 365 million readers, and without a page called Indigenous knowledge. A Postcolonial Computing lens, extended with the notion of decentering, is used to find out what happened with Indigenous knowledge in Wikipedia. Wikipedia's ordering technologies, such as policies and templates, play a central role in producing knowledge. Two designs, developed with and for Indigenous communities, are introduced to explore if another Wikipedia's design is possible. 0 0
Defining, Understanding, and Supporting Open Collaboration: Lessons From the Literature
Andrea Forte
Cliff Lampe
American Behavioral Scientist English 2013 In this short introductory piece, we define open collaboration and contextualize the diverse articles in this special issue in a common vocabulary and history. We provide a definition of open collaboration and situate the phenomenon within an interrelated set of scholarly and ideological movements. We then examine the properties of open collaboration systems that have given rise to research and review major areas of scholarship. We close with a summary of consistent findings in open collaboration research to date. 0 0
Designing a chat-bot that simulates an historical figure
Haller E.
Rebedea T.
Proceedings - 19th International Conference on Control Systems and Computer Science, CSCS 2013 English 2013 There are many applications that incorporate a human appearance and intend to simulate human dialog, but in most cases the knowledge of the conversational bot is stored in a database created by human experts. However, very little research has investigated the idea of creating a chat-bot with an artificial character and personality starting from web pages or plain text about a certain person. This paper describes an approach to the idea of identifying the most important facts in texts describing the life (including the personality) of an historical figure for building a conversational agent that could be used in middle-school CSCL scenarios. 0 0
Detecting collaboration from behavior
Bauer T.
Garcia D.
Colbaugh R.
Glass K.
IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics English 2013 This paper describes a method for inferring when a person might be coordinating with others based on their behavior. We show that, in Wikipedia, editing behavior is more random when coordinating with others. We analyzed this using both entropy and conditional entropy. These algorithms rely only on timestamped events associated with entities, making them broadly applicable to other domains. In this paper, we will discuss previous research on this topic, how we adapted that research to the problem of Wikipedia edit behavior, describe how we extended it, and discuss our results. 0 0
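Entropy and conditional entropy over timestamped events, as mentioned in the abstract above, have standard definitions; the Python sketch below is only an illustration, not the authors' implementation, and computes both over bucketed gaps between one editor's edit timestamps using a toy timestamp list.

    # Illustrative sketch: entropy and conditional entropy of bucketed
    # inter-edit times from timestamped events; not the paper's code.
    import math
    from collections import Counter

    def bucketed_gaps(timestamps, width=3600):
        # bucket the gaps (in seconds) between consecutive events by `width`
        return [int((b - a) // width) for a, b in zip(timestamps, timestamps[1:])]

    def entropy(symbols):
        counts = Counter(symbols)
        n = sum(counts.values())
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def conditional_entropy(symbols):
        # H(next symbol | current symbol) over consecutive pairs
        pairs = list(zip(symbols, symbols[1:]))
        if not pairs:
            return 0.0
        joint, marginal, n = Counter(pairs), Counter(s for s, _ in pairs), len(pairs)
        return -sum(c / n * math.log2(c / marginal[s]) for (s, _), c in joint.items())

    edits = [0, 1800, 7300, 7400, 14600, 15000]   # toy edit timestamps (seconds)
    gaps = bucketed_gaps(edits)
    print(entropy(gaps), conditional_entropy(gaps))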
Detection of article qualities in the chinese wikipedia based on c4.5 decision tree
Xiao K.
Li B.
He P.
Yang X.-H.
Lecture Notes in Computer Science English 2013 The number of articles in Wikipedia is growing rapidly. It is important for Wikipedia to provide users with high quality and reliable articles. However, the quality assessment metrics provided by Wikipedia are inefficient, and other mainstream quality detection methods only focus on the quality of English Wikipedia articles and usually analyze the text contents of articles, which is also a time-consuming process. In this paper, we propose a method for detecting the article quality of the Chinese Wikipedia based on a C4.5 decision tree. The problem of quality detection is transformed into a classification problem of high-quality and low-quality articles. By using the fields from the tables in the Chinese Wikipedia database, we built decision trees to distinguish high-quality articles from low-quality ones. 0 0
Determinants of collective intelligence quality: Comparison between Wiki and Q&A services in English and Korean users
Joo J.
Normatov I.
Service Business English 2013 Although web-enabled collective intelligence (CI) plays a critical role in organizational innovation and collaboration, the dubious quality of CI is still a substantial problem faced by many CI services. Thus, it is important to identify determinants of CI quality and to analyze the relationship between CI quality and its usefulness. One of the most successful services of web-enabled CI is Wikipedia, accessible all over the world. Another type of CI service is Naver KnowledgeiN, a typical and popular CI site offering question and answer (Q&A) services in Korea. Wikipedia is a multilingual and web-based encyclopedia. Thus, it is necessary to study the influence relationships among CI quality, its determinants, and CI usefulness according to different CI types and languages. In this paper, we propose a new research model reflecting multi-dimensional factors related to CI quality from the user's perspective. To test a total of 15 hypotheses drawn from the research model, a total of 691 responses were collected from Wikipedia and KnowledgeiN users in South Korea and the US. Expertise of contributors, community size, and diversity of contributors were identified as determinants of perceived CI quality. Perceived CI quality significantly influenced perceived CI usefulness from the user's perspective. CI type and language partially play the role of moderators. The expertise of contributors plays a more important role in CI quality in the case of Q&A services such as KnowledgeiN compared to Wiki services such as Wikipedia. This implies that Q&A services require more expertise and experience in particular areas than Wiki services do to improve service quality. The relationship between community size and perceived CI quality was different according to CI type. The community size has a greater effect on CI quality in the case of Wiki services than Q&A services. The number of contributors in Wikipedia is important because Wiki is an encyclopedia service which is edited and revised repeatedly by many contributors, while the answer given in Naver KnowledgeiN cannot be edited by others. Finally, CI quality has a greater effect on its usefulness in the case of Wiki services than Q&A services. In this paper, we suggest implications for practitioners and theorists. 0 0
Determining leadership in contentious discussions
Jain S.
Hovy E.
Electronic Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2013 English 2013 Participants in online decision making environments assume different roles. Especially in contentious discussions, the outcome often depends critically on the discussion leader(s). Recent work on automated leadership analysis has focused on collaborations where all the participants have the same goal. In this paper we focus on contentious discussions, in which the participants have different goals based on their opinions, which makes the notion of leader very different. We analyze discussions on the Wikipedia Articles for Deletion (AfD) forum. We define two complementary models, Content Leader and SilentOut Leader. The models quantify the basic leadership qualities of participants and assign leadership points to them. We compare the correlation between the leaders' ranks produced by the two models using the Spearman coefficient. We also propose a method to verify the quality of the leaders identified by each model. 0 0
Determining relation semantics by mapping relation phrases to knowledge base
Liu F.
Yuanyuan Liu
Guangyou Zhou
Kang Liu
Jun Zhao
Proceedings - 2nd IAPR Asian Conference on Pattern Recognition, ACPR 2013 English 2013 0 0
Development and evaluation of an ensemble resource linking medications to their indications
Wei W.-Q.
Cronin R.M.
Xu H.
Lasko T.A.
Bastarache L.
Denny J.C.
Journal of the American Medical Informatics Association English 2013 Objective: To create a computable MEDication Indication resource (MEDI) to support primary and secondary use of electronic medical records (EMRs). Materials and methods: We processed four public medication resources, RxNorm, Side Effect Resource (SIDER) 2, MedlinePlus, and Wikipedia, to create MEDI. We applied natural language processing and ontology relationships to extract indications for prescribable, single-ingredient medication concepts and all ingredient concepts as defined by RxNorm. Indications were coded as Unified Medical Language System (UMLS) concepts and International Classification of Diseases, 9th edition (ICD9) codes. A total of 689 extracted indications were randomly selected for manual review for accuracy using dual-physician review. We identified a subset of medication-indication pairs that optimizes recall while maintaining high precision. Results: MEDI contains 3112 medications and 63 343 medication-indication pairs. Wikipedia was the largest resource, with 2608 medications and 34 911 pairs. For each resource, estimated precision and recall, respectively, were 94% and 20% for RxNorm, 75% and 33% for MedlinePlus, 67% and 31% for SIDER 2, and 56% and 51% for Wikipedia. The MEDI high-precision subset (MEDI-HPS) includes indications found within either RxNorm or at least two of the three other resources. MEDI-HPS contains 13 304 unique indication pairs regarding 2136 medications. The mean±SD number of indications for each medication in MEDI-HPS is 6.22±6.09. The estimated precision of MEDI-HPS is 92%. Conclusions: MEDI is a publicly available, computable resource that links medications with their indications as represented by concepts and billing codes. MEDI may benefit clinical EMR applications and reuse of EMR data for research. 0 0
Discovering missing semantic relations between entities in Wikipedia
Xu M.
Zhe Wang
Bie R.
Jing-Woei Li
Zheng C.
Ke W.
Zhou M.
Lecture Notes in Computer Science English 2013 Wikipedia's infoboxes contain rich structured information about various entities, which has been explored by the DBpedia project to generate large scale Linked Data sets. Among all the infobox attributes, those attributes having hyperlinks in their values identify semantic relations between entities, which are important for creating RDF links between DBpedia's instances. However, quite a few hyperlinks have not been annotated by editors in infoboxes, which causes many relations between entities to be missing in Wikipedia. In this paper, we propose an approach for automatically discovering the missing entity links in Wikipedia's infoboxes, so that the missing semantic relations between entities can be established. Our approach first identifies entity mentions in the given infoboxes, and then computes several features to estimate the possibility that a given attribute value might link to a candidate entity. A learning model is used to obtain the weights of different features, and predict the destination entity for each attribute value. We evaluated our approach on the English Wikipedia data; the experimental results show that our approach can effectively find the missing relations between entities, and it significantly outperforms the baseline methods in terms of both precision and recall. 0 0
Discovering unexpected information on the basis of popularity/unpopularity analysis of coordinate objects and their relationships
Tsukuda K.
Hiroaki Ohshima
Michihiro Yamamoto
Hirotoshi Iwasaki
Katsumi Tanaka
Proceedings of the ACM Symposium on Applied Computing English 2013 Although many studies have addressed the problem of finding Web pages seeking relevant and popular information from a query, very few have focused on the discovery of unexpected information. This paper provides and evaluates methods for discovering unexpected information for a keyword query. For example, if the user inputs "Michael Jackson," our system first discovers the unexpected related term "karate" and then returns the unexpected information "Michael Jackson is good at karate." Discovering unexpected information is useful in many situations. For example, when a user is browsing a news article on the Web, unexpected information about a person associated with the article can pique the user's interest. If a user is sightseeing or driving, providing unexpected, additional information about a building or the region is also useful. Our approach collects terms related to a keyword query and evaluates the degree of unexpectedness of each related term for the query on the basis of (i) the relationships of coordinate terms of both the keyword query and related terms, and (ii) the degree of popularity of each related term. Experimental results show that considering these two factors is effective for discovering unexpected information. Copyright 2013 ACM. 0 0
Distant supervision learning of DBPedia relations
Zajac M.
Przepiorkowski A.
Lecture Notes in Computer Science English 2013 This paper presents DBPediaExtender, an information extraction system that aims at extending an existing ontology of geographical entities by extracting information from text. The system uses distant supervision learning - the training data is constructed on the basis of matches between values from infoboxes (taken from the Polish DBPedia) and Wikipedia articles. For every relevant relation, a sentence classifier and a value extractor are trained; the sentence classifier selects sentences expressing a given relation and the value extractor extracts values from selected sentences. The results of manual evaluation for several selected relations are reported. 0 0
Diversifying Query Suggestions by using Topics from Wikipedia
Hu H.
Maoyuan Zhang
He Z.
Pu Wang
Weiping Wang
Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013 English 2013 Diversifying query suggestions has emerged recently, by which the recommended queries can be both relevant and diverse. Most existing works diversify suggestions by query log analysis, however, for structured data, not all query logs are available. To this end, this paper studies the problem of suggesting diverse query terms by using topics from Wikipedia. Wikipedia is a successful online encyclopedia, and has high coverage of entities and concepts. We first obtain all relevant topics from Wikipedia, and then map each term to these topics. As the mapping is a nontrivial task, we leverage information from both Wikipedia and structured data to semantically map each term to topics. Finally, we propose a fast algorithm to efficiently generate the suggestions. Extensive evaluations are conducted on a real dataset, and our approach yields promising results. 0 0
Document analytics through entity resolution
Santos J.
Martins B.
Batista D.S.
Lecture Notes in Computer Science English 2013 We present a prototype system for resolving named entities, mentioned in textual documents, into the corresponding Wikipedia entities. This prototype can aid in document analysis, by using the disambiguated references to provide useful information in context. 0 0
Document listing on versioned documents
Claude F.
Munro J.I.
Lecture Notes in Computer Science English 2013 Representing versioned documents, such as Wikipedia history, web archives, genome databases, backups, is challenging when we want to support searching for an exact substring and retrieve the documents that contain the substring. This problem is called document listing. We present an index for the document listing problem on versioned documents. Our index is the first one based on grammar-compression. This allows for good results on repetitive collections, whereas standard techniques cannot achieve competitive space for solving the same problem. Our index can also be adapted to work in a more standard way, allowing users to search for word-based phrase queries and conjunctive queries at the same time. Finally, we discuss extensions that may be possible in the future, for example, supporting ranking capabilities within the index itself. 0 0
Does formal authority still matter in the age of wisdom of crowds: Perceived credibility, peer and professor endorsement in relation to college students' wikipedia use for academic purposes
Sook Lim Proceedings of the ASIST Annual Meeting English 2013 This study explores whether or not formal authority still matters for college students using Wikipedia by examining the variables of individual perceived credibility, peer endorsement and professor endorsement in relation to students' academic use of Wikipedia. A web survey was used to collect data in fall 2011. A total of 142 students participated in the study, of which 123 surveys were useable for this study. The findings show that the more professors approved of Wikipedia, the more students used it for academic purposes. In addition, the more students perceived Wikipedia as credible, the more they used it for academic purposes. The results indicate that formal authority still influences students' use of user-generated content (UGC) in their formal domain, academic work. The results can be applicable to other UGC, which calls attention to educators' active intervention to guide appropriate academic use of UGC. Professors' guidelines for UGC would benefit students. 0 0
Dynamic information retrieval using, constructing concepts maps with SW principles
Nalini T. Middle-East Journal of Scientific Research English 2013 Concept maps are a straightforward way to remember a topic; the visual image is the major part being focused on here. This paper attempts to demonstrate the concept map of a Wikipedia page. The highlight of the work is the design of an algorithm that retrieves the data dynamically from the Wikipedia page, and concept maps are drawn by considering the principles of visual notations in software engineering. The method is implemented in such a way that on a mobile phone with a small screen, where a large amount of content cannot easily be read, the content can instead be viewed as a concept map, with the sub-topics of the content shown as its branches; these branches can also be developed into a new concept map for that specific word according to the user's needs. 0 0
E-learning and the Quality of Knowledge in a Globalized World
Van De Bunt-Kokhuis S. Distance and E-Learning in Transition: Learning Innovation, Technology and Social Challenges English 2013 [No abstract available] 0 0
Efficient feature integration with Wikipedia-based semantic feature extraction for Turkish text summarization
Güran A.
Bayazıt N.G.
Gürbüz M.Z.
Turkish Journal of Electrical Engineering and Computer Sciences English 2013 This study presents a novel hybrid Turkish text summarization system that combines structural and semantic features. The system uses 5 structural features, 1 of which is newly proposed, and 3 semantic features whose values are extracted from Turkish Wikipedia links. The features are combined using the weights calculated by 2 novel approaches. The first approach makes use of an analytical hierarchical process, which depends on a series of expert judgments based on pairwise comparisons of the features. The second approach makes use of the artificial bee colony algorithm for automatically determining the weights of the features. To confirm the significance of the proposed hybrid system, its performance is evaluated on a new Turkish corpus that contains 110 documents and 3 human-generated extractive summary corpora. The experimental results show that exploiting all of the features by combining them results in a better performance than exploiting each feature individually. 0 0
Effectiveness of shared leadership in Wikipedia
Haiping Zhu
Kraut R.E.
Aniket Kittur
Human Factors English 2013 Objective: The objective of the paper is to understand leadership in an online community, specifically, Wikipedia. Background: Wikipedia successfully aggregates millions of volunteers' efforts to create the largest encyclopedia in human history. Without formal employment contracts and monetary incentives, one significant question for Wikipedia is how it organizes individual members with differing goals, experience, and commitment to achieve a collective outcome. Rather than focusing on the role of the small set of people occupying a core leadership position, we propose a shared leadership model to explain the leadership in Wikipedia. Members mutually influence one another by exercising leadership behaviors, including rewarding, regulating, directing, and socializing one another. Method: We conducted a two-phase study to investigate how distinct types of leadership behaviors (transactional, aversive, directive, and person-focused), the legitimacy of the people who deliver the leadership, and the experience of the people who receive the leadership influence the effectiveness of shared leadership in Wikipedia. Results: Our results highlight the importance of shared leadership in Wikipedia and identify trade-offs in the effectiveness of different types of leadership behaviors. Aversive and directive leadership increased contribution to the focal task, whereas transactional and person-focused leadership increased general motivation. We also found important differences in how newcomers and experienced members responded to leadership behaviors from peers. Application: These findings extend shared leadership theories, contribute new insight into the important underlying mechanisms in Wikipedia, and have implications for practitioners who wish to design more effective and successful online communities. Copyright 0 0
Effects of implicit positive ratings for quality assessment of Wikipedia articles
Yu Suzuki Journal of Information Processing English 2013 In this paper, we propose a method to identify high-quality Wikipedia articles by using implicit positive ratings. One of the major approaches for assessing Wikipedia articles is a text survival ratio based approach. In this approach, when a text survives beyond multiple edits, the text is assessed as high quality. However, the problem is that many low quality articles are misjudged as high quality, because not every editor reads the whole article. If there is a low quality text at the bottom of a long article, and the text has not been seen by the other editors, then the text survives beyond many edits, and the text is assessed as high quality. To solve this problem, we use a section and a paragraph as a unit instead of a whole page. In our method, if an editor edits an article, the system considers that the editor gives positive ratings to the section or the paragraph that the editor edits. From experimental evaluation, we confirmed that the proposed method could improve the accuracy of quality values for articles. 0 0
Effects of peer feedback on contribution: A field experiment in Wikipedia
Haiping Zhu
Zhang A.
He J.
Kraut R.E.
Aniket Kittur
Conference on Human Factors in Computing Systems - Proceedings English 2013 One of the most significant challenges for many online communities is increasing members' contributions over time. Prior studies on peer feedback in online communities have suggested its impact on contribution, but have been limited by their correlational nature. In this paper, we conducted a field experiment on Wikipedia to test the effects of different feedback types (positive feedback, negative feedback, directive feedback, and social feedback) on members' contribution. Our results characterize the effects of different feedback types, and suggest trade-offs in the effects of feedback between the focal task and general motivation, as well as differences in how newcomers and experienced editors respond to peer feedback. This research provides insights into the mechanisms underlying peer feedback in online communities and practical guidance to design more effective peer feedback systems. Copyright 0 0
Encoding local correspondence in topic models
Mehdi R.E.
Mohamed Q.
Mustapha A.
Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI English 2013 Exploiting label correlations is a challenging and crucial problem, especially in the multi-label learning context. Label correlations are not necessarily shared by all instances and generally have a local definition. This paper introduces LOC-LDA, a latent variable model that addresses the problem of modeling annotated data by locally exploiting correlations between annotations. In particular, we explicitly represent local dependencies to define the correspondence between specific objects, i.e. regions of images and their annotations. We conducted experiments on a collection of pictures provided by the Wikipedia 'Picture of the day' website, and evaluated our model on the task of 'automatic image annotation'. The results validate the effectiveness of our approach. 0 0
English nominal compound detection with Wikipedia-based methods
Nagy T. I.
Veronika Vincze
Lecture Notes in Computer Science English 2013 Nominal compounds (NCs) are lexical units that consist of two or more elements that exist on their own, function as a noun and have a special added meaning. Here, we present the results of our experiments on how the growth of Wikipedia added to the performance of our dictionary labeling methods for detecting NCs. We also investigated how the size of an automatically generated silver standard corpus can affect the performance of our machine learning-based method. The results we obtained demonstrate that the bigger the dataset, the better the performance will be. 0 0
Enseigner la révision à l'ère des wikis ou là où on trouve la technologie alors qu'on ne l'attendait pas
Louise Brunette
Chantal Gagnon
JoSTrans, no. 1 2013 In academic teaching, there are very few reported experiences of collaborative wiki revision. In a Quebec university, we experimented with a wiki revision activity with translation students in their third and final year. We specifically chose to revise texts in Wikipedia because its environment shares similarities with the labor market in the language industry and because we believed that the wiki allowed us to achieve the overall objectives of the revision class, as we define them. Throughout the experience, we monitored the progress of students' revision interventions on Wikipedia texts as well as the exchanges taking place between revisees and reviewers. All our research observations were made possible by the convoluted but systematic structure of Wikipedia. Here, we report on the experiment at the Université du Québec en Outaouais and let our academic teaching readers decide whether the exercise is right for them. For us, it was convincing. 0 0
Entityclassifier.eu: Real-time classification of entities in text with Wikipedia
Dojchinovski M.
Kliegr T.
Lecture Notes in Computer Science English 2013 Targeted Hypernym Discovery (THD) performs unsupervised classification of entities appearing in text. A hypernym mined from the free-text of the Wikipedia article describing the entity is used as a class. The type as well as the entity are cross-linked with their representation in DBpedia, and enriched with additional types from DBpedia and YAGO knowledge bases providing a semantic web interoperability. The system, available as a web application and web service at entityclassifier.eu , currently supports English, German and Dutch. 0 0
Escaping the trap of too precise topic queries
Libbrecht P. Lecture Notes in Computer Science English 2013 At the very center of digital mathematics libraries lie controlled vocabularies which qualify the topic of the documents. These topics are used when submitting a document to a digital mathematics library and to perform searches in a library. The latter are refined by the use of these topics as they allow a precise classification of the mathematics area this document addresses. However, there is a major risk that users employ too precise topics to specify their queries: they may be employing a topic that is only "close-by" and thus fail to match the right resource. We call this the topic trap. Indeed, since 2009, this issue has appeared frequently on the i2geo.net platform. Other mathematics portals experience the same phenomenon. One approach to solving this issue is to introduce tolerance in the way queries are understood by the user. In particular, one can include fuzzy matches, but this introduces noise which may prevent the user from understanding the function of the search engine. In this paper, we propose a way to escape the topic trap by employing the navigation between related topics and the count of search results for each topic. This supports the user in that a search for close-by topics is a click away from a previous search. This approach was realized with the i2geo search engine and is described in detail, where the relation of being related is computed by employing textual analysis of the definitions of the concepts fetched from the Wikipedia encyclopedia. 0 0
Evaluating article quality and editor reputation in Wikipedia
Lu Y.
Lei Zhang
Jing-Woei Li
Communications in Computer and Information Science English 2013 We study a novel problem of quality and reputation evaluation for Wikipedia articles. We pose a difficult and interesting question: how can reasonable article quality scores and editor reputations be generated in a single framework at the same time? In this paper, we propose a dual-wing factor graph (DWFG) model, which utilizes the mutual reinforcement between articles and editors to generate article quality and editor reputation. To learn the proposed factor graph model, we further design an efficient algorithm. We conduct experiments to validate the effectiveness of the proposed model. By leveraging the belief propagation between articles and editors, our approach obtains significant improvement over several alternative methods (SVM, LR, PR, CRF). 0 0
Evaluating entity linking with wikipedia
Ben Hachey
Will Radford
Joel Nothman
Matthew Honnibal
Curran J.R.
Artificial Intelligence English 2013 Named Entity Linking (nel) grounds entity mentions to their corresponding node in a Knowledge Base (kb). Recently, a number of systems have been proposed for linking entity mentions in text to Wikipedia pages. Such systems typically search for candidate entities and then disambiguate them, returning either the best candidate or nil. However, comparison has focused on disambiguation accuracy, making it difficult to determine how search impacts performance. Furthermore, important approaches from the literature have not been systematically compared on standard data sets. We reimplement three seminal nel systems and present a detailed evaluation of search strategies. Our experiments find that coreference and acronym handling lead to substantial improvement, and search strategies account for much of the variation between systems. This is an interesting finding, because these aspects of the problem have often been neglected in the literature, which has focused largely on complex candidate ranking algorithms. © 2012 Elsevier B.V. All rights reserved. 0 0
Evaluation of ILP-based approaches for partitioning into colorful components
Bruckner S.
Hüffner F.
Komusiewicz C.
Niedermeier R.
Lecture Notes in Computer Science English 2013 The NP-hard Colorful Components problem is a graph partitioning problem on vertex-colored graphs. We identify a new application of Colorful Components in the correction of Wikipedia interlanguage links, and describe and compare three exact and two heuristic approaches. In particular, we devise two ILP formulations, one based on Hitting Set and one based on Clique Partition. Furthermore, we use the recently proposed implicit hitting set framework [Karp, JCSS 2011; Chandrasekaran et al., SODA 2011] to solve Colorful Components. Finally, we study a move-based and a merge-based heuristic for Colorful Components. We can optimally solve Colorful Components for Wikipedia link correction data; while the Clique Partition-based ILP outperforms the other two exact approaches, the implicit hitting set is a simple and competitive alternative. The merge-based heuristic is very accurate and outperforms the move-based one. The above results for Wikipedia data are confirmed by experiments with synthetic instances. 0 0
Evaluation of WikiTalk - User studies of human-robot interaction
Anastasiou D.
Kristiina Jokinen
Graham Wilcock
Lecture Notes in Computer Science English 2013 The paper concerns the evaluation of Nao WikiTalk, an application that enables a Nao robot to serve as a spoken open-domain knowledge access system. With Nao WikiTalk the robot can talk about any topic the user is interested in, using Wikipedia as its knowledge source. The robot suggests some topics to start with, and the user shifts to related topics by speaking their names after the robot mentions them. The user can also switch to a totally new topic by spelling the first few letters. As well as speaking, the robot uses gestures, nods and other multimodal signals to enable clear and rich interaction. The paper describes the setup of the user studies and reports on the evaluation of the application, based on various factors reported by the 12 users who participated. The study compared the users' expectations of the robot interaction with their actual experience of the interaction. We found that the users were impressed by the lively appearance and natural gesturing of the robot, although in many respects they had higher expectations regarding the robot's presentation capabilities. However, the results are positive enough to encourage research on these lines. 0 0
Evaluation of named entity recognition tools on microposts
Dlugolinsky S.
Marek Ciglan
Laclavik M.
INES 2013 - IEEE 17th International Conference on Intelligent Engineering Systems, Proceedings English 2013 In this paper we evaluate eight well-known Information Extraction (IE) tools on the task of Named Entity Recognition (NER) in microposts. We have chosen six NLP tools and two Wikipedia concept extractors for the evaluation. Our intent was to see how these tools would perform on the relatively short texts of microposts. The evaluation dataset was adopted from the MSM 2013 IE Challenge. This dataset contained manually annotated microposts with classification restricted to four entity types: PER, LOC, ORG and MISC. 0 0
Evolution of peer production system based on limited matching and preferential selection
Li X.
Li S.-W.
Shanghai Ligong Daxue Xuebao/Journal of University of Shanghai for Science and Technology Chinese 2013 Based on the real-world background of Wikipedia as a classic peer production system in which many users take part in editing, two characteristics of the editing process, preferential selection and limited matching, were considered. Two rules, "preferential selection" and "limited matching", and an evolving model of the peer production system were presented. The analysis was based on computational experiments on the number of page edits, the status variation of pages and users, the effect of matching degree on the number of page edits, etc. The computational experiments show that the Wikipedia system evolves to a stable status under the action of the two rules. In the stable status, the number of page edits follows a power-law distribution; the difference between a user's status and a page's status (i.e., the matching degree) tends toward zero; and the larger the matching degree of user and page, the smaller the power index of the power-law distribution, and so the longer the tail of the power-law distribution. 0 0
Exploiting the Arabic Wikipedia for semi-automatic construction of a lexical ontology
Boudabous M.M.
Belguith L.H.
Sadat F.
International Journal of Metadata, Semantics and Ontologies English 2013 In this paper, we propose a hybrid (numerical/linguistic) method to build a lexical ontology for the Arabic language. This method is based on the Arabic Wikipedia. It consists of two phases: analysing the description section in order to build core ontology and then using the physical structure of Wikipedia articles (info-boxes, category pages and redirect links) and their contents for enriching the core ontology. The building phase of the core ontology is implemented via the TBAO system. The obtained core ontology contains more than 200,000 concepts. Copyright 0 0
Exploiting the category structure of Wikipedia for entity ranking
Rianne Kaptein
Jaap Kamps
Artificial Intelligence English 2013 The Web has not only grown in size, but also changed its character, due to collaborative content creation and an increasing amount of structure. Current Search Engines find Web pages rather than information or knowledge, and leave it to the searchers to locate the sought information within the Web page. A considerable fraction of Web searches contains named entities. We focus on how the Wikipedia structure can help rank relevant entities directly in response to a search request, rather than retrieve an unorganized list of Web pages with relevant but also potentially redundant information about these entities. Our results demonstrate the benefits of using topical and link structure over the use of shallow statistics. Our main findings are the following. First, we examine whether Wikipedia category and link structure can be used to retrieve entities inside Wikipedia as is the goal of the INEX (Initiative for the Evaluation of XML retrieval) Entity Ranking task. Category information proves to be a highly effective source of information, leading to large and significant improvements in retrieval performance on all data sets. Secondly, we study how we can use category information to retrieve documents for ad hoc retrieval topics in Wikipedia. We study the differences between entity ranking and ad hoc retrieval in Wikipedia by analyzing the relevance assessments. Considering retrieval performance, also on ad hoc retrieval topics we achieve significantly better results by exploiting the category information. Finally, we examine whether we can automatically assign target categories to ad hoc and entity ranking queries. Guessed categories lead to performance improvements that are not as large as when the categories are assigned manually, but they are still significant. We conclude that the category information in Wikipedia is a useful source of information that can be used for entity ranking as well as other retrieval tasks. © 2012 Elsevier B.V. All rights reserved. 0 0
Exploring the Cautionary Attitude Toward Wikipedia in Higher Education: Implications for Higher Education Institutions Exploring the Cautionary Attitude Toward Wikipedia in Higher Education: Implications for Higher Education Institutions Bayliss G. New Review of Academic Librarianship English 2013 This article presents the research findings of a small-scale study which aimed to explore the cautionary attitude toward the use of Wikipedia in the process of learning. A qualitative case study approach was taken, using literature review, institutional documentation, and semi-structured interviews with five members of academic teaching staff from a UK Business School. Analysis found the reasons for the cautionary attitude were due to a lack of understanding of Wikipedia, a negative attitude toward collaborative knowledge produced outside academia, and the perceived detrimental effects of the use of Web 2.0 applications not included in the university suite. 0 0
Extending BCDM to cope with proposals and evaluations of updates Extending BCDM to cope with proposals and evaluations of updates Anselma L.
Bottrighi A.
Montani S.
Terenziani P.
IEEE Transactions on Knowledge and Data Engineering English 2013 The cooperative construction of data/knowledge bases has recently had a significant impetus (see, e.g., Wikipedia [1]). In cases in which data/knowledge quality and reliability are crucial, proposals of update/insertion/deletion need to be evaluated by experts. To the best of our knowledge, no theoretical framework has been devised to model the semantics of update proposal/evaluation in the relational context. Since time is an intrinsic part of most domains (as well as of the proposal/evaluation process itself), semantic approaches to temporal relational databases (specifically, Bitemporal Conceptual Data Model (henceforth, BCDM) [2]) are the starting point of our approach. In this paper, we propose BCDMPV, a semantic temporal relational model that extends BCDM to deal with multiple update/insertion/deletion proposals and with acceptances/rejections of proposals themselves. We propose a theoretical framework, defining the new data structures, manipulation operations and temporal relational algebra and proving some basic properties, namely that BCDMPV is a consistent extension of BCDM and that it is reducible to BCDM. These properties ensure consistency with most relational temporal database frameworks, facilitating implementations. 0 0
Extracting PROV provenance traces from Wikipedia history pages Extracting PROV provenance traces from Wikipedia history pages Missier P.
Zheng Chen
ACM International Conference Proceeding Series English 2013 Wikipedia History pages contain provenance metadata that describes the history of revisions of each Wikipedia article. We have developed a simple extractor which, starting from a user-specified article page, crawls through the graph of its associated history pages, and encodes the essential elements of those pages according to the PROV data model. The crawling is performed on the live pages using the Wikipedia REST interface. The resulting PROV provenance graphs are stored in a graph database (Neo4J), where they can be queried using the Cypher graph query language (proprietary to Neo4J), or traversed programmatically using the Neo4J Java Traversal API. 0 0
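A minimal sketch of the extraction step described above, assuming only the public MediaWiki action API: it pulls revision metadata for one article and prints PROV-style statements. The paper's extractor additionally crawls linked history pages and loads the graph into Neo4j; those steps are omitted here, and the prefixes (wiki:, prov:, agent:) are illustrative.

```python
# Sketch (not the authors' extractor): fetch revision metadata and express it as
# PROV-style revision/attribution statements.
import requests

API = "https://en.wikipedia.org/w/api.php"

def prov_from_history(title, limit=20):
    params = {"action": "query", "prop": "revisions", "titles": title,
              "rvprop": "ids|timestamp|user", "rvlimit": limit, "format": "json"}
    pages = requests.get(API, params=params).json()["query"]["pages"]
    revisions = next(iter(pages.values()))["revisions"]
    for rev in revisions:
        entity = f"wiki:rev/{rev['revid']}"
        print(f"{entity} prov:wasAttributedTo agent:{rev['user']}")
        if rev.get("parentid"):  # 0 means the article's first revision
            print(f"{entity} prov:wasRevisionOf wiki:rev/{rev['parentid']}")
        print(f"{entity} prov:generatedAtTime \"{rev['timestamp']}\"")

prov_from_history("Alan Turing")
```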
Extracting complementary information from Wikipedia articles of different languages Extracting complementary information from Wikipedia articles of different languages Akiyo Nadamoto
Fujiwara Y.
Konishi Y.
Yu Suzuki
International Journal of Business Intelligence and Data Mining English 2013 In Wikipedia, users can create and edit information freely, and few editors take responsibility for editing the articles. Therefore, many Wikipedia articles lack information. Furthermore, the value of the information differs across the language versions of the site. In this paper, we propose the extraction of complementary information from different language versions of Wikipedia and its automatic presentation. The important points of our method are: 1) extraction of comparison articles from different language versions of Wikipedia; 2) extraction of complementary information; 3) presentation of complementary information. 0 0
Extracting event-related information from article updates in Wikipedia Extracting event-related information from article updates in Wikipedia Georgescu M.
Kanhabua N.
Krause D.
Wolfgang Nejdl
Siersdorfer S.
Lecture Notes in Computer Science English 2013 Wikipedia is widely considered the largest and most up-to-date online encyclopedia, with its content being continuously maintained by a supporting community. In many cases, real-life events like new scientific findings, resignations, deaths, or catastrophes serve as triggers for collaborative editing of articles about affected entities such as persons or countries. In this paper, we conduct an in-depth analysis of event-related updates in Wikipedia by examining different indicators for events including language, meta annotations, and update bursts. We then study how these indicators can be employed for automatically detecting event-related updates. Our experiments on event extraction, clustering, and summarization show promising results towards generating entity-specific news tickers and timelines. 0 0
Extracting protein terminologies in literatures Extracting protein terminologies in literatures Gim J.
Kim D.J.
Myunggwon Hwang
Song S.-K.
Jeong D.-H.
Hanmin Jung
Proceedings - 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing, GreenCom-iThings-CPSCom 2013 English 2013 Key terminologies in the literature play an important role in analyzing and predicting research trends. Extracting the terminologies used in researchers' papers has therefore become a major issue in a variety of fields. To extract such terminologies, dictionary-based approaches have been applied. Wikipedia can also be considered a dictionary, since it has abundant terminologies and the power of collective intelligence: its terminologies are continuously modified and extended every day. Thus it can serve as an answer set against which to compare the terminologies found in the literature. However, a dictionary-based approach hardly extracts terminologies that are newly defined and coined by researchers. To address this issue, we propose a method to derive a set of terminology candidates by comparing the terminologies in the literature with Wikipedia. The candidate set extracted by the method showed an accuracy of about 64.33%, which is a good result for an initial study. 0 0
Extracting term relationships from Wikipedia Extracting term relationships from Wikipedia Mathiak B.
Pena V.M.M.
Wira-Alam A.
Lecture Notes in Business Information Processing English 2013 When looking at the relationship between two terms, we are not only interested in how much they are related, but also in how we may explain this relationship to the user. This is an open problem in ontology matching, but also in other tasks, from information retrieval to lexicography. In this paper, we propose a solution based on snippets taken from Wikipedia. These snippets are found by looking for connectors between the two terms, e.g. the terms themselves, but also terms that occur often in both articles or terms that link to both articles. With a user study, we establish that this is particularly useful when dealing with relationships that are not well known between concepts that are well known. The users learned more about the relationship and were able to grade it accordingly. On real-life data, there are some issues with near synonyms, which are not detected well, and with terms from different communities, but aside from that we get usable and useful explanations of the term relationships. 0 0
Extracting traffic information from web texts with a D-S evidence theory based approach Extracting traffic information from web texts with a D-S evidence theory based approach Qiu P.
Lu F.
Haisu Zhang
International Conference on Geoinformatics English 2013 Web texts, such as web pages, BBS, or microblogs, usually contain a great amount of real-time traffic information, which can be expected to become an important data source for city traffic collection. However, due to the characteristics of ambiguity and uncertainty in the description of traffic condition with natural language, and the difference of description quality for web texts among various publishers and text types, there may exist much inconsistency, or even contradiction for the traffic condition on similar spatial-temporal contexts. An efficient information fusion process is crucial to take advantage of the mass web sources for real-time traffic collection. In this paper, we propose a traffic state extraction approach from massive web texts based on D-S evidence theory to solve the above problem. Firstly, an evaluation index system for the traffic state information collected from the web texts is built with the help of semantic similarity based on Wikipedia, to eliminate ambiguity. Then, D-S evidence theory is adopted to judge and fuse the extracted traffic state information, with evidence combination and decision, which can solve the problem of uncertainty and difference. An experiment shows that the presented approach can effectively judge the traffic state information contained in massive web texts, and can fully utilize the data from different websites. Meanwhile, the proposed approach is arguably more accurate than the traditional text clustering algorithm. 0 0
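A minimal sketch of the fusion step named above: Dempster's rule of combination from D-S evidence theory, applied to two hypothetical mass functions describing the traffic state of one road segment as reported by two web sources. The frame, masses and source names are illustrative, not the paper's.

```python
# Dempster's rule: m12(A) = sum_{B&C=A} m1(B)m2(C) / (1 - K), K = conflicting mass.
def combine(m1, m2):
    combined, conflict = {}, 0.0
    for b, p in m1.items():
        for c, q in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p * q
            else:
                conflict += p * q
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

CONGESTED, FREE = frozenset({"congested"}), frozenset({"free"})
EITHER = CONGESTED | FREE                       # total ignorance
m_microblog = {CONGESTED: 0.6, EITHER: 0.4}     # vague microblog report
m_newspage = {CONGESTED: 0.7, FREE: 0.1, EITHER: 0.2}  # news page report
print(combine(m_microblog, m_newspage))
```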
Extraction of biographical data from Wikipedia Extraction of biographical data from Wikipedia Viseur R. DATA 2013 - Proceedings of the 2nd International Conference on Data Technologies and Applications English 2013 Using the content of Wikipedia articles is common in academic research. However the practicalities are rarely analysed. Our research focuses on extracting biographical information about personalities from Belgium. Our research is divided into three sections. The first section describes the state of the art for data extraction from Wikipedia. A second section presents the case study about data extraction for biographies of Belgian personalities. Different solutions are discussed and the solution adopted is implemented. In the third section, the quality of the extraction is discussed. Practical recommendations for researchers wishing to use Wikipedia are also proposed on the basis of our case study. 0 0
Extraction of linked data triples from japanese wikipedia text of ukiyo-e painters Extraction of linked data triples from japanese wikipedia text of ukiyo-e painters Kimura F.
Mitsui K.
Maeda A.
Proceedings - 2013 International Conference on Culture and Computing, Culture and Computing 2013 English 2013 DBpedia provides Linked Data extracted from info boxes in Wikipedia articles. Extraction is easier from an infobox than from text because an info box has a fixed-format table to represent structured information. To provide more Linked Data, we propose a method for Linked Data triple extraction from Wikipedia text. In this study, we conducted an experiment to extract Linked Data triples from Wikipedia text of ukiyo-e painters and achieved precision of 0.605. 0 0
Filling the gaps among DBpedia multilingual chapters for question answering Filling the gaps among DBpedia multilingual chapters for question answering Cojan J.
Cabrio E.
Fabien Gandon
Proceedings of the 3rd Annual ACM Web Science Conference, WebSci 2013 English 2013 To publish information extracted from multilingual pages of Wikipedia in a structured way, the Semantic Web community has started an effort of internationalization of DBpedia. Multilingual chapters of DBpedia can in fact contain different information with respect to the English version, in particular they provide more specificity on certain topics, or fill information gaps. DBpedia multilingual chapters are well connected through instance interlinking, extracted from Wikipedia. An alignment between properties is also carried out by DBpedia contributors as a mapping from the terms used in Wikipedia to a common ontology, enabling the exploitation of information coming from the multilingual chapters of DBpedia. However, the mapping process is currently incomplete, it is time consuming since it is manually performed, and may lead to the introduction of redundant terms in the ontology, as it becomes difficult to navigate through the existing vocabulary. In this paper we propose an approach to automatically extend the existing alignments, and we integrate it in a question answering system over linked data. We report on experiments carried out applying the QAKiS (Question Answering wiKiframework-based) system on the English and French DBpedia chapters, and we show that the use of such an approach broadens its coverage. 0 0
Finding relevant missing references in learning courses Finding relevant missing references in learning courses Siehndel P.
Kawase R.
Hadgu A.T.
Herder E.
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 Reference sites play an increasingly important role in learning processes. Teachers use these sites in order to identify topics that should be covered by a course or a lecture. Learners visit online encyclopedias and dictionaries to find alternative explanations of concepts, to learn more about a topic, or to better understand the context of a concept. Ideally, a course or lecture should cover all key concepts of the topic that it encompasses, but often time constraints prevent complete coverage. In this paper, we propose an approach to identify missing references and key concepts in a corpus of educational lectures. For this purpose, we link concepts in educational material to the organizational and linking structure of Wikipedia. Identifying missing resources enables learners to improve their understanding of a topic, and allows teachers to investigate whether their learning material covers all necessary concepts. 0 0
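A minimal sketch of the gap-finding idea, under the assumption that lecture concepts have already been linked to Wikipedia pages: concepts that Wikipedia links from the course's topic pages but that never appear in the lecture material are flagged as candidate missing references. The link lists are hypothetical.

```python
# Hypothetical outgoing-link lists of Wikipedia pages matched to the course topics.
wikipedia_links = {
    "Sorting algorithm": {"Quicksort", "Merge sort", "Heapsort", "Big O notation"},
    "Quicksort": {"Pivot element", "Divide and conquer", "Big O notation"},
}
lecture_concepts = {"Sorting algorithm", "Quicksort", "Merge sort"}

linked = set().union(*(wikipedia_links.get(c, set()) for c in lecture_concepts))
missing = linked - lecture_concepts
print(sorted(missing))   # candidate concepts the course may still need to cover
```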
From Machu-Picchu to "rafting the urubamba river": Anticipating information needs via the entity-query graph From Machu-Picchu to "rafting the urubamba river": Anticipating information needs via the entity-query graph Bordino I.
De Francisci Morales G.
Ingmar Weber
Bonchi F.
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 We study the problem of anticipating user search needs, based on their browsing activity. Given the current web page p that a user is visiting we want to recommend a small and diverse set of search queries that are relevant to the content of p, but also non-obvious and serendipitous. We introduce a novel method that is based on the content of the page visited, rather than on past browsing patterns as in previous literature. Our content-based approach can be used even for previously unseen pages. We represent the topics of a page by the set of Wikipedia entities extracted from it. To obtain useful query suggestions for these entities, we exploit a novel graph model that we call EQGraph (Entity-Query Graph), containing entities, queries, and transitions between entities, between queries, as well as from entities to queries. We perform Personalized PageRank computation on such a graph to expand the set of entities extracted from a page into a richer set of entities, and to associate these entities with relevant query suggestions. We develop an efficient implementation to deal with large graph instances and suggest queries from a large and diverse pool. We perform a user study that shows that our method produces relevant and interesting recommendations, and outperforms an alternative method based on reverse IR. 0 0
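A minimal sketch of the Personalized PageRank step on a toy entity-query graph, using networkx; the nodes, edges and teleport seed are illustrative, not the paper's EQGraph data, and the expansion simply ranks query nodes by their personalized score.

```python
# Personalized PageRank over a small graph mixing entity nodes ("ent:") and
# query nodes ("q:"); the walk restarts at the entities found on the page.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("ent:Machu_Picchu", "ent:Urubamba_River"),          # entity -> entity
    ("ent:Urubamba_River", "q:rafting urubamba river"),  # entity -> query
    ("ent:Machu_Picchu", "q:machu picchu tickets"),
    ("q:machu picchu tickets", "q:inca trail permits"),  # query -> query
])

seeds = {"ent:Machu_Picchu"}   # entities extracted from the page being browsed
personalization = {n: (1.0 if n in seeds else 0.0) for n in G}
scores = nx.pagerank(G, alpha=0.85, personalization=personalization)

suggestions = sorted((n for n in scores if n.startswith("q:")),
                     key=scores.get, reverse=True)
print(suggestions[:3])   # query suggestions for the current page
```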
Generating web-based corpora for video transcripts categorization Generating web-based corpora for video transcripts categorization Perea-Ortega J.M.
Montejo-Raez A.
Teresa Martin-Valdivia M.
Alfonso Urena-Lopez L.
Expert Systems with Applications English 2013 This paper proposes the use of Internet as a rich source of information in order to generate learning corpora for video transcripts categorization systems. Our main goal in this work has been to study the behavior of different learning corpora generated from the Internet and analyze some of their features. Specifically, Wikipedia, Google and the blogosphere have been employed to generate these learning corpora, using the VideoCLEF 2008 track as the evaluation framework for the different experiments carried out. Based on this evaluation framework, we conclude that the proposed approach is a promising strategy for the video classification task using the transcripts of the videos. The different sizes of the corpora generated could lead to believe that better results are achieved when the corpus size is larger, but we demonstrate that this feature may not always be a reliable indicator of the behavior of the learning corpus. The obtained results show that the integration of knowledge from the blogosphere or Google allows generating more reliable corpora for this task than those based on Wikipedia. © 2012 Elsevier Ltd. All rights reserved. 0 0
Getting to the source: Where does wikipedia get its information from? Getting to the source: Where does wikipedia get its information from? Heather Ford
Shilad Sen
Musicant D.R.
Nora Miller
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 We ask what kinds of sources Wikipedians value most and compare Wikipedia's stated policy on sources to what we observe in practice. We find that primary data sources developed by alternative publishers are both popular and persistent, despite policies that present such sources as inferior to scholarly secondary sources. We also find that Wikipedians make almost equal use of information produced by associations such as nonprofits as from scholarly publishers, with a significant portion coming from government information sources. Our findings suggest the rise of new influential sources of information on the Web but also reinforce the traditional geographic patterns of scholarly publication. This has a significant effect on the goal of Wikipedians to represent "the sum of all human knowledge." 0 0
Hot Off the Wiki: Structures and Dynamics of Wikipedia's Coverage of Breaking News Events Hot Off the Wiki: Structures and Dynamics of Wikipedia's Coverage of Breaking News Events Brian Keegan
Darren Gergle
Noshir Contractor
American Behavioral Scientist English 2013 Wikipedia's coverage of breaking news and current events dominates editor contributions and reader attention in any given month. Collaborators on breaking news articles rapidly synthesize content to produce timely information in spite of steep coordination demands. Wikipedia's coverage of breaking news events thus presents a case to test theories about how open collaborations coordinate complex, time-sensitive, and knowledge-intensive work in the absence of central authority, stable membership, clear roles, or reliable information. Using the revision history from Wikipedia articles about over 3,000 breaking news events, we investigate the structure of interactions between editors and articles. Because breaking article collaborations unfold more rapidly and involve more editors than most Wikipedia articles, they potentially regenerate prior forms of organizing. We analyze whether breaking and nonbreaking article networks (a) are similarly structured over time, (b) exhibit features of organizational regeneration, and (c) have similar collaboration dynamics over time. Breaking and nonbreaking articles exhibit similarities in their structural characteristics over the long run, and there is less evidence of organizational regeneration on breaking articles than nonbreaking articles. However, breaking articles emerge into well-connected collaborations more rapidly than nonbreaking articles, suggesting early contributors play a crucial role in supporting these high-tempo collaborations. 0 0
How do metrics of link analysis correlate to quality, relevance and popularity in Wikipedia? How do metrics of link analysis correlate to quality, relevance and popularity in Wikipedia? Hanada R.T.S.
Marco Cristo
Pimentel M.D.G.C.
WebMedia 2013 - Proceedings of the 19th Brazilian Symposium on Multimedia and the Web English 2013 Many links between Web pages can be viewed as indicative of the quality and importance of the pages they pointed to. Accordingly, several studies have proposed metrics based on links to infer web page content quality. However, as far as we know, the only work that has examined the correlation between such metrics and content quality consisted of a limited study that left many open questions. In spite of these metrics having been shown successful in the task of ranking pages which were provided as answers to queries submitted to search engines, it is not possible to determine the specific contribution of factors such as quality, popularity, and importance to the results. This difficulty is partially due to the fact that such information is hard to obtain for Web pages in general. Unlike ordinary Web pages, the quality, importance and popularity of Wikipedia articles are evaluated by human experts or might be easily estimated. Thus, it is feasible to verify the relation between link analysis metrics and such factors in Wikipedia articles, our goal in this work. To accomplish that, we implemented several link analysis algorithms and compared their resulting rankings with the ones created by human evaluators regarding factors such as quality, popularity and importance. We found that the metrics are more correlated to quality and popularity than to importance, and the correlation is moderate. 0 0
How much information is geospatially referenced? Networks and cognition How much information is geospatially referenced? Networks and cognition Hahmann S.
Burghardt D.
International Journal of Geographical Information Science English 2013 The aim of this article is to provide a basis in evidence for (or against) the much-quoted assertion that 80% of all information is geospatially referenced. For this purpose, two approaches are presented that are intended to capture the portion of geospatially referenced information in user-generated content: a network approach and a cognitive approach. In the network approach, the German Wikipedia is used as a research corpus. It is considered a network with the articles being nodes and the links being edges. The Network Degree of Geospatial Reference (NDGR) is introduced as an indicator to measure the network approach. We define NDGR as the shortest path between any Wikipedia article and the closest article within the network that is labeled with coordinates in its headline. An analysis of the German Wikipedia employing this approach shows that 78% of all articles have a coordinate themselves or are directly linked to at least one article that has geospatial coordinates. The cognitive approach is manifested by the categories of geospatial reference (CGR): direct, indirect, and non-geospatial reference. These are categories that may be distinguished and applied by humans. An empirical study including 380 participants was conducted. The results of both approaches are synthesized with the aim to (1) examine correlations between NDGR and the human conceptualization of geospatial reference and (2) to separate geospatial from non-geospatial information. From the results of this synthesis, it can be concluded that 56-59% of the articles within Wikipedia can be considered to be directly or indirectly geospatially referenced. The article thus describes a method to check the validity of the '80%-assertion' for information corpora that can be modeled using graphs (e.g., the World Wide Web, the Semantic Web, and Wikipedia). For the corpus investigated here (Wikipedia), the '80%-assertion' cannot be confirmed, but would need to be reformulated as a '60%-assertion'. 0 0
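A minimal sketch of the NDGR idea: for each article, the shortest link distance to the nearest article carrying coordinates, computed here as a multi-source BFS via a virtual node in networkx. The tiny link graph and the set of geotagged articles are illustrative.

```python
# NDGR = 0 if the article has coordinates itself, 1 if it links directly to a
# geotagged article, and so on; a virtual node turns this into one BFS.
import networkx as nx

links = [("Dresden", "Elbe"), ("Elbe", "Hydrology"), ("Hydrology", "Science")]
geotagged = {"Dresden", "Elbe"}   # articles with coordinates in their headline

G = nx.Graph(links)
G.add_node("_GEO_")
G.add_edges_from(("_GEO_", a) for a in geotagged if a in G)

dist = nx.single_source_shortest_path_length(G, "_GEO_")
ndgr = {a: dist[a] - 1 for a in G if a != "_GEO_" and a in dist}
print(ndgr)   # e.g. Dresden -> 0, Hydrology -> 1, Science -> 2
```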
How much is said in a tweet? A multilingual, information-theoretic perspective How much is said in a tweet? A multilingual, information-theoretic perspective Neubig G.
Kevin Duh
AAAI Spring Symposium - Technical Report English 2013 This paper describes a multilingual study on how much information is contained in a single post of microblog text from Twitter in 26 different languages. In order to answer this question in a quantitative fashion, we take an information-theoretic approach, using entropy as our criterion for quantifying "how much is said" in a tweet. Our results find that, as expected, languages with larger character sets such as Chinese and Japanese contain more information per character than other languages. However, we also find that, somewhat surprisingly, information per character does not have a strong correlation with information per microblog post, as authors of microblog posts in languages with more information per character do not necessarily use all of the space allotted to them. Finally, we examine the relative importance of a number of factors that contribute to whether a language has more or less information content in each character or post, and also compare the information content of microblog text with more traditional text from Wikipedia. 0 0
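A minimal sketch of the entropy criterion using a simple unigram character model over a toy tweet collection; the paper's estimates are presumably based on stronger language models, so this only illustrates the "bits per character" versus "bits per post" distinction.

```python
# Unigram character entropy of a (hypothetical) tweet collection.
import math
from collections import Counter

tweets = ["rafting the urubamba river", "machu picchu at sunrise"]  # toy corpus
counts = Counter("".join(tweets))
total = sum(counts.values())

bits_per_char = -sum((c / total) * math.log2(c / total) for c in counts.values())
bits_per_post = bits_per_char * (total / len(tweets))  # average characters per tweet
print(round(bits_per_char, 2), round(bits_per_post, 1))
```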
Identifying multilingual wikipedia articles based on cross language similarity and activity Identifying multilingual wikipedia articles based on cross language similarity and activity Tran K.-N.
Christen P.
International Conference on Information and Knowledge Management, Proceedings English 2013 Wikipedia is an online free and open access encyclopedia available in many languages. Wikipedia articles across over 280 languages are written by millions of editors. However, the growth of articles and their content is slowing, especially within the largest Wikipedia language: English. The stabilization of articles presents opportunities for multilingual Wikipedia editors to apply their translation skills to add articles and content to smaller Wikipedia languages. In this poster, we propose similarity and activity measures of Wikipedia articles across two languages: English and German. These measures allow us to evaluate the distribution of articles based on their knowledge coverage and their activity across languages. We show the state of Wikipedia articles as of June 2012 and discuss how these measures allow us to develop recommendation and verification models for multilingual editors to enrich articles and content in Wikipedia languages with relatively smaller knowledge coverage. 0 0
Identifying, understanding and detecting recurring, harmful behavior patterns in collaborative wikipedia editing - Doctoral proposal Identifying, understanding and detecting recurring, harmful behavior patterns in collaborative wikipedia editing - Doctoral proposal Flock F.
Elena Simperl
Rettinger A.
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 In this doctoral proposal, we describe an approach to identify recurring, collective behavioral mechanisms in the collaborative interactions of Wikipedia editors that have the potential to undermine the ideals of quality, neutrality and completeness of article content. We outline how we plan to parametrize these patterns in order to understand their emergence and evolution and measure their effective impact on content production in Wikipedia. On top of these results we intend to build end-user tools to increase the transparency of the evolution of articles and equip editors with more elaborated quality monitors. We also sketch out our evaluation plans and report on already accomplished tasks. 0 0
Impact of wikipedia on market information environment: Evidence on management disclosure and investor reaction Impact of wikipedia on market information environment: Evidence on management disclosure and investor reaction Xu S.X.
Zhang X.M.
MIS Quarterly: Management Information Systems English 2013 In this paper, we seek to determine whether a typical social media platform, Wikipedia, improves the information environment for investors in the financial market. Our theoretical lens leads us to expect that information aggregation about public companies on Wikipedia may influence how management's voluntary information disclosure reacts to market uncertainty with respect to investors' information about these companies. Our empirical analysis is based on a unique data set collected from financial records, management disclosure records, news article coverage, and a Wikipedia modification history of public companies. On the supply side of information, we find that information aggregation on Wikipedia can moderate the timing of managers' voluntary disclosure of companies' earnings disappointments, or bad news. On the demand side of information, we find that Wikipedia's information aggregation moderates investors' negative reaction to bad news. Taken together, these findings support the view that Wikipedia improves the information environment in the financial market and underscore the value of information aggregation through the use of information technology. 0 0
Improved concept-based query expansion using Wikipedia Improved concept-based query expansion using Wikipedia Yuvarani M.
Iyengar N.Ch.S.N.
Kannan A.
International Journal of Communication Networks and Distributed Systems English 2013 Query formulation has always been a challenge for users. In this paper, we propose a novel interactive query expansion methodology that identifies and presents the potential directions (generalised concepts) for a given query, enabling the user to explore the topic of interest further. The proposed methodology is the concept-based direction (CoD) finder, which relies on an external knowledge repository for finding the directions. Wikipedia, the most important non-profit crowdsourcing project, is used as the external knowledge repository for the CoD finder methodology. The CoD finder identifies the concepts for the given query and derives the generalised direction for each of the concepts, based on the content of the Wikipedia article and the categories it belongs to. The CoD finder methodology has been evaluated in the crowdsourcing marketplace Amazon Mechanical Turk to measure the quality of the identified potential directions. The evaluation results show that the potential directions identified by the CoD finder methodology produce better precision and recall for the given queries. 0 0
Improved text annotation with wikipedia entities Improved text annotation with wikipedia entities Makris C.
Plegas Y.
Theodoridis E.
Proceedings of the ACM Symposium on Applied Computing English 2013 Text annotation is the procedure of initially identifying, in a segment of text, a set of semantically dominant words and then attaching to them extra information (usually drawn from a concept ontology, implemented as a catalog) that expresses their conceptual content in the current context. Attaching additional semantic information and structure helps to represent, in a machine-interpretable way, the topic of the text and is a fundamental preprocessing step for many Information Retrieval tasks like indexing, clustering, classification, text summarization and cross-referencing content on web pages, posts, tweets etc. In this paper, we deal with automatic annotation of text documents with entities of Wikipedia, the largest online knowledge base; a process that is commonly known as Wikification. As in previous approaches, the cross-referencing of words in the text to Wikipedia articles is based on local compatibility between the text around the term and textual information embedded in the article. The main contribution of this paper is a set of disambiguation techniques that enhance previously published approaches by employing both the WordNet lexical database and the Wikipedia articles' PageRank scores in the disambiguation process. The experimental evaluation shows that the exploitation of these additional semantic information sources leads to more accurate text annotation. 0 0
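A minimal sketch of one plausible way to combine the two signals mentioned above: a local context-compatibility score for each candidate Wikipedia article blended with a global PageRank prior. The weighting, features and candidate data are assumptions for illustration; the paper's actual disambiguation also uses WordNet, which is not reproduced here.

```python
# Score each candidate sense by a weighted mix of context overlap and PageRank prior.
def context_overlap(context_words, article_words):
    context, article = set(context_words), set(article_words)
    return len(context & article) / max(1, len(context))

def score(candidate, context_words, weight=0.7):   # weight is an assumed parameter
    local = context_overlap(context_words, candidate["abstract_words"])
    return weight * local + (1 - weight) * candidate["pagerank"]

candidates = [
    {"title": "Jaguar", "abstract_words": ["cat", "predator", "amazon"], "pagerank": 0.4},
    {"title": "Jaguar Cars", "abstract_words": ["car", "british", "luxury"], "pagerank": 0.6},
]
context = ["the", "jaguar", "is", "a", "big", "cat", "of", "the", "amazon"]
print(max(candidates, key=lambda c: score(c, context))["title"])
```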
Improving large-scale search engines with semantic annotations Improving large-scale search engines with semantic annotations Fuentes-Lorenzo D.
Fernandez N.
Fisteus J.A.
Sanchez L.
Expert Systems with Applications English 2013 Traditional search engines have become the most useful tools to search the World Wide Web. Even though they are good for certain search tasks, they may be less effective for others, such as satisfying ambiguous or synonym queries. In this paper, we propose an algorithm that, with the help of Wikipedia and collaborative semantic annotations, improves the quality of web search engines in the ranking of returned results. Our work is supported by (1) the logs generated after query searching, (2) semantic annotations of queries and (3) semantic annotations of web pages. The algorithm makes use of this information to elaborate an appropriate ranking. To validate our approach we have implemented a system that can apply the algorithm to a particular search engine. Evaluation results show that the number of relevant web resources obtained after executing a query with the algorithm is higher than the one obtained without it. © 2012 Elsevier Ltd. All rights reserved. 0 0
Improving semi-supervised text classification by using wikipedia knowledge Improving semi-supervised text classification by using wikipedia knowledge Zhang Z.
Hong Lin
Li P.
Haofen Wang
Lu D.
Lecture Notes in Computer Science English 2013 Semi-supervised text classification uses both labeled and unlabeled data to construct classifiers. The key issue is how to utilize the unlabeled data. Clustering based classification method outperforms other semi-supervised text classification algorithms. However, its achievements are still limited because the vector space model representation largely ignores the semantic relationships between words. In this paper, we propose a new approach to address this problem by using Wikipedia knowledge. We enrich document representation with Wikipedia semantic features (concepts and categories), propose a new similarity measure based on the semantic relevance between Wikipedia features, and apply this similarity measure to clustering based classification. Experiment results on several corpora show that our proposed method can effectively improve semi-supervised text classification performance. 0 0
Improving text categorization with semantic knowledge in wikipedia Improving text categorization with semantic knowledge in wikipedia Xiaolong Wang
Jia Y.
Chen K.
Fan H.
Zhou B.
IEICE Transactions on Information and Systems English 2013 Text categorization, especially short text categorization, is a difficult and challenging task since the text data is sparse and multidimensional. In traditional text classification methods, document texts are represented with the "Bag of Words (BOW)" text representation schema, which is based on word co-occurrence and has many limitations. In this paper, we mapped document texts to Wikipedia concepts and used the Wikipedia-concept-based document representation method to take the place of the traditional BOW model for text classification. In order to overcome the weakness of ignoring the semantic relationships among terms in the document representation model and to utilize the rich semantic knowledge in Wikipedia, we constructed a semantic matrix to enrich the Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method. 0 0
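A minimal sketch of the enrichment idea: a concept-based document vector is smoothed by multiplication with a term-term (or concept-concept) semantic relatedness matrix, so that related concepts receive non-zero weight. The matrix values here are hand-filled; the paper derives relatedness from Wikipedia.

```python
# Enrich a sparse bag-of-concepts vector with a semantic relatedness matrix S.
import numpy as np

terms = ["car", "automobile", "banana"]
d = np.array([1.0, 0.0, 0.0])            # document mentions only "car"

S = np.array([[1.0, 0.9, 0.0],            # relatedness(car, automobile) = 0.9 (assumed)
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

d_enriched = d @ S                         # "automobile" now gets non-zero weight
print(dict(zip(terms, d_enriched)))
```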
Improving the transcription of academic lectures for information retrieval Improving the transcription of academic lectures for information retrieval Mbogho A.
Marquard S.
Proceedings - 2013 12th International Conference on Machine Learning and Applications, ICMLA 2013 English 2013 Recording university lectures through lecture capture systems is increasingly common, generating large amounts of audio and video data. Transcribing recordings greatly enhances their usefulness by making them easy to search. However, the number of recordings accumulates rapidly, rendering manual transcription impractical. Automatic transcription, on the other hand, suffers from low levels of accuracy, partly due to the special language of academic disciplines, which standard language models do not cover. This paper looks into the use of Wikipedia to dynamically adapt language models for scholarly speech. We propose Ranked Word Correct Rate as a new metric better aligned with the goals of improving transcript search ability and specialist word recognition. The study shows that, while overall transcription accuracy may remain low, targeted language modelling can substantially improve search ability, an important goal in its own right. 0 0
Improving web search results with explanation-aware snippets: An experimental study Improving web search results with explanation-aware snippets: An experimental study Wira-Alam A.
Zloch M.
WEBIST 2013 - Proceedings of the 9th International Conference on Web Information Systems and Technologies English 2013 In this paper, we focus on a typical web search task, in which users want to discover the coherency between two concepts on the Web. From our point of view, this task can be seen as a retrieval process: starting with some source information, the goal is to find target information by following hyperlinks. Given two concepts, e.g. chemistry and gunpowder, are search engines able to find the coherency and explain it? In this paper, we introduce a novel way of linking two concepts by following paths of hyperlinks and collecting short text snippets. We implemented a proof-of-concept prototype, which extracts paths and snippets from Wikipedia articles. Our goal is to provide the user with an overview of the coherency, enriching the connection with a short but meaningful description. In our experimental study, we compare the results of our approach with the capability of web search engines. The results show that 72% of the participants find our results better than those of web search engines. 0 0
Indicoder: An extensible system for online annotation Indicoder: An extensible system for online annotation Gilbert M.
Morgan J.T.
Mark Zachry
McDonald D.
English 2013 Online annotations provide an effective way for distributed individuals to better understand and categorize online content, both from the perspective of distilling the information presented into more easily interpretable forms and by supporting content analysis to tag individual statements with their intended meaning. This poster presents Indicoder, an application to support in-place content analysis, allowing users both to annotate online corpora and to track those annotations over time, so that living documents such as Wikipedia articles and online sources can be analyzed in their authentic contexts. 0 0
InfoLand: Information lay-of-land for session search InfoLand: Information lay-of-land for session search Jing Luo
Guan D.
Haitao Yang
SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2013 Search result clustering (SRC) is a post-retrieval process that hierarchically organizes search results. The hierarchical structure offers an overview of the search results and displays an "information lay-of-land" that intends to guide the users throughout a search session. However, SRC hierarchies are sensitive to query changes, which are common among queries in the same session. This instability may leave users with seemingly random overviews throughout the session. We present a new tool called InfoLand that integrates external knowledge from Wikipedia when building SRC hierarchies and increases their stability. Evaluation on the TREC 2010-2011 Session Tracks shows that InfoLand produces a more stable organization of results than a commercial search engine. 0 0
Interactive information search in text data collections Interactive information search in text data collections Deptula M.
Szymanski J.
Krawczyk H.
Studies in Computational Intelligence English 2013 This article presents a new idea for retrieval in text repositories and describes the general infrastructure of a system created to implement and test it. The implemented system differs from today's standard search engines by introducing a process of interactive search with users and data clustering. We present the basic algorithms behind our system and the measures we used to evaluate the results. The results indicate that the proposed method can be useful for improving classical keyword-based approaches. 0 0
Interactive query expansion using concept-based directions finder based on Wikipedia Interactive query expansion using concept-based directions finder based on Wikipedia Meiyappan Y.
Iyengar S.
International Arab Journal of Information Technology English 2013 Despite advances in information retrieval, search engines still return imprecise or poor results, mainly due to the quality of the submitted query. Formulating a query that expresses an information need has always been challenging for users. In this paper, we propose an interactive query expansion methodology using a Concept-Based Directions Finder (CBDF). For a given query, the approach determines the directions in which the search can be continued by the user, using Explicit Semantic Analysis (ESA). The CBDF identifies the relevant terms, with a corresponding label, for each of the directions found, based on the content and link structure of Wikipedia. The identified relevant terms and their labels are suggested to the user for query expansion through the proposed visual interface. The visual interface, named terms mapper, accepts the query and displays the potential directions and a group of relevant terms along with the label for the direction chosen by the user. We evaluated the results of the proposed approach and the visual interface for the identified queries. The experimental results show that the approach produces a good Mean Average Precision (MAP) for the chosen queries. 0 0
Interest classification of twitter users using wikipedia Interest classification of twitter users using wikipedia Lim K.H.
Anwitaman Datta
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 We present a framework for (automatically) classifying the relative interests of Twitter users using information from Wikipedia. Our proposed framework first uses Wikipedia to automatically classify a user's celebrity followings into various interest categories, followed by determining the relative interests of the user with a weighting compared to his/her other interests. Our preliminary evaluation on Twitter shows that this framework is able to correctly classify users' interests and that these users frequently converse about topics that reflect both their (detected) interest and a related real-life event. 0 0
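A minimal sketch of the classification step, assuming a precomputed mapping from celebrities to interest categories (the paper obtains such a mapping from Wikipedia categories): the user's relative interests are the normalized category counts over the celebrities they follow. Names and categories are illustrative.

```python
# Weight a user's interests by how often each category occurs among their followings.
from collections import Counter

celebrity_categories = {        # hypothetical, Wikipedia-derived in the paper
    "LeBron James": ["Sport"],
    "Serena Williams": ["Sport"],
    "Adele": ["Music"],
}
followings = ["LeBron James", "Adele", "Serena Williams"]

counts = Counter(cat for name in followings for cat in celebrity_categories.get(name, []))
total = sum(counts.values())
relative_interests = {cat: n / total for cat, n in counts.items()}
print(relative_interests)   # e.g. Sport ~0.67, Music ~0.33
```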
Invasion biology and the success of social collaboration networks, with application to wikipedia Invasion biology and the success of social collaboration networks, with application to wikipedia Mangel M.
Satterthwaite W.H.
Peter Pirolli
Bongwon Suh
YanChun Zhang
Israel Journal of Ecology and Evolution English 2013 We adapt methods from the stochastic theory of invasions - for which a key question is whether a propagule will grow to an established population or fail - to show how monitoring early participation in a social collaboration network allows prediction of success. Social collaboration networks have become ubiquitous and can now be found in widely diverse situations. However, there are currently no methods to predict whether a social collaboration network will succeed or not, where success is defined as growing to a specified number of active participants before falling to zero active participants. We illustrate a suitable methodology with Wikipedia. In general, wikis are web-based software that allows collaborative efforts in which all viewers of a page can edit its contents online, thus encouraging cooperative efforts on text and hypertext. The English language Wikipedia is one of the most spectacular successes, but not all wikis succeed and there have been some major failures. Using these new methods, we derive detailed predictions for the English language Wikipedia and in summary for more than 250 other language Wikipedias. We thus show how ideas from population biology can inform aspects of technology in new and insightful ways. 0 0
Investigating the determinants of contribution value in Wikipedia Investigating the determinants of contribution value in Wikipedia Zhao S.J.
Zhang K.Z.K.
Christian Wagner
Hejie Chen
International Journal of Information Management English 2013 The recent prevalence of wiki applications has demonstrated that wikis have high potential in facilitating knowledge creation, sharing, integration, and utilization. As wikis are increasingly adopted in contexts like business, education, research, government, and the public, how to improve user contribution becomes an important concern for researchers and practitioners. In this research, we focus on the quality aspect of user contribution: contribution value. Building upon the critical mass theory and research on editing activities in wikis, this study investigates whether user interests and resources can increase contribution value for different types of users. We develop our research model and empirically test it using survey method and content analysis method in Wikipedia. The results demonstrate that (1) for users who emphasize substantive edits, depth of interests and depth of resources play more influential roles in affecting contribution value; and (2) for users who focus on non-substantive edits, breadth of interests and breadth of resources are more important in increasing contribution value. The findings suggest that contribution value develops in different patterns for two types of users. Implications for both theory and practice are discussed. 0 0
Involve the users to increase their acceptance - An experience report Involve the users to increase their acceptance - An experience report Muhlbauer A.
Nissen K.
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 Wikipedia is a top-ten web site providing a community-driven free encyclopedia. Its success depends on the support of its volunteer contributors, and Wikipedia is a research object in several academic fields. Wikimedia Deutschland, the German Wikimedia chapter, is a partner in the EU-funded international research project "RENDER - Reflecting knowledge diversity". With this participation we aim to support Wikipedia authors in editing and to understand the status of articles. This experience report focuses on our interaction with, in particular, the German-speaking Wikipedia community, less on the project and its results. We reached out to members of the Wikipedia community in several ways. In addition to the online channels, live meetings are of particular importance for building up an interested and active community. During our project, we learned that it is very important to involve the users at an early stage. That helps to increase the acceptance and the willingness to support the project. If Wikipedians can see a benefit of research results and developments for their daily life in Wikipedia or the advancement of the whole project, they will be more willing to give innovations a try. 0 0
Is Wikipedia inefficient? Modelling effort and participation in Wikipedia Is Wikipedia inefficient? Modelling effort and participation in Wikipedia Kevin Crowston
Nicolas Jullien
Felipe Ortega
Proceedings of the Annual Hawaii International Conference on System Sciences English 2013 Concerns have been raised about the decreased ability of Wikipedia to recruit editors and to harness the effort of contributors to create new articles and improve existing articles. But, as [1], [2] explained, in collective projects, in the initial stage of a project, people are few and efforts are costly; in the diffusion phase, the number of participants grows as their efforts are rewarding; and in the mature phase, some inefficiency may appear as the number of contributors exceeds what the work requires. In this paper, thanks to original data we extract from 36 of the main language projects, we compare the efficiency of Wikipedia projects in different languages and at different states of development to examine this effect. 0 1
Issues for linking geographical open data of GeoNames and Wikipedia Issues for linking geographical open data of GeoNames and Wikipedia Yoshioka M.
Kando N.
Lecture Notes in Computer Science English 2013 It is now possible to use various geographical open data sources such as GeoNames and Wikipedia to construct geographic information systems. In addition, these open data sources are integrated by the concept of Linked Open Data. There have been several attempts to identify links between existing data, but few studies have focused on the quality of such links. In this paper, we introduce an automatic link discovery method for identifying the correspondences between GeoNames entries and Wikipedia pages, based on Wikipedia category information. This method finds not only appropriate links but also inconsistencies between two databases. Based on this integration results, we discuss the type of inconsistencies for making consistent Linked Open Data. 0 0
Keeping eyes on the prize: Officially sanctioned rule breaking in mass collaboration systems Keeping eyes on the prize: Officially sanctioned rule breaking in mass collaboration systems Elisabeth Joyce
Jacqueline Pike
Brian Butler
English 2013 Mass collaboration systems are often characterized as unstructured organizations lacking rule and order. However, examination of Wikipedia reveals that it contains a complex policy and rule structure that supports the organization. Bureaucratic organizations adopt workarounds to adjust rules more accurately to the context of use. Rather than resorting to these potentially dangerous exceptions, Wikipedia has created officially sanctioned rule breaking. The use and impact of the official rule-breaking policy within Wikipedia are examined through its effect on the outcomes of requests to delete articles from the encyclopedia. The results demonstrate that officially sanctioned rule breaking and the Ignore all rules (IAR) policy are meaningful influences on deliberation outcomes, and rather than wreaking havoc, the IAR policy in Wikipedia has been adopted as a positive, functional governance mechanism. 0 0
Keeping wiki content current via news sources Keeping wiki content current via news sources Adams R.
Kuntz A.
Marks M.
Martin W.
Musicant D.R.
International Conference on Intelligent User Interfaces, Proceedings IUI English 2013 Online resources known as wikis are commonly used for collection and distribution of information. We present a software implementation that assists wiki contributors with the task of keeping a wiki current. Our demonstration, built using English Wikipedia, enables wiki contributors to subscribe to sources of news, based on which it makes intelligent recommendations for pages within Wikipedia where the new content should be added. This tool is also potentially useful for helping new Wikipedia editors find material to contribute. 0 0
Knowledge base population and visualization using an ontology based on semantic roles Knowledge base population and visualization using an ontology based on semantic roles Siahbani M.
Vadlapudi R.
Whitney M.
Sarkar A.
AKBC 2013 - Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, Co-located with CIKM 2013 English 2013 This paper extracts facts using "micro-reading" of text in contrast to approaches that extract common-sense knowledge using "macro-reading" methods. Our goal is to extract detailed facts about events from natural language using a predicate-centered view of events (who did what to whom, when and how). We exploit semantic role labels in order to create a novel predicate-centric ontology for entities in our knowledge base. This allows users to find uncommon facts easily. To this end, we tightly couple our knowledge base and ontology to an information visualization system that can be used to explore and navigate events extracted from a large natural language text collection. We use our methodology to create a web-based visual browser of history events in Wikipedia. 0 0
Knowledge-based Named Entity recognition in Polish Knowledge-based Named Entity recognition in Polish Pohl A. 2013 Federated Conference on Computer Science and Information Systems, FedCSIS 2013 English 2013 This document describes an algorithm aimed at recognizing Named Entities in Polish text, which is powered by two knowledge sources: the Polish Wikipedia and the Cyc ontology. Besides providing the rough types for the recognized entities, the algorithm links them to the Wikipedia pages and assigns precise semantic types taken from Cyc. The algorithm is verified against manually identified Named Entities in the one-million sub-corpus of the National Corpus of Polish. 0 0
Labeling blog posts with wikipedia entries through LDA-based topic modeling of wikipedia Labeling blog posts with wikipedia entries through LDA-based topic modeling of wikipedia Makita K.
Suzuki H.
Koike D.
Takehito Utsuro
Kawada Y.
Tomohiro Fukuhara
Journal of Internet Technology English 2013 Given a search query, most existing search engines simply return a ranked list of search results. However, it is often the case that those search result documents consist of a mixture of documents that cover a variety of topics. In order to address the issue of quickly overviewing the distribution of contents, this paper proposes a framework for labeling blog posts with Wikipedia entries through LDA (latent Dirichlet allocation) based topic modeling of Wikipedia. One of the most important advantages of this LDA-based document model is that the collected Wikipedia entries and their LDA parameters heavily depend on the distribution of keywords across all the search results of blog posts. This tendency actually contributes to quickly overviewing the search results of blog posts through the LDA-based topic distribution. We show that the LDA-based document retrieval scheme outperforms our previous approach. Finally, we compare the proposed approach to standard LDA-based topic modeling without the Wikipedia knowledge source. The two LDA-based topic modeling results have quite different natures and contribute to quickly overviewing the search results of blog posts in a complementary fashion. 0 0
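A minimal sketch of the pipeline shape using gensim: train LDA on (toy) Wikipedia entry texts, then rank entries as labels for a blog post by the overlap of their topic distributions. The entries, tokens and topic count are illustrative; real use would train on a full collection of Wikipedia entries.

```python
# Train LDA on Wikipedia entry texts and label a blog post with the closest entries.
from gensim import corpora, models

wiki_entries = {
    "Sushi":   "rice fish seaweed japanese cuisine".split(),
    "Kimono":  "japanese garment silk traditional clothing".split(),
    "Volcano": "eruption lava magma geology mountain".split(),
}
dictionary = corpora.Dictionary(wiki_entries.values())
corpus = [dictionary.doc2bow(tokens) for tokens in wiki_entries.values()]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=20, random_state=0)

blog_post = "ate sushi and admired a silk kimono in kyoto".split()
post_topics = dict(lda.get_document_topics(dictionary.doc2bow(blog_post)))

def topic_affinity(entry_tokens):
    entry_topics = dict(lda.get_document_topics(dictionary.doc2bow(entry_tokens)))
    return sum(post_topics.get(t, 0) * p for t, p in entry_topics.items())

labels = sorted(wiki_entries, key=lambda e: topic_affinity(wiki_entries[e]), reverse=True)
print(labels[:2])   # Wikipedia entries proposed as labels for the blog post
```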
Learning multilingual named entity recognition from Wikipedia Learning multilingual named entity recognition from Wikipedia Joel Nothman
Nicky Ringland
Will Radford
Tara Murphy
Curran J.R.
Artificial Intelligence English 2013 We automatically create enormous, free and multilingual silver-standard training annotations for named entity recognition (ner) by exploiting the text and structure of Wikipedia. Most ner systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck our work overcomes. We first classify each Wikipedia article into named entity (ne) types, training and evaluating on 7200 manually-labelled Wikipedia articles across nine languages. Our cross-lingual approach achieves up to 95% accuracy. We transform the links between articles into ne annotations by projecting the target articles' classifications onto the anchor text. This approach yields reasonable annotations, but does not immediately compete with existing gold-standard data. By inferring additional links and heuristically tweaking the Wikipedia corpora, we better align our automatic annotations to gold standards. We annotate millions of words in nine languages, evaluating English, German, Spanish, Dutch and Russian Wikipedia-trained models against conll shared task data and other gold-standard corpora. Our approach outperforms other approaches to automatic ne annotation (Richman and Schone, 2008 [61]; Mika et al., 2008 [46]); competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually-annotated Wikipedia text. © 2012 Elsevier B.V. All rights reserved. 0 0
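A minimal sketch of the projection step described above: wiki links are turned into silver-standard NE annotations by copying the linked article's entity type onto the anchor text, emitted in CoNLL-style BIO format. The article-to-type mapping is hand-assigned here; the paper learns it with a cross-lingual classifier.

```python
# Project article NE types onto link anchor text to produce BIO-tagged tokens.
import re

article_type = {"Barack Obama": "PER", "Chicago": "LOC", "Harvard Law School": "ORG"}

sentence = "[[Barack Obama]] taught law in [[Chicago]] after [[Harvard Law School|Harvard]]."

def to_conll(wikitext):
    rows = []
    for piece in re.split(r"(\[\[[^\]]+\]\])", wikitext):
        m = re.match(r"\[\[([^|\]]+)\|?([^\]]*)\]\]", piece)
        if m:
            target, anchor = m.group(1), m.group(2) or m.group(1)
            tag = article_type.get(target, "O")
            for i, tok in enumerate(anchor.split()):
                rows.append((tok, f"{'B' if i == 0 else 'I'}-{tag}" if tag != "O" else "O"))
        else:
            rows.extend((tok, "O") for tok in piece.split())
    return rows

for token, label in to_conll(sentence):
    print(token, label)
```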
Learning through massively co-authored biographies: Making sense of steve jobs on wikipedia through delegated voice Learning through massively co-authored biographies: Making sense of steve jobs on wikipedia through delegated voice Rughinis C.
Matei S.
Proceedings - 19th International Conference on Control Systems and Computer Science, CSCS 2013 English 2013 This paper discusses opportunities for learning about biographies through Wikipedia, The Free Encyclopaedia. We examine argumentation and interpretation practices in Steve Jobs's entry and its associated Talk pages, focusing on editors' debates on whether Jobs was an 'inventor'. We highlight argumentation from delegated voice as a core element of Wikipedian knowledge building; contributors' variable skills in engaging this NPOV mandated requirement account for their success or failure in promoting changes in page content and structure. Editors' concerns about topic relevance and page structure are particularly vulnerable to counter-argumentation from delegated voice. 0 0
Learning to rank concept annotation for text Learning to rank concept annotation for text Tu X.
He T.
Li F.
Wang J.
Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis Chinese 2013 This paper proposes an automatic text annotation method (CRM, concept ranking model) based on a learning-to-rank model. The authors first built a training set of concept annotations manually, then used the Ranking SVM algorithm to generate the concept ranking model, and finally applied the concept ranking model to generate concept annotations for arbitrary texts. Experiments show that the proposed method yields a significant improvement on various indicators compared to traditional annotation methods, and that its concept annotations are closer to human annotation. 0 0
Leveraging encyclopedic knowledge for transparent and serendipitous user profiles Leveraging encyclopedic knowledge for transparent and serendipitous user profiles Narducci F.
Musto C.
Giovanni Semeraro
Pasquale Lops
De Gemmis M.
Lecture Notes in Computer Science English 2013 The main contribution of this work is the comparison of different techniques for representing user preferences extracted by analyzing data gathered from social networks, with the aim of constructing more transparent (human-readable) and serendipitous user profiles. We compared two different user models representations: one based on keywords and one exploiting encyclopedic knowledge extracted from Wikipedia. A preliminary evaluation involving 51 Facebook and Twitter users has shown that the use of an encyclopedic-based representation better reflects user preferences, and helps to introduce new interesting topics. 0 0
Lifecycle-based evolution of features in collaborative open production communities: The case of wikipedia Lifecycle-based evolution of features in collaborative open production communities: The case of wikipedia Ziaie P.
Imamovic M.
ECIS 2013 - Proceedings of the 21st European Conference on Information Systems English 2013 In the last decade, collaborative open production communities have provided an effective platform for geographically dispersed users to collaborate and generate content in a well-structured and consistent form. Wikipedia is a prominent example in this area. What is of great importance in production communities is the prioritization and evolution of features with regards to the community lifecycle. Users are the cornerstone of such communities and their needs and attitudes constantly change as communities grow. The increasing amount and versatility of content and users requires modifications in areas ranging from user roles and access levels to content quality standards and community policies and goals. In this paper, we draw on two pertinent theories in terms of the lifecycle of online communities and open collaborative communities in particular by focusing on the case of Wikipedia. We conceptualize three general stages (Rising, Organizing, and Stabilizing) within the lifecycle of collaborative open production communities. The salient factors, features and focus of attention in each stage are provided and the chronology of features is visualized. These findings, if properly generalized, can help designers of other types of open production communities effectively allocate their resources and introduce new features based on the needs of both community and users. 0 0
Lo mejor de dos idiomas - Cross-lingual linkage of geotagged Wikipedia articles Lo mejor de dos idiomas - Cross-lingual linkage of geotagged Wikipedia articles Ahlers D. Lecture Notes in Computer Science English 2013 Different language versions of Wikipedia contain articles referencing the same place. However, an article in one language does not necessarily mean it is available in another language as well and linked to. This paper examines geotagged articles describing places in Honduras in both the Spanish and the English language versions. It demonstrates that a method based on simple features can reliably identify article pairs describing the same semantic place concept and evaluates it against the existing interlinks as well as a manual assessment. 0 0
Long Live Wikipedia?: Sustainable Volunteerism and the Future of Crowd-Sourced Knowledge Long Live Wikipedia?: Sustainable Volunteerism and the Future of Crowd-Sourced Knowledge Andrew Lih A Companion to New Media Dynamics English 2013 [No abstract available] 0 0
MDL-based models for transliteration generation MDL-based models for transliteration generation Nouri J.
Pivovarova L.
Yangarber R.
Lecture Notes in Computer Science English 2013 This paper presents models for automatic transliteration of proper names between languages that use different alphabets. The models are an extension of our work on automatic discovery of patterns of etymological sound change, based on the Minimum Description Length Principle. The models for pairwise alignment are extended with algorithms for prediction that produce transliterated names. We present results on 13 parallel corpora for 7 languages, including English, Russian, and Farsi, extracted from Wikipedia headlines. The transliteration corpora are released for public use. The models achieve up to 88% on word-level accuracy and up to 99% on symbol-level F-score. We discuss the results from several perspectives, and analyze how corpus size, the language pair, the type of names (persons, locations), and noise in the data affect the performance. 0 0
MJ no more: Using concurrent wikipedia edit spikes with social network plausibility checks for breaking news detection MJ no more: Using concurrent wikipedia edit spikes with social network plausibility checks for breaking news detection Steiner T.
Van Hooland S.
Summers E.
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 We have developed an application called Wikipedia Live Monitor that monitors article edits on different language versions of Wikipedia as they happen in realtime. Wikipedia articles in different languages are highly interlinked. For example, the English article "en:2013 Russian meteor event" on the topic of the February 15 meteoroid that exploded over the region of Chelyabinsk Oblast, Russia, is interlinked with the Russian article on the same topic. As we monitor multiple language versions of Wikipedia in parallel, we can exploit this fact to detect concurrent edit spikes of Wikipedia articles covering the same topics, both in only one, and in different languages. We treat such concurrent edit spikes as signals for potential breaking news events, whose plausibility we then check with full-text cross-language searches on multiple social networks. Unlike the reverse approach of monitoring social networks first, and potentially checking plausibility on Wikipedia second, the approach proposed in this paper has the advantage of being less prone to false-positive alerts, while being equally sensitive to true-positive events, at only a fraction of the processing cost. A live demo of our application is available online at the URL http://wikipedia-irc.herokuapp.com/, the source code is available under the terms of the Apache 2.0 license at https://github.com/tomayac/wikipedia-irc. 0 0
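The detection step described above can be illustrated with a minimal sketch (Python, not the authors' implementation): per-article edit timestamps are kept in a sliding window and an article is flagged once the window fills up; the window size, threshold and article name are assumed values. A real monitor would additionally require concurrent spikes across several language versions plus the social-network plausibility check.

from collections import defaultdict, deque

WINDOW_SECONDS = 300   # sliding window size (assumed value)
MIN_EDITS = 5          # edits inside the window that count as a spike (assumed)

class SpikeDetector:
    """Flags articles that receive an unusual burst of edits."""
    def __init__(self):
        self.recent = defaultdict(deque)   # article -> edit timestamps in window

    def observe(self, article, ts):
        q = self.recent[article]
        q.append(ts)
        while q and ts - q[0] > WINDOW_SECONDS:
            q.popleft()
        return len(q) >= MIN_EDITS

detector = SpikeDetector()
for i in range(5):                         # five edits within four minutes
    spiking = detector.observe("en:2013 Russian meteor event", 60.0 * i)
print(spiking)                             # True: treat as a breaking-news candidate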
Making Peripheral Participation Legitimate: Reader Engagement Experiments in Wikipedia Making Peripheral Participation Legitimate: Reader Engagement Experiments in Wikipedia Aaron Halfaker
Oliver Keyes
Dario Taraborelli
Computer-Supported Cooperative Work English 2013 Open collaboration communities thrive when participation is plentiful. Recent research has shown that the English Wikipedia community has constructed a vast and accurate information resource primarily through the monumental effort of a relatively small number of active, volunteer editors. Beyond Wikipedia's active editor community is a substantially larger pool of potential participants: readers. In this paper we describe a set of field experiments using the Article Feedback Tool, a system designed to elicit lightweight contributions from Wikipedia's readers. Through the lens of social learning theory and comparisons to related work in open bug tracking software, we evaluate the costs and benefits of the expanded participation model and show both qualitatively and quantitatively that peripheral contributors add value to an open collaboration community as long as the cost of identifying low quality contributions remains low. 8 0
Making collective wisdom wiser Making collective wisdom wiser Milo T. Lecture Notes in Computer Science English 2013 Many popular sites, such as Wikipedia and Tripadvisor, rely on public participation to gather information - a process known as crowd data sourcing. While this kind of collective intelligence is extremely valuable, it is also fallible, and policing such sites for inaccuracies or missing material is a costly undertaking. In this talk we will overview the MoDaS project that investigates how database technology can be put to work to effectively gather information from the public, efficiently moderate the process, and identify questionable input with minimal human interaction [1-4, 7]. We will consider the logical, algorithmic, and methodological foundations for the management of large scale crowd-sourced data as well as the development of applications over such information. 0 0
Making peripheral participation legitimate: Reader engagement experiments in wikipedia Making peripheral participation legitimate: Reader engagement experiments in wikipedia Aaron Halfaker
Oliver Keyes
Dario Taraborelli
English 2013 Open collaboration communities thrive when participation is plentiful. Recent research has shown that the English Wikipedia community has constructed a vast and accurate information resource primarily through the monumental effort of a relatively small number of active, volunteer editors. Beyond Wikipedia's active editor community is a substantially larger pool of potential participants: readers. In this paper we describe a set of field experiments using the Article Feedback Tool, a system designed to elicit lightweight contributions from Wikipedia's readers. Through the lens of social learning theory and comparisons to related work in open bug tracking software, we evaluate the costs and benefits of the expanded participation model and show both qualitatively and quantitatively that peripheral contributors add value to an open collaboration community as long as the cost of identifying low quality contributions remains low. Copyright 2013 ACM. 0 0
Managing complexity: Strategies for group awareness and coordinated action in wikipedia Managing complexity: Strategies for group awareness and coordinated action in wikipedia Gilbert M.
Morgan J.T.
David W. McDonald
Mark Zachry
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 In online groups, increasing explicit coordination can increase group cohesion and member productivity. On Wikipedia, groups called WikiProjects employ a variety of explicit coordination mechanisms to motivate and structure member contribution, with the goal of creating and improving articles related to particular topics. However, while explicit coordination works well for coordinating article-level actions, coordinating group tasks and tracking progress towards group goals that involve hundreds or thousands of articles over time requires different coordination strategies. To lower the coordination cost of monitoring and task-routing, WikiProjects centralize coordination activity on WikiProject pages - "micro-sites" that provide a centralized repository of project tools, tasks and targets, and discussion for explicit group coordination. These tools can facilitate shared awareness of member and non-member editing activity on articles that the project cares about. However, whether these tools are as effective at motivating members as explicit coordination, and whether they elicit the same kind of contributions, have not been studied. In this study, we examine one such tool, Hot Articles, and compare its effect on the editing behavior of WikiProject members with a more common, explicit coordination mechanism: making edit requests on the project talk page. 0 0
Managing information disparity in multilingual document collections Managing information disparity in multilingual document collections Kevin Duh
Yeung C.-M.A.
Iwata T.
Masaaki Nagata
ACM Transactions on Speech and Language Processing English 2013 Information disparity is a major challenge with multilingual document collections. When documents are dynamically updated in a distributed fashion, information content among different language editions may gradually diverge. We propose a framework for assisting human editors to manage this information disparity, using tools from machine translation and machine learning. Given source and target documents in two different languages, our system automatically identifies information nuggets that are new with respect to the target and suggests positions to place their translations. We perform both real-world experiments and large-scale simulations on Wikipedia documents and conclude our system is effective in a variety of scenarios. 0 0
Manipulation among the arbiters of collective intelligence: How wikipedia administrators mold public opinion Manipulation among the arbiters of collective intelligence: How wikipedia administrators mold public opinion Sanmay Das
Lavoie A.
Malik Magdon-Ismail
International Conference on Information and Knowledge Management, Proceedings English 2013 Our reliance on networked, collectively built information is a vulnerability when the quality or reliability of this information is poor. Wikipedia, one such collectively built information source, is often our first stop for information on all kinds of topics; its quality has stood up to many tests, and it prides itself on having a "Neutral Point of View". Enforcement of neutrality is in the hands of comparatively few, powerful administrators. We find a surprisingly large number of editors who change their behavior and begin focusing more on a particular controversial topic once they are promoted to administrator status. The conscious and unconscious biases of these few, but powerful, administrators may be shaping the information on many of the most sensitive topics on Wikipedia; some may even be explicitly infiltrating the ranks of administrators in order to promote their own points of view. Neither prior history nor vote counts during an administrator's election can identify those editors most likely to change their behavior in this suspicious manner. We find that an alternative measure, which gives more weight to influential voters, can successfully reject these suspicious candidates. This has important implications for how we harness collective intelligence: even if wisdom exists in a collective opinion (like a vote), that signal can be lost unless we carefully distinguish the true expert voter from the noisy or manipulative voter. Copyright is held by the owner/author(s). 0 0
Matching named entities with the aid of Wikipedia Matching named entities with the aid of Wikipedia Abdullah Bawakid
Mourad Oussalah
Afzal N.
Shim S.-O.
Ahsan S.
Life Science Journal English 2013 In this paper we propose a novel framework using features extracted from Wikipedia for the task of Matching Named Entities. We present how the employed features are extracted from Wikipedia and how its structure is utilized. We describe how a term-concepts table constructed from Wikipedia and the redirect links is integrated in the framework. In addition, the internal links within Wikipedia along with the categories structure are also used to compute the relatedness between concepts. We evaluate the built framework and report its performance in the applications of Word Sense Disambiguation and Named Entities Matching. The system performance is compared against other learning-based state-of-the-art systems and its reported results are found to be competitive. We also present a method in this paper for Named Entities Matching that is context-independent and compare its results with other systems. 0 0
Measuring semantic relatedness using wikipedia signed network Measuring semantic relatedness using wikipedia signed network Yang W.-T.
Kao H.-Y.
Journal of Information Science and Engineering English 2013 Identifying the semantic relatedness of two words is an important task for information retrieval, natural language processing, and text mining. However, due to the diversity of meanings of a word, the semantic relatedness of two words is still hard to evaluate precisely with limited corpora. Wikipedia is now a huge wiki-based encyclopedia on the Internet that has become a valuable resource for research work. Wikipedia articles, written by a live collaboration of user editors, contain a high volume of reference links, URL identification for concepts and a complete revision history. Moreover, each Wikipedia article represents an individual concept that simultaneously contains other concepts as hyperlinks to other articles embedded in its content. Through this, we believe that the semantic relatedness between two words can be found through the semantic relatedness between two Wikipedia articles. Therefore, we propose an Editor-Contribution-based Rank (ECR) algorithm for ranking the concepts in an article's content through all revisions and take the ranked concepts as a vector representing the article. We classify four types of relationships in which the behaviors of addition and deletion map to appropriate and inappropriate concepts. ECR also extends the concept semantics by the editor-concept network. ECR ranks those concepts depending on the mutual signed-reinforcement relationship between the concepts and the editors. The results reveal that our method leads to prominent performance improvement and increases the correlation coefficient by a factor ranging from 4% to 23% over previous methods that calculate the relatedness between two articles. 0 0
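Since the entry above ultimately compares two articles through their weighted concept vectors, a small hedged sketch of that comparison step may help; the ECR weighting itself is not reproduced, and the concept weights below are invented for illustration.

import math

def cosine(u, v):
    """Cosine similarity between two sparse concept-weight vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# hypothetical concept weights, as an ECR-like ranking step might produce
car = {"Vehicle": 0.9, "Engine": 0.7, "Transport": 0.5}
bus = {"Vehicle": 0.8, "Transport": 0.9, "Public transport": 0.6}
print(round(cosine(car, bus), 3))   # relatedness proxy for the two articles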
Measuring the Compositionality of Arabic Multiword Expressions Measuring the Compositionality of Arabic Multiword Expressions Saif A.
Ab Aziz M.J.
Omar N.
Communications in Computer and Information Science English 2013 This paper presents a method for measuring the compositionality score of multiword expressions (MWEs). Based on Wikipedia (WP) as a lexical resource, multiword expressions are identified using the titles of Wikipedia articles that are made up of more than one word, without further processing. Through the semantic representation, this method exploits the hierarchical taxonomy in Wikipedia to represent a concept (single word or multiword) as a feature vector containing the WP articles that belong to the concept's categories and sub-categories. The literality and multiplicative function composition scores are used for measuring the compositionality score of an MWE utilizing semantic similarity. The proposed method is evaluated by comparing the compositionality scores against human judgments on a dataset containing 100 Arabic noun-noun compounds. 0 0
Method and tool support for classifying software languages with Wikipedia Method and tool support for classifying software languages with Wikipedia Lammel R.
Mosen D.
Varanovich A.
Lecture Notes in Computer Science English 2013 Wikipedia provides useful input for efforts on mining taxonomies or ontologies in specific domains. In particular, Wikipedia's categories serve classification. In this paper, we describe a method and a corresponding tool, WikiTax, for exploring Wikipedia's category graph with the objective of supporting the development of a classification of software languages. The category graph is extracted level by level. The extracted graph is visualized in a tree-like manner. Category attributes (i.e., metrics) such as depth are visualized. Irrelevant edges and nodes may be excluded. These exclusions are documented while using a manageable and well-defined set of 'exclusion types' as comments. 0 0
Mining user-generated path traversal patterns in an information network Mining user-generated path traversal patterns in an information network Takes F.W.
Kosters W.A.
Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013 English 2013 This paper studies patterns occurring in user-generated clickpaths within the online encyclopedia Wikipedia. The clickpath data originates from over seven million goal-oriented clicks gathered from the Wiki Game, an online game in which the goal is to find a path between two given random Wikipedia articles. First we propose to use node-based path traversal patterns to derive a new measure of node centrality, arguing that a node is central if it proves useful in navigating through the network. A comparison with centrality measures from the literature is provided, showing that users generally "know" only a relatively small portion of the network, which they employ frequently in finding their goal, and that this set of nodes differs significantly from the set of central nodes according to various centrality measures. Next, using the notion of subgraph centrality, we show that users are able to identify a small yet efficient portion of the graph that is useful for successfully completing their navigation goals. 0 0
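A hedged sketch of the node-based measure described above: count the fraction of goal-oriented paths on which each article appears, so that articles users actually rely on for navigation score highly; the example paths are invented.

from collections import Counter

def path_centrality(paths):
    """Fraction of user paths on which each node appears (counted once per path)."""
    counts = Counter()
    for path in paths:
        for node in set(path):
            counts[node] += 1
    return {node: c / len(paths) for node, c in counts.items()}

paths = [
    ["Netherlands", "Europe", "Germany", "Berlin"],
    ["Cheese", "Netherlands", "Europe", "France"],
    ["Piano", "Music", "Europe", "Germany"],
]
for node, score in sorted(path_centrality(paths).items(), key=lambda x: -x[1]):
    print(f"{node}: {score:.2f}")   # "Europe" acts as the navigational hub here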
Modeling and simulation on collective intelligence in future internet-A study of wikipedia Modeling and simulation on collective intelligence in future internet-A study of wikipedia Du S.
Qi J.
Information Technology Journal English 2013 Against the background of Web 2.0, the socialization of the network generates collective intelligence which can enrich human wisdom. However, the main factors that influence the performance of this behavior are still under research. In this study, the effect of the number of Internet users, represented by the quantity, quality and variety of User-generated Content (UGC), is brought forward. Taking Wikipedia as a study case, this study uses an agent-based modeling methodology and about 10 years of real Wikipedia data to establish and simulate the model. The results verify that group size is indeed a necessary condition for generating collective intelligence. When the number of participants in Wikipedia reaches about 400000, the quantity of UGC increases exponentially, the quality of UGC reaches a satisfactory level and the variety of UGC can be guaranteed. This insight helps to show when mass collaboration will lead to collective intelligence, which is an innovation over previous work. 0 0
Models of human navigation in information networks based on decentralized search Models of human navigation in information networks based on decentralized search Denis Helic
Strohmaier M.
Michael Granitzer
Scherer R.
HT 2013 - Proceedings of the 24th ACM Conference on Hypertext and Social Media English 2013 Models of human navigation play an important role for understanding and facilitating user behavior in hypertext systems. In this paper, we conduct a series of principled experiments with decentralized search - an established model of human navigation in social networks - and study its applicability to information networks. We apply several variations of decentralized search to model human navigation in information networks and we evaluate the outcome in a series of experiments. In these experiments, we study the validity of decentralized search by comparing it with human navigational paths from an actual information network - Wikipedia. We find that (i) navigation in social networks appears to differ from human navigation in information networks in interesting ways and (ii) in order to apply decentralized search to information networks, stochastic adaptations are required. Our work illuminates a way towards using decentralized search as a valid model for human navigation in information networks in future work. Our results are relevant for scientists who are interested in modeling human behavior in information networks and for engineers who are interested in using models and simulations of human behavior to improve on structural or user interface aspects of hypertextual systems. Copyright 2013 ACM. 0 0
Monitoring network structure and content quality of signal processing articles on wikipedia Monitoring network structure and content quality of signal processing articles on wikipedia Lee T.C.
Unnikrishnan J.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings English 2013 Wikipedia has become a widely-used resource on signal processing. However, the freelance-editing model of Wikipedia makes it challenging to maintain a high content quality. We develop techniques to monitor the network structure and content quality of Signal Processing (SP) articles on Wikipedia. Using metrics to quantify the importance and quality of articles, we generate a list of SP articles on Wikipedia arranged in the order of their need for improvement. The tools we use include the HITS and PageRank algorithms for network structure, crowdsourcing for quantifying article importance and known heuristics for article quality. 0 0
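For the network-structure part, a plain power-iteration PageRank over a tiny invented link graph gives the flavour of the importance scores involved; the paper itself combines such scores with HITS, crowdsourced importance and quality heuristics, none of which are reproduced here.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each article to its list of outgoing links."""
    nodes = set(links) | {m for outs in links.values() for m in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            if outs:
                for m in outs:
                    new[m] += damping * rank[n] / len(outs)
            else:                                  # dangling page: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank

toy_links = {
    "Fourier transform": ["Signal processing", "Convolution"],
    "Convolution": ["Signal processing"],
    "Signal processing": ["Fourier transform"],
}
print(sorted(pagerank(toy_links).items(), key=lambda x: -x[1]))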
Morbid Inferences: Whitman, Wikipedia, and the Debate Over the Poet's Sexuality Morbid Inferences: Whitman, Wikipedia, and the Debate Over the Poet's Sexuality Jason Stacy
Cory Blad
Rob Velella
Polymath: An Interdisciplinary Arts and Sciences Journal 2013 The ascendency of identity as an effective political mobilization strategy has opened significant opportunities for group definition (or redefinition) of previously accepted information and knowledge claims. The emergence of “identity politics” in this post industrial era is but one of several reflective conditions, but certainly one that is imminently helpful in understanding the “reopening” of debate on Whitman’s sexual orientation. In order to gain control over definitions it becomes necessary to rhetorically politicize the authority of scholars and the primacy of existing professional knowledge. The struggle over historicity, facilitated by the expansion of telecommunications technologies and online collaboration, has created a substantial opportunity to challenge seemingly “settled” knowledge and expand debate beyond academic boundaries while either appealing to academic authority, or dismissing claims of academic objectivity whenever rhetorically convenient. The decline of traditional authority structures and the opening of discursive opportunities creates a field in which academic expertise becomes increasingly contested for politico-personal ends, especially on a quasi-authoritative, semi-anonymous, open-access forum like Wikipedia. Whitman’s “multitudes,” coupled with his notoriety and claims to be the nation’s poet, make him a rich battleground over American sexual politics. 0 0
Navigating the topical structure of academic search results via the wikipedia category network Navigating the topical structure of academic search results via the wikipedia category network Mirylenka D.
Passerini A.
International Conference on Information and Knowledge Management, Proceedings English 2013 Searching for scientific publications on the Web is a tedious task, especially when exploring an unfamiliar domain. Typical scholarly search engines produce lengthy unstructured result lists that are difficult to comprehend, interpret and browse. We propose a novel method of organizing the search results into concise and informative topic hierarchies. The method consists of two steps: extracting interrelated topics from the result set, and summarizing the topic graph. In the first step we map the search results to articles and categories of Wikipedia, constructing a graph of relevant topics with hierarchical relations. In the second step we sequentially build nested summaries of the produced topic graph using a structured output prediction approach. Trained on a small number of examples, our method learns to construct informative summaries for unseen topic graphs, and outperforms unsupervised state-of-the-art Wikipedia-based clustering. Copyright is held by the owner/author(s). 0 0
Network analysis of user generated content quality in Wikipedia Network analysis of user generated content quality in Wikipedia Myshkin Ingawale
Amitava Dutta
Rahul Roy
Priya Seetharaman
Online Information Review English 2013 Purpose - Social media platforms allow near-unfettered creation and exchange of user generated content (UGC). Drawing from network science, the purpose of this paper is to examine whether high and low quality UGC differ in their connectivity structures in Wikipedia (which consists of interconnected user generated articles). Design/methodology/approach - Using Featured Articles as a proxy for high quality, a network analysis was undertaken of the revision history of six different language Wikipedias, to offer a network-centric explanation for the emergence of quality in UGC. Findings - The network structure of interactions between articles and contributors plays an important role in the emergence of quality. Specifically the analysis reveals that high-quality articles cluster in hubs that span structural holes. Research limitations/implications - The analysis does not capture the strength of interactions between articles and contributors. The implication of this limitation is that quality is viewed as a binary variable. Extensions to this research will relate strength of interactions to different levels of quality in UGC. Practical implications - The findings help harness the "wisdom of the crowds" effectively. Organisations should nurture users and articles at the structural hubs from an early stage. This can be done through appropriate design of collaborative knowledge systems and development of organisational policies to empower hubs. Originality/value - The network centric perspective on quality in UGC and the use of a dynamic modelling tool are novel. The paper is of value to researchers in the area of social computing and to practitioners implementing and maintaining such platforms in organisations. 0 0
News auto-tagging using Wikipedia News auto-tagging using Wikipedia Eldin S.S.
El-Beltagy S.R.
2013 9th International Conference on Innovations in Information Technology, IIT 2013 English 2013 This paper presents an efficient method for automatically annotating Arabic news stories with tags using Wikipedia. The idea of the system is to use Wikipedia article names, properties, and re-directs to build a pool of meaningful tags. Sophisticated and efficient matching methods are then used to detect text fragments in input news stories that correspond to entries in the constructed tag pool. Generated tags represent real life entities or concepts such as the names of popular places, known organizations, celebrities, etc. These tags can be used indirectly by a news site for indexing, clustering, classification, statistics generation or directly to give a news reader an overview of news story contents. Evaluation of the system has shown that the tags it generates are better than those generated by MSN Arabic news. 0 0
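A hedged sketch of the matching idea in the entry above: a tag pool maps surface forms (article titles and redirects) to canonical tags, and the longest matching fragment in the story wins. The pool, the constant and the example sentence are invented, and a real pool would be built from a Wikipedia dump with proper Arabic tokenization.

# toy pool mapping surface forms to canonical tags (a real one comes from Wikipedia)
TAG_POOL = {
    "cairo": "Cairo",
    "egyptian museum": "Egyptian Museum",
    "united nations": "United Nations",
    "un": "United Nations",
}
MAX_WORDS = 3   # length of the longest surface form in the pool (assumed)

def tag(text):
    words = text.lower().split()
    tags, i = set(), 0
    while i < len(words):
        step = 1
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):   # longest match first
            candidate = " ".join(words[i:i + n])
            if candidate in TAG_POOL:
                tags.add(TAG_POOL[candidate])
                step = n
                break
        i += step
    return tags

print(tag("The united nations office in cairo issued a statement"))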
Object recognition in wikimage data based on local invariant image features Object recognition in wikimage data based on local invariant image features Tomasev N.
Pracner D.
Brehar R.
Radovanovic M.
Mladenic D.
Ivanovic M.
Nedevschi S.
Proceedings - 2013 IEEE 9th International Conference on Intelligent Computer Communication and Processing, ICCP 2013 English 2013 Object recognition is an essential task in content-based image retrieval and classification. This paper deals with object recognition in WIKImage data, a collection of publicly available annotated Wikipedia images. WIKImage comprises a set of 14 binary classification problems with significant class imbalance. Our approach is based on using the local invariant image features and we have compared 3 standard and widely used feature types: SIFT, SURF and ORB. We have examined how the choice of representation affects the k-nearest neighbor data topology and have shown that some feature types might be more appropriate than others for this particular problem. In order to assess the difficulty of the data, we have evaluated 7 different k-nearest neighbor classification methods and shown that the recently proposed hubness-aware classifiers might be used to either increase the accuracy of prediction, or the macro-averaged F-score. However, our results indicate that further improvements are possible and that including the textual feature information might prove beneficial for system performance. 0 0
On detecting Association-Based Clique Outliers in heterogeneous information networks On detecting Association-Based Clique Outliers in heterogeneous information networks Gupta M.
Gao J.
Yan X.
Cam H.
Jangwhan Han
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013 English 2013 In the real world, various systems can be modeled using heterogeneous networks which consist of entities of different types. People like to discover groups (or cliques) of entities linked to each other with rare and surprising associations from such networks. We define such anomalous cliques as Association-Based Clique Outliers (ABCOutliers) for heterogeneous information networks, and design effective approaches to detect them. The need to find such outlier cliques from networks can be formulated as a conjunctive select query consisting of a set of (type, predicate) pairs. Answering such conjunctive queries efficiently involves two main challenges: (1) computing all matching cliques which satisfy the query and (2) ranking such results based on the rarity and the interestingness of the associations among entities in the cliques. In this paper, we address these two challenges as follows. First, we introduce a new low-cost graph index to assist clique matching. Second, we define the outlierness of an association between two entities based on their attribute values and provide a methodology to efficiently compute such outliers given a conjunctive select query. Experimental results on several synthetic datasets and the Wikipedia dataset containing thousands of entities show the effectiveness of the proposed approach in computing interesting ABCOutliers. Copyright 2013 ACM. 0 0
On the social nature of linguistic prescriptions On the social nature of linguistic prescriptions Milkowski M. Psychology of Language and Communication English 2013 The paper proposes an empirical method to investigate linguistic prescriptions as inherent corrective behaviors. The behaviors in question may but need not necessarily be supported by any explicit knowledge of rules. It is possible to gain insight into them, for example by extracting information about corrections from revision histories of texts (or by analyzing speech corpora where users correct themselves or one another). One easily available source of such information is the revision history of Wikipedia. As is shown, the most frequent and short corrections are limited to linguistic errors such as typos (and editorial conventions adopted in Wikipedia). By perusing an automatically generated revision corpus, one gains insight into the prescriptive nature of language empirically. At the same time, the prescriptions offered are not reducible to descriptions of the most frequent linguistic use. 0 0
Ontology-enriched multi-document summarization in disaster management using submodular function Ontology-enriched multi-document summarization in disaster management using submodular function Wu K.
Li L.
Jing-Woei Li
Li T.
Information Sciences English 2013 In disaster management, a myriad of news and reports relevant to the disaster may be recorded in the form of text documents. A challenging problem is how to provide concise and informative reports from a large collection of documents, to help domain experts analyze the trend of the disaster. In this paper, we explore the feasibility of using a domain-specific ontology to facilitate the summarization task, and propose TELESUM, an ontology-enriched multi-document summarization approach, where the submodularity hidden among ontological concepts is investigated. Empirical experiments on the collection of press releases by the Miami-Dade County Department of Emergency Management during Hurricane Wilma in 2005 demonstrate the efficacy and effectiveness of TELESUM in disaster management. Further, our proposed framework can be extended to summarizing general documents by employing public ontologies, e.g., Wikipedia. Extensive evaluation of the generalized framework is conducted on the DUC04-05 datasets, and shows that our method is competitive with other approaches. © 2012 Elsevier Inc. All rights reserved. 0 0
Open innovation and distributed knowledge: An analysis of their characteristics and prosumers' motives Open innovation and distributed knowledge: An analysis of their characteristics and prosumers' motives Gherab-Martin K.
Satrustegui A.U.
Knowledge management English 2013 Starting with examples of successful open innovation and distributed knowledge on the Internet, this paper deals with the motives behind the sometimes seemingly disinterested participation of many of the users or prosumers who generate knowledge and innovations. This will necessitate, in addition, an analysis of best practices for improving the efficiency of open and distributed innovation. We will present the case of InnoCentive in detail and make brief incursions into two other examples of distributed knowledge: open source software and Wikipedia. 0 0
Open2Edit: A peer-to-peer platform for collaboration Open2Edit: A peer-to-peer platform for collaboration Zeilemaker N.
Capota M.
Pouwelse J.
2013 IFIP Networking Conference, IFIP Networking 2013 English 2013 Peer-to-peer systems owe much of their success to user contributed resources like storage space and bandwidth. At the same time, popular collaborative systems like Wikipedia or StackExchange are built around user-contributed knowledge, judgement, and expertise. In this paper, we combine peer-to-peer and collaborative systems to create Open2Edit, a peer-to-peer platform for collaboration. 0 0
OpenSeaMap - The free nautical chart OpenSeaMap - The free nautical chart Barlocher M. Hydro International English 2013 OpenSeaMap involves experienced mariners, programmers and thousands of data collectors, all of whom are working to produce a nautical chart with comprehensive, relevant and up-to-date data for water sports which is open to everyone and free of charge. OpenSeaMap works like Wikipedia, the up-to-date, competent and most comprehensive encyclopedia in the world. Thousands of skippers, divers, kayakers, and other water-sports enthusiasts compile information they consider important and useful for a nautical chart and save it in a spatial database. OpenSeaMap is the fastest chart in the world. Items such as buoys that have been moved, a new harbor or the harbor master's new telephone number can be found online within just a few minutes, instead of one year later in the next edition of a common harbor pilot book. OpenSeaMap is versatile. It contains information on oceans, rivers and topography. 0 0
Opinions, conflicts, and consensus: Modeling social dynamics in a collaborative environment Opinions, conflicts, and consensus: Modeling social dynamics in a collaborative environment Torok J.
Iniguez G.
Taha Yasseri
San Miguel M.
Kaski K.
Kertesz J.
Physical Review Letters English 2013 Information-communication technology promotes collaborative environments like Wikipedia where, however, controversy and conflicts can appear. To describe the rise, persistence, and resolution of such conflicts, we devise an extended opinion dynamics model where agents with different opinions perform a single task to make a consensual product. As a function of the convergence parameter describing the influence of the product on the agents, the model shows spontaneous symmetry breaking of the final consensus opinion represented by the medium. In the case when agents are replaced with new ones at a certain rate, a transition from mainly consensus to a perpetual conflict occurs, which is in qualitative agreement with the scenarios observed in Wikipedia. 0 0
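A toy simulation, loosely inspired by the model sketched in the abstract above and not the authors' exact equations: agents with opinions in [0, 1] edit a shared article when it deviates too much from their view, and reading the article pulls their opinion toward it with a convergence parameter mu; all parameter values are assumptions.

import random

def simulate(n_agents=50, steps=5000, mu=0.1, tolerance=0.05, seed=1):
    random.seed(seed)
    opinions = [random.random() for _ in range(n_agents)]
    article = 0.5                                   # the shared "product"
    edits = 0
    for _ in range(steps):
        i = random.randrange(n_agents)
        if abs(opinions[i] - article) > tolerance:  # dissatisfied agent edits
            article += 0.5 * (opinions[i] - article)
            edits += 1
        opinions[i] += mu * (article - opinions[i]) # the article influences the agent
    return article, edits

article, edits = simulate()
print(f"final article value {article:.2f} after {edits} conflicting edits")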
Parsit at Evalita 2011 dependency parsing task Parsit at Evalita 2011 dependency parsing task Grella M.
Nicola M.
Lecture Notes in Computer Science English 2013 This article describes the Constraint-based Dependency Parser architecture used at Evalita 2011 Dependency Parsing Task, giving a detailed analysis of the results obtained at the official evaluation. The Italian grammar has been expressed for the first time as a set of constraints and an ad-hoc constraints solver has been then applied to restrict possible analysis. Multiple solutions of a given sentence have been reduced to one by means of an evidence scoring system that makes use of an indexed version of Italian Wikipedia created for the purpose. The attachment score obtained is 96.16%, giving the best result so far for a dependency parser for the Italian language. 0 0
Personality traits and knowledge sharing in online communities Personality traits and knowledge sharing in online communities Jadin T.
Gnambs T.
Batinic B.
Computers in Human Behavior English 2013 Adopting diffusion theory and the concept of social value orientation, the effects of personality traits on knowledge sharing in a virtual open content community are investigated. In addition to the main effects of personality, it was hypothesized that intrinsic motivations would moderate the effects on knowledge sharing. A sample of N = 256 active users of Wikipedia provided measures of personality, motivation, and knowledge sharing. Latent regression analyses support the notion that authorship of Wikipedia is associated with higher levels of trendsetting and a prosocial value orientation. Moreover, moderation analyses demonstrate that the effect of the latter is moderated by individual differences in motivations to write. Differences with regard to opinion leadership could not be confirmed. © 2012 Elsevier Ltd. All rights reserved. 0 0
Perspectives on crowdsourcing annotations for natural language processing Perspectives on crowdsourcing annotations for natural language processing Wang A.
Hoang C.D.V.
Kan M.-Y.
Language Resources and Evaluation English 2013 Crowdsourcing has emerged as a new method for obtaining annotations for training models for machine learning. While many variants of this process exist, they largely differ in their methods of motivating subjects to contribute and the scale of their applications. To date, there has yet to be a study that helps the practitioner to decide what form an annotation application should take to best reach its objectives within the constraints of a project. To fill this gap, we provide a faceted analysis of crowdsourcing from a practitioner's perspective, and show how our facets apply to existing published crowdsourced annotation applications. We then summarize how the major crowdsourcing genres fill different parts of this multi-dimensional space, which leads to our recommendations on the potential opportunities crowdsourcing offers to future annotation efforts. © 2012 Springer Science+Business Media B.V. 0 0
Processing Business News for Detecting Firms' Global Networking Strategies Processing Business News for Detecting Firms' Global Networking Strategies Gay B. Competitive Intelligence 2.0: Organization, Innovation and Territory English 2013 [No abstract available] 0 0
Querying multilingual DBpedia with QAKiS Querying multilingual DBpedia with QAKiS Cabrio E.
Cojan J.
Fabien Gandon
Hallili A.
Lecture Notes in Computer Science English 2013 We present an extension of QAKiS, a system for open domain Question Answering over linked data, that allows querying DBpedia multilingual chapters. Such chapters can contain different information with respect to the English version, e.g. they provide more specificity on certain topics, or fill information gaps. QAKiS exploits the alignment between properties carried out by DBpedia contributors as a mapping from Wikipedia terms to a common ontology, to exploit information coming from DBpedia multilingual chapters, thereby broadening its coverage. For the demo, the English, French and German DBpedia chapters are the RDF data sets to be queried using a natural language interface. 0 0
Recommending tags with a model of human categorization Recommending tags with a model of human categorization Seitlinger P.
Kowald D.
Christoph Trattner
Tobias Ley
International Conference on Information and Knowledge Management, Proceedings English 2013 When interacting with social tagging systems, humans exercise complex processes of categorization that have been the topic of much research in cognitive science. In this paper we present a recommender approach for social tags derived from ALCOVE, a model of human category learning. The basic architecture is a simple three-layers connectionist model. The input layer encodes patterns of semantic features of a user-specific resource, such as latent topics elicited through Latent Dirichlet Allocation (LDA) or available external categories. The hidden layer categorizes the resource by matching the encoded pattern against already learned exemplar patterns. The latter are composed of unique feature patterns and associated tag distributions. Finally, the output layer samples tags from the associated tag distributions to verbalize the preceding categorization process. We have evaluated this approach on a real-world folksonomy gathered from Wikipedia bookmarks in Delicious. In the experiment our approach outperformed LDA, a well-established algorithm. We attribute this to the fact that our approach processes semantic information (either latent topics or external categories) across the three different layers. With this paper, we demonstrate that a theoretically guided design of algorithms not only holds potential for improving existing recommendation mechanisms, but it also allows us to derive more generalizable insights about how human information interaction on the Web is determined by both semantic and verbal processes. Copyright 2013 ACM. 0 0
Representation and verification of attribute knowledge Representation and verification of attribute knowledge Zhang C.
Niu Z.
Shi C.
Tan M.
Fu H.
Xu S.
Lecture Notes in Computer Science English 2013 With the increasing growth and popularization of the Internet, knowledge extraction from the web is an important issue in the fields of web mining, ontology engineering and intelligent information processing. The availability of very large real corpora and the development of Internet and machine learning technologies make it feasible to acquire massive knowledge from the web. In addition, many web-based encyclopedias such as Wikipedia and Baidu Baike include much structured knowledge. However, knowledge quality issues, including incorrectness, inconsistency, and incompleteness, become a serious obstacle to the wide practical application of such extracted and structured knowledge. In this paper, we build a taxonomy of relations between attributes of concepts, and propose an approach driven by this taxonomy of attribute relations to evaluate knowledge about the attribute values of entities. We also address an application of our approach to building and verifying attribute knowledge of entities in different domains. 0 0
Research on measuring semantic correlation based on the Wikipedia hyperlink network Research on measuring semantic correlation based on the Wikipedia hyperlink network Ye F.
Zhang F.
Luo X.
Xu L.
2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Proceedings English 2013 As a free online encyclopedia with large-scale knowledge coverage, rich semantic information and a quick update speed, Wikipedia brings new ideas for measuring semantic correlation. In this paper, we present a new method for measuring the semantic correlation between words by mining the rich semantic information that exists in Wikipedia. Unlike previous methods that calculate semantic relatedness merely based on the page network or the category network, our method takes into account the semantic information of the page network and also combines the semantic information of the category network, improving the accuracy of the results. Besides, we analyze and evaluate the algorithm by comparing its results with a well-known knowledge base (e.g., HowNet) and with traditional Wikipedia-based methods on the same test set, demonstrating its superiority. 0 0
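For comparison, the page-network side alone is often measured with the Wikipedia Link-based Measure of Milne and Witten (relatedness as one minus a normalised link distance); the sketch below uses invented inlink sets and is only a baseline illustration, not the combined page-and-category method of the entry above.

import math

def wlm(inlinks_a, inlinks_b, total_articles):
    """Milne-Witten link-based relatedness computed from sets of linking articles."""
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0
    distance = (math.log(max(len(a), len(b))) - math.log(len(common))) / (
        math.log(total_articles) - math.log(min(len(a), len(b))))
    return max(0.0, 1.0 - distance)

car_inlinks = {"Road", "Engine", "Traffic", "Toyota"}
bus_inlinks = {"Road", "Traffic", "Public transport"}
print(round(wlm(car_inlinks, bus_inlinks, total_articles=4_000_000), 3))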
ResourceSync: Leveraging sitemaps for resource synchronization ResourceSync: Leveraging sitemaps for resource synchronization Haslhofer B.
Warner S.
Lagoze C.
Max Klein
Sanderson R.
Nelson M.L.
Van De Sompel H.
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 Many applications need up-to-date copies of collections of changing Web resources. Such synchronization is currently achieved using ad-hoc or proprietary solutions. We propose ResourceSync, a general Web resource synchronization protocol that leverages XML Sitemaps. It provides a set of capabilities that can be combined in a modular manner to meet local or community requirements. We report on work to implement this protocol for arXiv.org and also provide an experimental prototype for the English Wikipedia as well as a client API. 0 0
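A minimal client-side sketch of the Sitemap-based idea: parse the source's sitemap and fetch only resources whose lastmod is newer than the locally held copy. The XML, URIs and timestamps are invented, and the full ResourceSync capability set (change lists, dumps, and so on) is not covered here.

import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/res/1</loc><lastmod>2013-05-01T10:00:00Z</lastmod></url>
  <url><loc>http://example.org/res/2</loc><lastmod>2013-06-15T08:30:00Z</lastmod></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def stale_resources(sitemap_xml, local_state):
    """Return URIs whose lastmod is newer than (or missing from) the local copies."""
    stale = []
    for url in ET.fromstring(sitemap_xml).findall("sm:url", NS):
        loc = url.find("sm:loc", NS).text
        lastmod = datetime.fromisoformat(
            url.find("sm:lastmod", NS).text.replace("Z", "+00:00"))
        if loc not in local_state or local_state[loc] < lastmod:
            stale.append(loc)
    return stale

local_state = {"http://example.org/res/1": datetime(2013, 5, 1, 10, 0, tzinfo=timezone.utc)}
print(stale_resources(SITEMAP_XML, local_state))   # only res/2 needs to be re-fetched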
Revision graph extraction in wikipedia based on supergram decomposition Revision graph extraction in wikipedia based on supergram decomposition Wu J.
Mizuho Iwaihara
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 As one of the popular social media that many people turn to in recent years, the collaborative encyclopedia Wikipedia provides information in a more "Neutral Point of View" way than others. Towards this core principle, plenty of effort has been put into collaborative contribution and editing. The trajectories of how such collaboration unfolds across revisions are valuable for group dynamics and social media research, which suggests that we should extract the underlying derivation relationships among revisions from the chronologically-sorted revision history in a precise way. In this paper, we propose a revision graph extraction method based on supergram decomposition in a document collection of near-duplicates. The plain text of a revision is measured by its frequency distribution of supergrams, the variable-length token sequences that stay the same through revisions. We show that this method performs the task more effectively than existing methods. 0 0
Risk factors and control of hospital acquired infections: A comparison between Wikipedia and scientific literature Risk factors and control of hospital acquired infections: A comparison between Wikipedia and scientific literature Maggi E.
Magistrelli L.
Zavattaro M.
Beggiato M.
Maiello F.
Naturale C.
Ragliani M.
Varalda M.
Viola M.S.
Concina D.
Allara E.
Faggiano F.
Epidemiology Biostatistics and Public Health English 2013 Background: Nowadays Wikipedia is one of the main on-line sources of general information. It contains several items about nosocomial infections and their prevention, together with items on virtually every scientific topic. This study aims to assess whether Wikipedia can be considered a reliable source for professional updating concerning Healthcare-associated Infections (HAI). Methods: Wikipedia was searched in order to gather items on HAI. 387 items were found with a search string. The field of research was reduced to those articles (27 items) containing exhaustive information in relation to prevention of HAI. The messages contained in those articles were then compared with the recommendations of a selected guideline (NICE 2003), completed by a literature search, with the aim of testing their reliability and exhaustiveness. Results: 15 Wiki items were found and 51 messages selected. The NICE guidelines contained 119 recommendations and 52 more recommendations were found in a further literature search. 45.1% of Wikipedia's messages were also found in the guidelines. Of these, 21.6% completely agreed with the messages of the guidelines, 15.7% partially agreed, 3.9% disagreed and 3.9% showed different levels of evidence in different articles. Moreover, 54.9% of Wikipedia's messages were not included in the guidelines and 84.2% of the recommendations contained in the guidelines were not present in Wikipedia. Conclusions: Wikipedia should not be considered a reliable source for professional updating on HAI. 0 0
Search in WikiImages using mobile phone Search in WikiImages using mobile phone Havasi L.
Szabo M.
Pataki M.
Varga D.
Sziranyi T.
Kovacs L.
Proceedings - International Workshop on Content-Based Multimedia Indexing English 2013 Demonstration will focus on the content based retrieval of Wikipedia images (Hungarian version). A mobile application for iOS will be used to gather images and send directly to the crossmodal processing framework. Searching is implemented in a high performance hybrid index tree with total 500k entries. The hit list is converted to wikipages and ordered by the content based score. 0 0
Searching for Translated Plagiarism with the Help of Desktop Grids Searching for Translated Plagiarism with the Help of Desktop Grids Pataki M.
Marosi A.C.
Journal of Grid Computing English 2013 Translated or cross-lingual plagiarism is defined as the translation of someone else's work or words without marking it as such or without giving credit to the original author. The existence of cross-lingual plagiarism is not new, but only in recent years, due to the rapid development of the natural language processing, appeared the first algorithms which tackled the difficult task of detecting it. Most of these algorithms utilize machine translation to compare texts written in different languages. We propose a different method, which can effectively detect translations between language-pairs where machine translations still produce low quality results. Our new algorithm presented in this paper is based on information retrieval (IR) and a dictionary based similarity metric. The preprocessing of the candidate documents for the IR is computationally intensive, but easily parallelizable. We propose a desktop Grid solution for this task. As the application is time sensitive and the desktop Grid peers are unreliable, a resubmission mechanism is used which assures that all jobs of a batch finish within a reasonable time period without dramatically increasing the load on the whole system. © 2012 Springer Science+Business Media B.V. 0 0
Searching for interestingness in wikipedia and yahoo! answers Searching for interestingness in wikipedia and yahoo! answers Mejova Y.
Bordino I.
Lalmas M.
Aristides Gionis
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 In many cases, when browsing the Web, users are searching for specific information. Sometimes, though, users are also looking for something interesting, surprising, or entertaining. Serendipitous search puts interestingness on par with relevance. We investigate how interesting are the results one can obtain via serendipitous search, and what makes them so, by comparing entity networks extracted from two prominent social media sites, Wikipedia and Yahoo Answers. 0 0
Selecting features with SVM Selecting features with SVM Rzeniewicz J.
Szymanski J.
Lecture Notes in Computer Science English 2013 A common problem with feature selection is to establish the minimum number of features that should be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. The amount of retained information was estimated by classification accuracy. Even though the method is parametric, we show that, as opposed to other methods, once its parameter is chosen it can be applied to a number of similar problems (e.g. one value can be used for various datasets originating from Wikipedia). For a constant value of the parameter, dimensionality was reduced by 78% to 90%, depending on the data set. The relative accuracy drop due to feature removal was less than 0.5% in those experiments. 0 0
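The abstract does not give the selection procedure itself, so the following is only a minimal Python sketch of the general idea it describes: rank features by the weight a linear SVM assigns them and keep the smallest set whose selection tilts the decision hyperplane by no more than a chosen angle. The helper names (tilt_angle, select_features) and the 10-degree threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def tilt_angle(w_full, keep_idx):
    """Angle (degrees) between the full hyperplane normal and the same
    normal with every non-selected coordinate zeroed out."""
    w_sel = np.zeros_like(w_full)
    w_sel[keep_idx] = w_full[keep_idx]
    cos = w_full @ w_sel / (np.linalg.norm(w_full) * np.linalg.norm(w_sel))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_features(X, y, max_angle=10.0):
    """Keep the smallest prefix of |w|-ranked features whose selection tilts
    the hyperplane by at most max_angle degrees (binary problems only)."""
    w = LinearSVC(dual=False).fit(X, y).coef_.ravel()
    order = np.argsort(-np.abs(w))          # most influential features first
    for k in range(1, len(order) + 1):
        keep = order[:k]
        if tilt_angle(w, keep) <= max_angle:
            return keep
    return order
```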
Semantic Web service discovery based on FIPA multi agents Semantic Web service discovery based on FIPA multi agents Song W. Lecture Notes in Electrical Engineering English 2013 In this paper we propose a framework for semantic Web service discovery that mediates between a multi-agent system and Web services, without changing their existing specifications and implementations, by providing a broker. The ontology management component in the broker creates the user ontology, merges it with general ontologies (e.g. WordNet, Yago, Wikipedia), and recommends the WSDL created from the generalized ontology to the selected Web service providers in order to increase their retrieval probability in related queries. In future work, we will resolve inconsistencies arising during the merge, improve the matching process, and implement the recommendation component. 0 0
Semantic message passing for generating linked data from tables Semantic message passing for generating linked data from tables Mulwad V.
Tim Finin
Joshi A.
Lecture Notes in Computer Science English 2013 We describe work on automatically inferring the intended meaning of tables and representing it as RDF linked data, making it available for improving search, interoperability and integration. We present implementation details of a joint inference module that uses knowledge from the linked open data (LOD) cloud to jointly infer the semantics of column headers, table cell values (e.g., strings and numbers) and relations between columns. We also implement a novel Semantic Message Passing algorithm which uses LOD knowledge to improve existing message passing schemes. We evaluate our implemented techniques on tables from the Web and Wikipedia. 0 0
Semantic smoothing for text clustering Semantic smoothing for text clustering Nasir J.A.
Varlamis I.
Karim A.
Tsatsaronis G.
Knowledge-Based Systems English 2013 In this paper we present a new semantic smoothing vector space kernel (S-VSM) for text document clustering. In the suggested approach, semantic relatedness between words is used to smooth the similarity and the representation of text documents. The basic hypothesis examined is that considering semantic relatedness between two text documents may improve the performance of the text document clustering task. For our experimental evaluation we analyze the performance of several semantic relatedness measures when embedded in the proposed S-VSM and present results with respect to different experimental conditions, such as: (i) the datasets used, (ii) the underlying knowledge sources of the utilized measures, and (iii) the clustering algorithms employed. To the best of our knowledge, the current study is the first to systematically compare, analyze and evaluate the impact of semantic smoothing in text clustering based on 'wisdom of linguists', e.g., WordNets, 'wisdom of crowds', e.g., Wikipedia, and 'wisdom of corpora', e.g., large text corpora represented with the traditional Bag of Words (BoW) model. Three semantic relatedness measures for text are considered; two knowledge-based (Omiotis [1], which uses WordNet, and WLM [2], which uses Wikipedia), and one corpus-based (PMI [3], trained on a semantically tagged SemCor version). For the comparison of different experimental conditions we use the BCubed F-Measure evaluation metric, which satisfies all formal constraints of a good quality clustering. The experimental results show that the clustering performance based on the S-VSM is better compared to the traditional VSM model and compares favorably against the standard GVSM kernel which uses word co-occurrences to compute the latent similarities between document terms. © 2013 Elsevier B.V. All rights reserved. 0 0
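As a rough illustration of what a semantic smoothing kernel of this kind computes, the sketch below builds a smoothed document-document similarity matrix K = D·S·Dᵀ from a term-weight matrix D and a term-relatedness matrix S; with S equal to the identity it degenerates to the plain VSM. This is a generic construction under our own assumptions, not the paper's S-VSM implementation, and the cosine-style normalisation is an illustrative choice.

```python
import numpy as np

def semantic_kernel(D, S):
    """Smoothed document-document similarity K = D @ S @ D.T.
    D: (n_docs, n_terms) term-weight matrix, e.g. tf-idf rows.
    S: (n_terms, n_terms) symmetric term-relatedness matrix; S = I gives plain VSM."""
    K = D @ S @ D.T
    norms = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    return K / np.outer(norms, norms)       # cosine-style normalisation

# Toy usage: two documents over three terms, with terms 0 and 1 related.
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
S = np.array([[1.0, 0.6, 0.0],
              [0.6, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(semantic_kernel(D, S))
```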
Sense clustering using Wikipedia Sense clustering using Wikipedia Dandala B.
Hokamp C.
Rada Mihalcea
Bunescu R.C.
International Conference Recent Advances in Natural Language Processing, RANLP English 2013 In this paper, we propose a novel method for generating a coarse-grained sense inventory from Wikipedia using a machine learning framework. Structural and content-based features are employed to induce clusters of articles representative of a word sense. Additionally, multilingual features are shown to improve the clustering accuracy, especially for languages that are less comprehensive than English. We show the effectiveness of our clustering methodology by testing it against both manually and automatically annotated datasets. 0 0
Short text classification using wikipedia concept based document representation Short text classification using wikipedia concept based document representation Xiaolong Wang
Chen R.
Jia Y.
Zhou B.
Proceedings - 2013 International Conference on Information Technology and Applications, ITA 2013 English 2013 Short text classification is a difficult and challenging task in information retrieval systems since the text data is short, sparse and multidimensional. In this paper, we represent short text with Wikipedia concepts for classification. Short document text is mapped to Wikipedia concepts and the concepts are then used to represent document for text categorization. Traditional methods for classification such as SVM can be used to perform text categorization on the Wikipedia concept document representation. Experimental evaluation on real Google search snippets shows that our approach outperforms the traditional BOW method and gives good performance. Although it's not better than the state-of-the-art classifier (see e.g. Phan et al. WWW '08), our method can be easily implemented with low cost. 0 0
Similarities, challenges and opportunities of wikipedia content and open source projects Similarities, challenges and opportunities of wikipedia content and open source projects Capiluppi A. Journal of software: Evolution and Process English 2013 Several years of research and evidence have demonstrated that open source software portals often contain a large number of software projects that simply do not evolve, developed by relatively small communities struggling to attract a sustained number of contributors. These portals have increasingly started to act as storage for abandoned projects, and researchers and practitioners should point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the required expertise, many projects reliant on content and contributions by users undergo a similar evolution and follow similar patterns: when a project fails to attract contributors, it appears to be not evolving, or abandoned. Far from a negative finding, even those projects could provide valuable content that should be harvested and identified based on common characteristics: by using the attributes of 'usefulness' and 'modularity' we isolate valuable content in both Wikipedia pages and open source software projects. 0 0
Size matters (spacing not): 18 points for a dyslexic-friendly wikipedia Size matters (spacing not): 18 points for a dyslexic-friendly wikipedia Rello L.
Pielot M.
Marcos M.-C.
Carlini R.
W4A 2013 - International Cross-Disciplinary Conference on Web Accessibility English 2013 In 2012, Wikipedia was the sixth-most visited website on the Internet. Being one of the main repositories of knowledge, students from all over the world consult it. But, around 10% of these students have dyslexia, which impairs their access to text-based websites. How could Wikipedia be presented to be more readable for this target group? In an experiment with 28 participants with dyslexia, we compare reading speed, comprehension, and subjective readability for the font sizes 10, 12, 14, 18, 22, and 26 points, and line spacings 0.8, 1.0, 1.4, and 1.8. The results show that font size has a significant effect on the readability and the understandability of the text, while line spacing does not. On the basis of our results, we recommend using 18-point font size when designing web text for readers with dyslexia. Our results significantly differ from previous recommendations, presumably, because this is the first work to cover a wide range of values and to study them in the context of an actual website. Copyright 2013 ACM. 0 0
SmartWiki: A reliable and conflict-refrained Wiki model based on reader differentiation and social context analysis SmartWiki: A reliable and conflict-refrained Wiki model based on reader differentiation and social context analysis Haifeng Zhao
Kallander W.
Johnson H.
Wu S.F.
Knowledge-Based Systems English 2013 Wiki systems, such as Wikipedia, provide a multitude of opportunities for large-scale online knowledge collaboration. Despite Wikipedia's successes with the open editing model, dissenting voices give rise to unreliable content due to conflicts amongst contributors. Controversial articles frequently modified by dissenting editors hardly present reliable knowledge. Some overheated controversial articles may be locked by Wikipedia administrators, who might leave their own bias in the topic. This can undermine both the neutrality and freedom policies of Wikipedia. As Richard Rorty suggested, "Take Care of Freedom and Truth Will Take Care of Itself" [1], we present a new open Wiki model in this paper, called TrustWiki, which brings readers closer to reliable information while allowing editors to contribute freely. From our perspective, the conflict issue results from presenting the same knowledge to all readers, without regard for differences between readers or for the underlying social context, which both causes bias among contributors and affects readers' perception of the knowledge. TrustWiki differentiates two types of readers, "value adherents" who prefer compatible viewpoints and "truth diggers" who crave the truth. It provides two different knowledge representation models to cater for both types of readers. Social context, including social background and relationship information, is embedded in both knowledge representations to present readers with personalized and credible knowledge. To our knowledge, this is the first paper on knowledge representation to combine psychological acceptance and truth revelation to meet the needs of different readers. Although this new Wiki model focuses on reducing conflicts and reinforcing the neutrality policy of Wikipedia, it also casts light on other content reliability problems in Wiki systems, such as vandalism and minority opinion suppression. © 2013 Elsevier B.V. All rights reserved. 0 0
Social computing: Its evolving definition and modeling in the context of collective intelligence Social computing: Its evolving definition and modeling in the context of collective intelligence Yoshifumi Masunaga Proceedings of the 2012 ASE International Conference on Social Informatics, SocialInformatics 2012 English 2013 "Social computing" is a keyword in contemporary society. However, if we ask anew what the term social computing means, we realize that its definition, meaning and modeling have not necessarily been clarified. This paper first investigates when questions are raised about "social computing." We find that the oldest Wikipedia article on social computing was written on January 21, 2005. However, it is found that a major rewrite was done on October 17, 2007, which caused a great change in its definition. It seems that the reason for this change is the idea of collective intelligence that has been popularized in James Surowiecki's book, The Wisdom of Crowds. In order to examine how the concept of social computing is accepted by and has infiltrated the web society, we performed an analysis of the search engine results page (SERP) using Google, specifying the search keyword as "social computing." This paper investigates a formal model of social computing, which is described in contrast with the traditional computing scheme. Based on this model, we investigate the relationship between social computing and computer science, and we conclude that the Wikipedia article on social computing that states "social computing is a general term for an area of computer science ..." is inaccurate. 0 0
Social content authoring with no 'social traps' Social content authoring with no 'social traps' Boubas A.Y.
Harous S.
Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013 English 2013 We are currently building a content authoring system, and we present in this paper some preliminary ideas on which the system's development is based. The proposed system takes advantage of key aspects that have made online content platforms such as Wikipedia succeed. These aspects, such as the immediacy of positive outcomes, are then combined with social motivators and social organization enablers via a digital social networking framework. The resulting environment enables, encourages and rewards the authoring of higher-quality content in academic institutions. 0 0
Spred: Large-scale harvesting of semantic predicates Spred: Large-scale harvesting of semantic predicates Flati T.
Roberto Navigli
ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference English 2013 We present SPred, a novel method for the creation of large repositories of semantic predicates. We start from existing collocations to form lexical predicates (e.g., break *) and learn the semantic classes that best fit the * argument. To do this, we extract all the occurrences in Wikipedia which match the predicate and abstract its arguments to general semantic classes (e.g., break Body Part, break Agreement, etc.). Our experiments show that we are able to create a large collection of semantic predicates from the Oxford Advanced Learner's Dictionary with high precision and recall, and perform well against the most similar approach. 0 0
Streaming big data with self-adjusting computation Streaming big data with self-adjusting computation Acar U.A.
Yirong Chen
DDFP 2013 - Proceedings of the 2013 ACM SIGPLAN Workshop on Data Driven Functional Programming, Co-located with POPL 2013 English 2013 Many big data computations involve processing data that changes incrementally or dynamically over time. Using existing techniques, such computations quickly become impractical. For example, computing the frequency of words in the first ten thousand paragraphs of a publicly available Wikipedia data set in a streaming fashion using MapReduce can take as much as a full day. In this paper, we propose an approach based on self-adjusting computation that can dramatically improve the efficiency of such computations. As an example, we can perform the aforementioned streaming computation in just a couple of minutes. 0 0
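The paper's self-adjusting computation framework is not reproduced here; the toy Python class below merely illustrates the incremental flavour of the example workload, keeping word frequencies current by touching only newly arrived (or retracted) paragraphs instead of recomputing over the whole stream. The class and method names are invented for illustration.

```python
from collections import Counter

class IncrementalWordCount:
    """Keeps word frequencies up to date as paragraphs stream in, touching
    only the delta instead of recomputing over the whole corpus."""
    def __init__(self):
        self.counts = Counter()

    def add(self, paragraph: str):
        self.counts.update(paragraph.lower().split())

    def remove(self, paragraph: str):
        self.counts.subtract(paragraph.lower().split())
        self.counts += Counter()        # drops zero and negative entries

wc = IncrementalWordCount()
wc.add("the free encyclopedia that anyone can edit")
wc.add("the encyclopedia")
print(wc.counts.most_common(3))
```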
Students' digital strategies and shortcuts Students' digital strategies and shortcuts Blikstad-Balas M.
Hvistendahl R.
Nordic Journal of Digital Literacy English 2013 When the classroom is connected to the Internet, the number of possible sources of information is almost infinite. Nevertheless, students tend to systematically favor the online encyclopedia Wikipedia as a source for knowledge. The present study combines quantitative and qualitative data to investigate the role Wikipedia plays in the literacy practices of students working on school tasks. It also discusses how different tasks lead to different strategies. 0 0
Students’ Digital Strategies and Shortcuts – Searching for Answers on Wikipedia as a Core Literacy Practice in Upper Secondary School Students’ Digital Strategies and Shortcuts – Searching for Answers on Wikipedia as a Core Literacy Practice in Upper Secondary School Marte Blikstad-Balas & Rita Hvistendahl (Nordic Journal of Digital Literacy, 2013, issue 01/02: 32-48) ISSN: 1891-943X 2013 Abstract: When the classroom is connected to the Internet, the number of possible sources of information is almost infinite. Nevertheless, students tend to systematically favor the online encyclopedia Wikipedia as a source for knowledge. The present study combines quantitative and qualitative data to investigate the role Wikipedia plays in the literacy practices of students working on school tasks. It also discusses how different tasks lead to different strategies. 0 0
Sustainability of Open Collaborative Communities: Analyzing Recruitment Efficiency Sustainability of Open Collaborative Communities: Analyzing Recruitment Efficiency Kevin Crowston
Nicolas Jullien
Felipe Ortega
Technology Innovation Management Review January 2013 0 0
Symbiotic coupling of P2P and cloud systems: The Wikipedia case Symbiotic coupling of P2P and cloud systems: The Wikipedia case Bremer L.
Graffi K.
IEEE International Conference on Communications English 2013 Cloud computing offers high availability, dynamic scalability, and elasticity, requiring only very little administration. However, this service comes with financial costs. Peer-to-peer systems, in contrast, operate at very low costs but cannot match the quality of service of the cloud. This paper focuses on the case study of Wikipedia and presents an approach to reduce the operational costs of hosting similar websites in the cloud by using a practical peer-to-peer approach. The visitors of the site join a Chord overlay, which acts as a first cache for article lookups. Simulation results show that up to 72% of the article lookups in Wikipedia could be answered by other visitors instead of using the cloud. 0 0
Talking topically to artificial dialog partners: Emulating humanlike topic awareness in a virtual agent Talking topically to artificial dialog partners: Emulating humanlike topic awareness in a virtual agent Alexa Breuing
Ipke Wachsmuth
Communications in Computer and Information Science English 2013 During dialog, humans are able to track ongoing topics, to detect topical shifts, to refer to topics via labels, and to decide on the appropriateness of potential dialog topics. As a result, they interactionally produce coherent sequences of spoken utterances assigning a thematic structure to the whole conversation. Accordingly, an artificial agent that is intended to engage in natural and sophisticated human-agent dialogs should be endowed with similar conversational abilities. This paper presents how to enable topically coherent conversations between humans and interactive systems by emulating humanlike topic awareness in the virtual agent Max. Therefore, we firstly realized automatic topic detection and tracking on the basis of contextual knowledge provided by Wikipedia and secondly adapted the agent's conversational behavior by means of the gained topic information. As a result, we contribute to improve human-agent dialogs by enabling topical talk between human and artificial interlocutors. This paper is a revised and extended version of [1]. 0 0
Tea & sympathy: Crafting positive new user experiences on wikipedia Tea & sympathy: Crafting positive new user experiences on wikipedia Morgan J.T.
Bouterse S.
Stierch S.
Walls H.
English 2013 We present the Teahouse, a pilot project for supporting and socializing new Wikipedia editors. Open collaboration systems like Wikipedia must continually recruit and retain new members in order to sustain themselves. Wikipedia's editor decline presents unique exigency for evaluating novel strategies to support newcomers and increase new user retention in such systems, particularly among demographics that are currently underrepresented in the user community. In this paper, we describe the design and deployment of Teahouse, and present preliminary findings. Our findings highlight the importance of intervening early in the editor lifecycle, providing user-friendly tools, creating safe spaces for newcomers, and facilitating positive interactions between newcomers and established community members. Copyright 2013 ACM. 0 0
Tell me more: An actionable quality model for wikipedia Tell me more: An actionable quality model for wikipedia Morten Warncke-Wang
Dan Cosley
John Riedl
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 In this paper we address the problem of developing actionable quality models for Wikipedia, models whose features directly suggest strategies for improving the quality of a given article. We first survey the literature in order to understand the notion of article quality in the context of Wikipedia and existing approaches to automatically assess article quality. We then develop classification models with varying combinations of more or less actionable features, and find that a model that only contains clearly actionable features delivers solid performance. Lastly we discuss the implications of these results in terms of how they can help improve the quality of articles across Wikipedia. 0 0
Temporal analysis of activity patterns of editors in collaborative mapping project of openstreetmap Temporal analysis of activity patterns of editors in collaborative mapping project of openstreetmap Taha Yasseri
Giovanni Quattrone
Afra Mashhadi
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 In recent years, wikis have become an attractive platform for social studies of human behaviour. Containing millions of records of edits across the globe, collaborative systems such as Wikipedia have allowed researchers to gain a better understanding of editors' participation and their activity patterns. However, contributions made to geo-wikis (wiki-based collaborative mapping projects) differ from systems such as Wikipedia in a fundamental way, due to the spatial dimension of the content, which limits the contributors to those who possess local knowledge about a specific area; therefore cross-platform studies and comparisons are required to build a comprehensive image of the online open collaboration phenomenon. In this work, we study the temporal behavioural patterns of OpenStreetMap editors, a successful example of a geo-wiki, for two European capital cities. We categorise different types of temporal patterns and report on the historical trend within a period of 7 years of the project's age. We also draw a comparison with the previously observed editing activity patterns of Wikipedia. 0 0
Temporal summarization of event-related updates in wikipedia Temporal summarization of event-related updates in wikipedia Georgescu M.
Pham D.D.
Kanhabua N.
Zerr S.
Siersdorfer S.
Wolfgang Nejdl
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 Wikipedia is a free multilingual online encyclopedia covering a wide range of general and specific knowledge. Its content is continuously maintained up-to-date and extended by a supporting community. In many cases, real-world events influence the collaborative editing of Wikipedia articles of the involved or affected entities. In this paper, we present Wikipedia Event Reporter, a web-based system that supports the entity-centric, temporal analytics of event-related information in Wikipedia by analyzing the whole history of article updates. For a given entity, the system first identifies peaks of update activities for the entity using burst detection and automatically extracts event-related updates using a machine-learning approach. Further, the system determines distinct events through the clustering of updates by exploiting different types of information such as update time, textual similarity, and the position of the updates within an article. Finally, the system generates the meaningful temporal summarization of event-related updates and automatically annotates the identified events in a timeline. 0 0
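Burst detection over an entity's hourly update counts could, in the simplest case, be approximated as flagging hours that deviate strongly from a trailing window, as in the hedged sketch below; the actual detector used by Wikipedia Event Reporter is not described in the abstract, so the window size and z threshold are assumptions.

```python
import numpy as np

def detect_bursts(counts, window=24, z=3.0):
    """Flag hours whose update count exceeds the trailing-window mean by more
    than z standard deviations (a simple stand-in for burst detection)."""
    counts = np.asarray(counts, dtype=float)
    bursts = []
    for t in range(window, len(counts)):
        past = counts[t - window:t]
        mu, sigma = past.mean(), past.std()
        if sigma > 0 and counts[t] > mu + z * sigma:
            bursts.append(t)
    return bursts
```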
Temporal, cultural and thematic aspects of web credibility Temporal, cultural and thematic aspects of web credibility Radoslaw Nielek
Wawer A.
Jankowski-Lorek M.
Adam Wierzbicki
Lecture Notes in Computer Science English 2013 Is trust to web pages related to nation-level factors? Do trust levels change in time and how? What categories (topics) of pages tend to be evaluated as not trustworthy, and what categories of pages tend to be trustworthy? What could be the reasons of such evaluations? The goal of this paper is to answer these questions using large scale data of trustworthiness of web pages, two sets of websites, Wikipedia and an international survey. 0 0
Term extraction from sparse, ungrammatical domain-specific documents Term extraction from sparse, ungrammatical domain-specific documents Ittoo A.
Gosse Bouma
Expert Systems with Applications English 2013 Existing term extraction systems have predominantly targeted large and well-written document collections, which provide reliable statistical and linguistic evidence to support term extraction. In this article, we address the term extraction challenges posed by sparse, ungrammatical texts with domain-specific contents, such as customer complaint emails and engineers' repair notes. To this end, we present ExtTerm, a novel term extraction system. Specifically, as our core innovations, we accurately detect rare (low frequency) terms, overcoming the issue of data sparsity. These rare terms may denote critical events, but they are often missed by extant TE systems. ExtTerm also precisely detects multi-word terms of arbitrary length, e.g. with more than 2 words. This is achieved by exploiting fundamental theoretical notions underlying term formation, and by developing a technique to compute the collocation strength between any number of words. Thus, we address the limitation of existing TE systems, which are primarily designed to identify terms with 2 words. Furthermore, we show that open-domain (general) resources, such as Wikipedia, can be exploited to support domain-specific term extraction. Thus, they can be used to compensate for the unavailability of domain-specific knowledge resources. Our experimental evaluations reveal that ExtTerm outperforms a state-of-the-art baseline in extracting terms from a domain-specific, sparse and ungrammatical real-life text collection. © 2012 Elsevier B.V. All rights reserved. 0 0
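One way to compute a collocation strength over an arbitrary number of words, as the abstract describes, is a generalized pointwise mutual information; the sketch below is such a measure under our own assumptions and is not claimed to be ExtTerm's exact formula.

```python
import math
from collections import Counter

def ngram_counts(tokens, n_max=4):
    """Count all n-grams up to length n_max as tuples of tokens."""
    counts = Counter()
    for n in range(1, n_max + 1):
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def collocation_strength(term, counts, total):
    """Generalized PMI for a candidate term of 2+ words: log of its joint
    probability over the product of its word probabilities, scaled by 1/(n-1)."""
    n = len(term)
    p_joint = counts[term] / total
    p_indep = math.prod(counts[(w,)] / total for w in term)
    if n < 2 or p_joint == 0 or p_indep == 0:
        return float('-inf')
    return math.log(p_joint / p_indep) / (n - 1)

tokens = "pump seal leak detected near pump seal".split()
counts = ngram_counts(tokens)
total = len(tokens)
print(collocation_strength(("pump", "seal"), counts, total))
```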
Thai wikipedia link suggestion framework Thai wikipedia link suggestion framework Rungsawang A.
Siangkhio S.
Surarerk A.
Manaskasemsak B.
Lecture Notes in Electrical Engineering English 2013 The paper presents a framework that exploits Thai Wikipedia articles as a knowledge source to train a machine learning classifier for link suggestion. Given an input document, important concepts in the text are automatically extracted, and the corresponding Wikipedia pages are determined and suggested as destination links for additional information. Preliminary experiments with the prototype on a test set of Thai Wikipedia articles show that this automatic link suggestion framework achieves link suggestion accuracy of up to 90%. 0 0
The Rise and Decline of an Open Collaboration System: How Wikipedia's Reaction to Popularity Is Causing Its Decline The Rise and Decline of an Open Collaboration System: How Wikipedia's Reaction to Popularity Is Causing Its Decline Aaron Halfaker
Geiger R.S.
Morgan J.T.
John Riedl
American Behavioral Scientist English 2013 Open collaboration systems, such as Wikipedia, need to maintain a pool of volunteer contributors to remain relevant. Wikipedia was created through a tremendous number of contributions by millions of contributors. However, recent research has shown that the number of active contributors in Wikipedia has been declining steadily for years and suggests that a sharp decline in the retention of newcomers is the cause. This article presents data that show how several changes the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have ironically crippled the very growth they were designed to manage. Specifically, the restrictiveness of the encyclopedia's primary quality control mechanism and the algorithmic tools used to reject contributions are implicated as key causes of decreased newcomer retention. Furthermore, the community's formal mechanisms for norm articulation are shown to have calcified against changes-especially changes proposed by newer editors. 0 0
The Tanl lemmatizer enriched with a sequence of cascading filters The Tanl lemmatizer enriched with a sequence of cascading filters Giuseppe Attardi
Dei Rossi S.
Simi M.
Lecture Notes in Computer Science English 2013 We have extended an existing lemmatizer, which relies on a lexicon of about 1.2 million forms in which lemmas are indexed by rich PoS tags, with a sequence of cascading filters, each one in charge of dealing with specific issues related to out-of-dictionary words. The last two filters are devoted to resolving semantic ambiguities between words of the same syntactic category by querying external resources: an enriched index built on the Italian Wikipedia and the Google index. 0 0
The category structure in Wikipedia: To analyze and know how it grows The category structure in Wikipedia: To analyze and know how it grows Wang Q.
Xiaolong Wang
Zheng Chen
Wang R.
Lecture Notes in Computer Science English 2013 Wikipedia is a well-known encyclopedia that has been applied for many years in fields such as natural language processing. Its category structure is used and analyzed in this paper. We take important topological properties into account, such as the connectivity distribution. Most importantly, we analyze the growth of the structure from 2004 to 2012 in detail, drawing on basic properties and small-worldness to describe that growth. Several edge attachment models based on the properties of nodes are tested in order to study how the properties of nodes influence the creation of edges. We are particularly interested in the anomalous behaviour of the data in 2011 and 2012 and study its causes closely. Our results offer useful insights into the structure and the growth of the category structure. 0 0
The category structure in wikipedia: To analyze and know its quality using k-core decomposition The category structure in wikipedia: To analyze and know its quality using k-core decomposition Wang Q.
Xiaolong Wang
Zheng Chen
Lecture Notes in Computer Science English 2013 Wikipedia is a famous and free encyclopedia. A network based on its category structure is built and then analyzed from various aspects, such as the connectivity distribution and the evolution of the overall topology. As an innovative point of our paper, a model based on k-core decomposition is used to analyze the evolution of the overall topology and to test the quality (that is, the error and attack tolerance) of the structure when nodes are removed. A model based on the removal of edges is used for comparison. Our results offer useful insights into the growth and the quality of the category structure, and into methods for better organizing it. 0 0
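For readers unfamiliar with k-core decomposition, the sketch below (using the networkx library) computes core numbers for a small category graph and reports how many nodes survive at each core level; it illustrates the standard decomposition rather than the paper's specific analysis pipeline, and the toy edge list is invented.

```python
import networkx as nx

def kcore_profile(edges):
    """Build the (undirected) category graph and report how many nodes survive
    at each core level; deeper cores approximate the structural backbone."""
    G = nx.Graph(edges)
    G.remove_edges_from(list(nx.selfloop_edges(G)))  # core_number forbids self-loops
    core = nx.core_number(G)                         # node -> largest k-core containing it
    k_max = max(core.values())
    return {k: sum(1 for c in core.values() if c >= k) for k in range(1, k_max + 1)}

print(kcore_profile([("Science", "Physics"), ("Science", "Biology"),
                     ("Physics", "Biology"), ("Biology", "Genetics")]))
```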
The decline of wikipedia The decline of wikipedia Simonite T. Technology Review English 2013 Wikipedia, even with its middling quality and poor representation of the world's diversity, could be the best encyclopedia users will get. When Wikipedia launched in 2001, it wasn't intended to be an information source in its own right. In 2003, Wales formed the Wikimedia Foundation to operate the servers and software that run Wikipedia and raise money to support them. But control of the site's content remained with the community dubbed Wikipedians, who over the next few years compiled an encyclopedia larger than any before. Wikipedia inherited and embraced the cultural expectations that an encyclopedia ought to be authoritative, comprehensive, and underpinned by the rational spirit of the Enlightenment. The number of active editors on the English-language Wikipedia peaked in 2007 at more than 51,000 and has been declining ever since. Even though Wikipedia has far fewer active editors than it did in its heyday, the number and length of its articles continue to grow. This means the volunteers who remain have more to do. 0 0
The dispute over filtering "indecent" images in wikipedia The dispute over filtering "indecent" images in wikipedia Roessing T. Masaryk University Journal of Law and Technology English 2013 In 2010, Wikipedia was accused by individuals as well as some media organizations of hosting illegal and indecent images. The foundation that runs Wikipedia commissioned a report on contentious images and the development of an image filter. This opt-in filter was designed to enable individual-level filtering of images with sexual, violent, sacred, or otherwise contentious images. The plans were considered a first step to censorship by many users and sparked considerable protest in Wikipedia's online community. In-depth analysis reveals that concepts from communication research, such as the Third-Person Effect and Public Opinion, can be applied to the issue. Results of an experiment on the effects of disgusting medical images are discussed. 0 0
The emergence of Wikipedia as a new media institution The emergence of Wikipedia as a new media institution Osman K. Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 Wikipedia is an important institution and part of the new media landscape having evolved from the collaborative efforts of millions of distributed users. This poster will present ongoing research that examines how the issues that have been highlighted by conflict within the community have shaped the evolution of Wikipedia from an open wiki experiment to a global knowledge producer. Bringing together the concepts of interpretive flexibility and generative friction with existing theories on the evolution of institutions, the research aims to present possible futures for Wikipedia as part of not only the larger Wikimedia movement, but of an open and accessible web. 0 0
The illiterate editor: Metadata-driven revert detection in wikipedia The illiterate editor: Metadata-driven revert detection in wikipedia Segall J.
Greenstadt R.
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 As the community depends more heavily on Wikipedia as a source of reliable information, the ability to quickly detect and remove detrimental information becomes increasingly important. The longer incorrect or malicious information lingers in a source perceived as reputable, the more likely that information will be accepted as correct and the greater the loss to source reputation. We present The Illiterate Editor (IllEdit), a content-agnostic, metadata-driven classification approach to Wikipedia revert detection. Our primary contribution is in building a metadata-based feature set for detecting edit quality, which is then fed into a Support Vector Machine for edit classification. By analyzing edit histories, the IllEdit system builds a profile of user behavior, estimates expertise and spheres of knowledge, and determines whether or not a given edit is likely to be eventually reverted. The success of the system in revert detection (0.844 F-measure), as well as its disjoint feature set as compared to existing, content-analyzing vandalism detection systems, shows promise in the synergistic usage of IllEdit for increasing the reliability of community information. 0 0
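A minimal sketch of the kind of metadata-driven classifier the abstract describes might look like the following; the feature names and toy values are hypothetical, and the paper's actual feature set and SVM configuration are not given in the abstract.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical per-edit metadata features (names invented for illustration):
# [account_age_days, prior_edit_count, share_of_prior_edits_reverted,
#  edit_size_bytes, seconds_since_editors_previous_edit]
X = np.array([[400, 1200, 0.02, 350, 3600],
              [  1,    3, 0.66,  12,   40],
              [ 90,  150, 0.10,  80,  600],
              [  0,    1, 0.00,   5,   10]], dtype=float)
y = np.array([0, 1, 0, 1])                  # 1 = the edit was later reverted

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(np.array([[2, 5, 0.5, 20, 60]], dtype=float)))
```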
The influence of cognitive conflict on the result of collaborative editing in Wikipedia The influence of cognitive conflict on the result of collaborative editing in Wikipedia Jiangnan Q.
Chunling W.
Miao C.
Behaviour and Information Technology English 2013 Different levels of cognitive conflict widely exist in the process of collaborative editing, affecting the result of editing. This can be seen especially in Wikipedia, the free-content encyclopedia edited by users collaboratively. Here, we used the method of exploratory case study to explore the influence of wiki-based cognitive conflict on the result of collaborative editing. Page quality was considered as the result of co-editing. By measuring cognitive conflict and calculating page quality of 'Hong Kong MRT' featured article, we found that different levels of conflict had different influence on page quality and the influence was changing with time variation. Our findings are concluded into four propositions, which highlight the role of cognitive conflict in affecting page quality. This paper provides the foundation for further revising and develops the theory by using the methods of verification research and statistical analysis. 0 0
The influence of source cues and topic familiarity on credibility evaluation The influence of source cues and topic familiarity on credibility evaluation Teun Lucassen
Schraagen J.M.
Computers in Human Behavior English 2013 An important cue in the evaluation of the credibility of online information is the source from which the information comes. Earlier, it has been hypothesized that the source of information is less important when one is familiar with the topic at hand. However, no conclusive results were found to confirm this hypothesis. In this study, we re-examine the relationship between the source of information and topic familiarity. In an experiment with Wikipedia articles with and without the standard Wikipedia layout, we showed that, contrary to our expectations, familiar users have less trust in the information when they know it comes from Wikipedia than when they do not know its source. For unfamiliar users, no differences were found. Moreover, source cues only influenced trust when the credibility of the information itself was ambiguous. These results are interpreted in the 3S-model of information trust (Lucassen & Schraagen, 2011). © 2013 Elsevier Ltd. All rights reserved. 0 0
The past as prologue: Public authority and the encyclopedia of cleveland history The past as prologue: Public authority and the encyclopedia of cleveland history Grabowski J.J. Public Historian English 2013 The Encyclopedia of Cleveland History, first published in 1987 and then placed online in 1998, is recognized as the progenitor of the modern, urban encyclopedia and a model of shared academic and vernacular authority. Today Wikipedia challenges it and similar scholarly products with its reliance on universal authority, its huge scope of entries, and its ability to keep current. This article suggests that vetted urban encyclopedias need to revisit their roots as community enterprises and reimagine and reengage those connections in order to allow their vetted content to remain competitive with Wikipedia and other open-source online resources. © 2013 by The Regents of the University of California and the National Council on Public History. All rights reserved. 0 0
The rise of wikidata The rise of wikidata Vrandecic D. IEEE Intelligent Systems English 2013 Wikipedia was recently enhanced by a knowledge base: Wikidata. Thousands of volunteers who collect facts and their sources help grow and maintain Wikidata. Within only a few months, more than 16 million statements about more than 4 million items have been added to the project, ready to support Wikipedia and to enable and enrich many different types of external applications. 0 0
The role of conflict in determining consensus on quality in wikipedia articles The role of conflict in determining consensus on quality in wikipedia articles Osman K. Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 This paper presents research that investigated the role of conflict in the editorial process of the online encyclopedia, Wikipedia. The study used a grounded approach to analyzing 147 conversations about quality from the archived history of the Wikipedia article Australia. It found that conflict in Wikipedia is a generative friction, regulated by references to policy as part of a coordinated effort within the community to improve the quality of articles. 0 0
Time evolution of wikipedia network ranking Time evolution of wikipedia network ranking Eom Y.-H.
Frahm K.M.
Benczur A.
Shepelyansky D.L.
European Physical Journal B English 2013 We study the time evolution of ranking and spectral properties of the Google matrix of the English Wikipedia hyperlink network during the years 2003-2011. The statistical properties of the ranking of Wikipedia articles via PageRank and CheiRank probabilities, as well as the matrix spectrum, are shown to be stabilized for 2007-2011. Special emphasis is placed on the ranking of Wikipedia personalities and universities. We show that the PageRank selection is dominated by politicians, while 2DRank, which combines PageRank and CheiRank, puts more weight on personalities from the arts. The Wikipedia PageRank of universities recovers 80% of the top universities of the Shanghai ranking during the considered time period. 0 0
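CheiRank is PageRank computed on the network with all links reversed, so both quantities can be sketched with a standard graph library as below; the toy graph and the damping factor of 0.85 are illustrative assumptions, not the paper's Wikipedia-scale computation.

```python
import networkx as nx

def page_and_chei_rank(G, alpha=0.85):
    """PageRank on the hyperlink graph, plus CheiRank: PageRank computed on
    the same graph with every edge reversed."""
    pr = nx.pagerank(G, alpha=alpha)
    chei = nx.pagerank(G.reverse(copy=True), alpha=alpha)
    return pr, chei

G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")])
pr, chei = page_and_chei_rank(G)
print(sorted(pr, key=pr.get, reverse=True))      # most "cited" articles first
print(sorted(chei, key=chei.get, reverse=True))  # most "citing" articles first
```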
Title-based approach to relation discovery from wikipedia Title-based approach to relation discovery from wikipedia Zarrad R.
Doggaz N.
Zagrouba E.
IC3K 2013; KEOD 2013 - 5th International Conference on Knowledge Engineering and Ontology Development, Proceedings English 2013 With the advent of the Web and the explosion of available textual data, the field of domain ontology engineering has gained more and more importance. Over the last decade, several successful tools for automatically harvesting knowledge from web data have been developed, but the extraction of taxonomic and non-taxonomic ontological relationships is still far from being fully solved. This paper describes a new approach which extracts ontological relations from Wikipedia. The non-taxonomic relation extraction process is performed by analyzing the titles which appear in each document of the studied corpus. This method is based on regular expressions which appear in titles and from which we can extract not only the two arguments of the relationships but also the labels which describe the relations. The resulting set of labels is used in order to retrieve new relations by analyzing the title hierarchy in each document. Other relations can be extracted from titles and subtitles containing only one term. An enrichment step is also applied by considering each term which appears as a relation argument of the extracted links in order to discover new concepts and new relations. The experiments have been performed on French Wikipedia articles related to the medical field. The precision and recall values are encouraging and seem to validate our approach. 0 0
Topic familiarity and information skills in online credibility evaluation Topic familiarity and information skills in online credibility evaluation Teun Lucassen
Muilwijk R.
Noordzij M.L.
Schraagen J.M.
Journal of the American Society for Information Science and Technology English 2013 With the rise of user-generated content, evaluating the credibility of information has become increasingly important. It is already known that various user characteristics influence the way credibility evaluation is performed. Domain experts on the topic at hand primarily focus on semantic features of information (e.g., factual accuracy), whereas novices focus more on surface features (e.g., length of a text). In this study, we further explore two key influences on credibility evaluation: topic familiarity and information skills. Participants with varying expected levels of information skills (i.e., high school students, undergraduates, and postgraduates) evaluated Wikipedia articles of varying quality on familiar and unfamiliar topics while thinking aloud. When familiar with the topic, participants indeed focused primarily on semantic features of the information, whereas participants unfamiliar with the topic paid more attention to surface features. The utilization of surface features increased with information skills. Moreover, participants with better information skills calibrated their trust against the quality of the information, whereas trust of participants with poorer information skills did not. This study confirms the enabling character of domain expertise and information skills in credibility evaluation as predicted by the updated 3S-model of credibility evaluation. 0 0
Transforming Wikipedia into a large scale multilingual concept network Transforming Wikipedia into a large scale multilingual concept network Vivi Nastase
Michael Strube
Artificial Intelligence English 2013 A knowledge base for real-world language processing applications should consist of a large base of facts and reasoning mechanisms that combine them to induce novel and more complex information. This paper describes an approach to deriving such a large scale and multilingual resource by exploiting several facets of the on-line encyclopedia Wikipedia. We show how we can build upon Wikipedia's existing network of categories and articles to automatically discover new relations and their instances. Working on top of this network allows for added information to influence the network and be propagated throughout it using inference mechanisms that connect different pieces of existing knowledge. We then exploit this gained information to discover new relations that refine some of those found in the previous step. The result is a network containing approximately 3.7 million concepts with lexicalizations in numerous languages and 49+ million relation instances. Intrinsic and extrinsic evaluations show that this is a high quality resource and beneficial to various NLP tasks. © 2012 Elsevier B.V. All rights reserved. 0 0
Twitter anticipates bursts of requests for wikipedia articles Twitter anticipates bursts of requests for wikipedia articles Tolomei G.
Orlando S.
Ceccarelli D.
Lucchese C.
International Conference on Information and Knowledge Management, Proceedings English 2013 Most of the tweets that users exchange on Twitter make implicit mentions of named entities, which in turn can be mapped to corresponding Wikipedia articles using proper Entity Linking (EL) techniques. Some of those become trending entities on Twitter due to a long-lasting or a sudden effect on the volume of tweets where they are mentioned. We argue that the set of trending entities discovered from Twitter may help predict the volume of requests for the related Wikipedia articles. To validate this claim, we apply an EL technique to extract trending entities from a large dataset of public tweets. Then, we analyze the time series derived from the hourly trending score (i.e., an index of popularity) of each entity as measured by Twitter and Wikipedia, respectively. Our results reveal that Twitter actually leads Wikipedia by one or more hours. Copyright 2013 ACM. 0 0
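Estimating by how many hours one popularity series leads another can be sketched as picking the shift that maximizes the correlation between the two series; the code below is such a rough lead-lag estimate under our own assumptions and is not the entity-linking or analysis pipeline used in the paper.

```python
import numpy as np

def best_lag(twitter, wikipedia, max_lag=12):
    """Return the shift (in hours) of the Twitter series that best aligns it with
    the Wikipedia series, and the correlation there; a positive lag means the
    Twitter signal precedes the Wikipedia signal."""
    t, w = np.asarray(twitter, float), np.asarray(wikipedia, float)
    best_lag_h, best_r = 0, -np.inf
    for lag in range(0, max_lag + 1):
        a = t[:-lag] if lag else t          # Twitter at hour i ...
        b = w[lag:]                         # ... vs Wikipedia at hour i + lag
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_lag_h, best_r = lag, r
    return best_lag_h, best_r
```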
Tìpalo: A tool for automatic typing of DBpedia entities Tìpalo: A tool for automatic typing of DBpedia entities Nuzzolese A.G.
Aldo Gangemi
Valentina Presutti
Draicchio F.
Alberto Musetti
Paolo Ciancarini
Lecture Notes in Computer Science English 2013 In this paper we demonstrate the potentiality of Tìpalo, a tool for automatically typing DBpedia entities. Tìpalo identifies the most appropriate types for an entity in DBpedia by interpreting its definition extracted from its corresponding Wikipedia abstract. Tìpalo relies on FRED, a tool for ontology learning from natural language text, and on a set of graph-pattern-based heuristics which work on the output returned by FRED in order to select the most appropriate types for a DBpedia entity. The tool returns a RDF graph composed of rdf:type, rdfs:subClassOf, owl:sameAs, and owl:equivalentTo statements providing typing information about the entity. Additionally the types are aligned to two lists of top-level concepts, i.e., Wordnet supersenses and a subset of DOLCE Ultra Lite classes. Tìpalo is available as a Web-based tool and exposes its API as HTTP REST services. 0 0
Ukrainian WordNet: Creation and filling Ukrainian WordNet: Creation and filling Anisimov A.
Marchenko O.
Nikonenko A.
Porkhun E.
Taranukha V.
Lecture Notes in Computer Science English 2013 This paper deals with the process of developing a lexical semantic database for Ukrainian language - UkrWordNet. The architecture of the developed system is described in detail. The data storing structure and mechanisms of access to knowledge are reviewed along with the internal logic of the system and some key software modules. The article is also concerned with the research and development of automated techniques of UkrWordNet Semantic Network replenishment and extension. 0 0
Understanding trust formation in digital information sources: The case of Wikipedia Understanding trust formation in digital information sources: The case of Wikipedia Rowley J.
Johnson F.
Journal of Information Science English 2013 This article contributes to knowledge on how users establish the trustworthiness of digital information. An exploratory two-stage study was conducted with Master's and undergraduate students in information studies. In the first phase of the study respondents commented on the factors and processes associated with trust formation. Participants commented on authorship and references, quality of writing and editing, and verification via links to external reference sources. Findings from the second phase, based on a checklist, suggested that participants relied on a range of factors when assessing the trustworthiness of articles, including content factors such as authorship, currency and usefulness together with context factors such as references, expert recommendation and triangulation with their own knowledge. These findings are discussed in the light of previous related research and recommendations for further research are offered. 0 0
Unsupervised gazette creation using information distance Unsupervised gazette creation using information distance Patil S.
Pawar S.
Palshikar G.K.
Bhat S.
Srivastava R.
Lecture Notes in Computer Science English 2013 Named Entity extraction (NEX) problem consists of automatically constructing a gazette containing instances for each NE of interest. NEX is important for domains which lack a corpus with tagged NEs. In this paper, we propose a new unsupervised (bootstrapping) NEX technique, based on a new variant of the Multiword Expression Distance (MED)[1] and information distance [2]. Efficacy of our method is shown using comparison with BASILISK and PMI in agriculture domain. Our method discovered 8 new diseases which are not found in Wikipedia. 0 0
Use of transfer entropy to infer relationships from behavior Use of transfer entropy to infer relationships from behavior Bauer T.L.
Colbaugh R.
Glass K.
Schnizlein D.
ACM International Conference Proceeding Series English 2013 This paper discusses the use of transfer entropy to infer relationships among entities. This is useful when one wants to understand relationships among entities but can only observe their behavior, not their direct interactions with one another. This is the kind of environment prevalent in network monitoring, where one can observe behavior coming into and leaving a network from many different hosts, but cannot directly observe which hosts are related to one another. In this paper, we show that networks of individuals inferred using the transfer entropy of Wikipedia editing behavior predict observed "ground truth" social networks. At low levels of recall, transfer entropy can extract these social networks with a precision approximately 20 times higher than would be expected by chance. We discuss the algorithm, the data set, and various parameter considerations when attempting to apply this algorithm to a data set. Copyright 2012 ACM. 0 0
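As background, a plug-in estimate of transfer entropy T(X→Y) for two discrete activity series can be written down in a few lines; the sketch below uses history length one and is meant only to make the quantity concrete, not to reproduce the authors' estimator or parameters.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of T(X -> Y) in bits for two equally long discrete series,
    with history length 1: how much x_t reduces uncertainty about y_{t+1}
    beyond what y_t already provides."""
    joint = Counter(zip(y[1:], y[:-1], x[:-1]))
    n = sum(joint.values())
    p = {k: v / n for k, v in joint.items()}
    p_yp_xp, p_yp, p_yn_yp = Counter(), Counter(), Counter()
    for (yn, yp, xp), v in p.items():
        p_yp_xp[(yp, xp)] += v
        p_yp[yp] += v
        p_yn_yp[(yn, yp)] += v
    return sum(v * np.log2(v * p_yp[yp] / (p_yp_xp[(yp, xp)] * p_yn_yp[(yn, yp)]))
               for (yn, yp, xp), v in p.items())

# Toy example: y copies x with a one-step delay, so T(x -> y) is large.
x = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y = [0] + x[:-1]
print(transfer_entropy(x, y))
```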
User-friendly structured queries on Wikipedia: The SWiPE system User-friendly structured queries on Wikipedia: The SWiPE system Maurizio Atzori
Carlo Zaniolo
21st Italian Symposium on Advanced Database Systems, SEBD 2013 English 2013 A novel method is demonstrated that allows semantic and well-structured knowledge bases (such as DBpedia) to be easily queried directly from Wikipedia's pages. Using SWiPE, naive users with no knowledge of the underlying data and schema can easily query DBpedia with powerful questions such as: "Cities in Tuscany with less than 20 thousand people", or "Find Canadian actors which can play the guitar". This is based on the formulation of By-Example Structured (BESt) queries on the infoboxes contained in Wikipedia pages, allowing structured queries, including complex ones involving joins and aggregates, to be run in a simple, user-friendly way. To prove its benefits, we applied this QBE-inspired approach to Wikipedia and DBpedia by developing SWiPE, a system that supports BESt queries by (i) letting the user select an example page and activate its infobox, on which (ii) the user can now click on the relevant fields and enter conditions that (iii) SWiPE translate into a sparql query that is executed on DBpedia using Virtuoso or similar databases. A powerful prototype was built to demonstrate the power and usability of the BESt query approach and its ability to work in combination with the keyword-based search paradigm of current web-engines. 0 0
Using Wikipedia to learn semantic feature representations of concrete concepts in neuroimaging experiments Using Wikipedia to learn semantic feature representations of concrete concepts in neuroimaging experiments Pereira F.
Botvinick M.
Detre G.
Artificial Intelligence English 2013 In this paper we show that a corpus of a few thousand Wikipedia articles about concrete or visualizable concepts can be used to produce a low-dimensional semantic feature representation of those concepts. The purpose of such a representation is to serve as a model of the mental context of a subject during functional magnetic resonance imaging (fMRI) experiments. A recent study by Mitchell et al. (2008) [19] showed that it was possible to predict fMRI data acquired while subjects thought about a concrete concept, given a representation of those concepts in terms of semantic features obtained with human supervision. We use topic models on our corpus to learn semantic features from text in an unsupervised manner, and show that these features can outperform those in Mitchell et al. (2008) [19] in demanding 12-way and 60-way classification tasks. We also show that these features can be used to uncover similarity relations in brain activation for different concepts which parallel those relations in behavioral data from human subjects. © 2012 Elsevier B.V. All rights reserved. 0 0
Using edit sessions to measure participation in wikipedia Using edit sessions to measure participation in wikipedia R. Stuart Geiger
Aaron Halfaker
English 2013 Many quantitative, log-based studies of participation and contribution in CSCW and CMC systems measure the activity of users in terms of output, based on metrics like posts to forums, edits to Wikipedia articles, or commits to code repositories. In this paper, we instead seek to estimate the amount of time users have spent contributing. Through an analysis of Wikipedia log data, we identify a pattern of punctuated bursts in editors' activity that we refer to as edit sessions. Based on these edit sessions, we build a metric that approximates the labor hours of editors in the encyclopedia. Using this metric, we first compare labor-based analyses with output-based analyses, finding that the activity of many editors can appear quite differently based on the kind of metric used. Second, we use edit session data to examine phenomena that cannot be adequately studied with purely output-based metrics, such as the total number of labor hours for the entire project. Copyright 2013 ACM. 0 0
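The edit-session idea can be made concrete with a small sketch: sort an editor's timestamps, start a new session whenever the gap to the previous edit exceeds a cutoff, and sum session durations as approximate labor hours. The one-hour cutoff below is an assumed placeholder, not necessarily the threshold used in the paper.

```python
from datetime import datetime, timedelta

def edit_sessions(timestamps, cutoff=timedelta(hours=1)):
    """Group an editor's chronologically sorted edit timestamps into sessions:
    a gap longer than `cutoff` starts a new session."""
    sessions, start, prev = [], timestamps[0], timestamps[0]
    for t in timestamps[1:]:
        if t - prev > cutoff:
            sessions.append((start, prev))
            start = t
        prev = t
    sessions.append((start, prev))
    return sessions

def labor_hours(sessions):
    """Approximate time spent editing as the summed length of the sessions."""
    return sum((end - start).total_seconds() for start, end in sessions) / 3600.0

edits = [datetime(2013, 5, 1, 9, 0), datetime(2013, 5, 1, 9, 20),
         datetime(2013, 5, 1, 9, 45), datetime(2013, 5, 1, 14, 0),
         datetime(2013, 5, 1, 14, 30)]
print(labor_hours(edit_sessions(edits)))    # 0.75 h + 0.5 h = 1.25 h
```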
Using wikipedia with associative networks for document classification Using wikipedia with associative networks for document classification Bloom N.
Theune M.
De Jong F.M.G.
ESANN 2013 proceedings, 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning English 2013 We demonstrate a new technique for building associative networks based on Wikipedia, comparing them to WordNet-based associative networks that we used previously, finding the Wikipedia-based networks to perform better at document classification. Additionally, we compare the performance of associative networks to various other text classification techniques using the Reuters-21578 dataset, establishing that associative networks can achieve comparable results. 0 0
Value Production in a Collaborative Environment: Sociophysical Studies of Wikipedia Value Production in a Collaborative Environment: Sociophysical Studies of Wikipedia Taha Yasseri
Kertesz J.
Journal of Statistical Physics English 2013 We review some recent endeavors and add some new results to characterize and understand underlying mechanisms in Wikipedia (WP), the paradigmatic example of collaborative value production. We analyzed the statistics of editorial activity in different languages and observed typical circadian and weekly patterns, which enabled us to estimate the geographical origins of contributions to WPs in languages spoken in several time zones. Using a recently introduced measure we showed that the editorial activities have intrinsic dependencies in the burstiness of events. A comparison of the English and Simple English WPs revealed important aspects of language complexity and showed how peer cooperation solved the task of enhancing readability. One of our focus issues was characterizing the conflicts or edit wars in WPs, which helped us to automatically filter out controversial pages. When studying the temporal evolution of the controversiality of such pages we identified typical patterns and classified conflicts accordingly. Our quantitative analysis provides the basis for modeling conflicts and their resolution in collaborative environments and contributes to the understanding of this issue, which becomes increasingly important with the development of information communication technology. 0 0
Visitpedia: Wiki article visit log visualization for event exploration Visitpedia: Wiki article visit log visualization for event exploration Sun Y.
Tao Y.
Yang G.
Hong Lin
Proceedings - 13th International Conference on Computer-Aided Design and Computer Graphics, CAD/Graphics 2013 English 2013 This paper proposes an interactive visualization tool, Visitpedia, to detect and analyze social events based on Wikipedia visit history. It helps users discover real-world events behind the data and study how these events evolve over time. Unlike previous work based on online news or similar text corpora, we choose Wikipedia visit counts as our data source, since visit count data better reflect users' concern with social events. We tackle the event-based task from a time-series pattern perspective rather than a semantic perspective. Various visualization and user interaction techniques are integrated in Visitpedia. Two case studies are conducted to demonstrate the effectiveness of Visitpedia. 0 0
Visualizing recent changes in Wikipedia Visualizing recent changes in Wikipedia Biuk-Aghai R.P.
Chan R.C.K.
Si Y.-W.
Simon Fong
Science China Information Sciences English 2013 Large wikis such as Wikipedia attract large numbers of editors continuously editing content. It is difficult to observe what editing activity goes on at any given moment, what editing patterns can be observed, and which are the currently active editors and articles. We introduce the design and implementation of an information visualization tool for data streams of recent changes in wikis that aims to address this difficulty. We also show examples of our visualizations from English Wikipedia, and present several patterns of editing activity that we have visually identified using our tool. We have evaluated our tool's usability, accuracy and speed of task performance in comparison with Wikipedia's recent changes page, and have obtained qualitative feedback from users on the pros and cons of our tool. We also present a review of the related literature. 0 0
WHAD: Wikipedia historical attributes data: Historical structured data extraction and vandalism detection from the Wikipedia edit history WHAD: Wikipedia historical attributes data: Historical structured data extraction and vandalism detection from the Wikipedia edit history Enrique Alfonseca
Guillermo Garrido
Delort J.-Y.
Penas A.
Language Resources and Evaluation English 2013 This paper describes the generation of temporally anchored infobox attribute data from the Wikipedia history of revisions. By mining (attribute, value) pairs from the revision history of the English Wikipedia we are able to collect a comprehensive knowledge base that contains data on how attributes change over time. When dealing with the Wikipedia edit history, vandalic and erroneous edits are a concern for data quality. We present a study of vandalism identification in Wikipedia edits that uses only features from the infoboxes, and show that we can obtain, on this dataset, an accuracy comparable to a state-of-the-art vandalism identification method that is based on the whole article. Finally, we discuss different characteristics of the extracted dataset, which we make available for further study. 0 0
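A toy sketch of the kind of temporally anchored (attribute, value) extraction the WHAD entry above describes, assuming plain "| attribute = value" lines inside an {{Infobox ...}} template and a naive end-of-template heuristic; real infobox parsing needs proper template brace matching, and the paper's pipeline additionally deals with vandalic and erroneous edits, which this sketch ignores.

    import re

    # Hypothetical, simplified pattern for "| attribute = value" infobox lines.
    INFOBOX_FIELD = re.compile(r"^\s*\|\s*([\w ]+?)\s*=\s*(.+?)\s*$", re.MULTILINE)

    def infobox_attributes(wikitext):
        """Very naive extraction of infobox fields from one revision's wikitext."""
        start = wikitext.find("{{Infobox")
        if start == -1:
            return {}
        block = wikitext[start:].split("\n\n", 1)[0]  # crude template boundary
        return {k.strip().lower(): v for k, v in INFOBOX_FIELD.findall(block)}

    def attribute_history(revisions):
        """revisions: iterable of (timestamp, wikitext) pairs, oldest first.
        Yields (attribute, new_value, timestamp) whenever a value changes,
        giving a temporally anchored attribute history for the article."""
        last = {}
        for ts, text in revisions:
            for attr, value in infobox_attributes(text).items():
                if last.get(attr) != value:
                    last[attr] = value
                    yield attr, value, ts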
WIKIPEDIA: Between lay participation and elite knowledge representation WIKIPEDIA: Between lay participation and elite knowledge representation Konig R. Information Communication and Society English 2013 The decentralized participatory architecture of the Internet challenges traditional knowledge authorities and hierarchies. Questions arise about whether lay inclusion helps to 'democratize' knowledge formation or if existing hierarchies are re-enacted online. This article focuses on Wikipedia, a much-celebrated example which gives an in-depth picture of the process of knowledge production in an open environment. Drawing on insights from the sociology of knowledge, Wikipedia's talk pages are conceptualized as an arena where reality is socially constructed. Using grounded theory, this article examines the entry for the September 11 attacks and its related talk pages in the German Wikipedia. Numerous alternative interpretations (labeled as 'conspiracy theories') that fundamentally contradict the account of established knowledge authorities regarding this event have emerged. On the talk pages, these views collide, thereby serving as a useful case study to examine the role of experts and lay participants in the process of knowledge construction on Wikipedia. The study asks how the parties negotiate 'what actually happened' and which knowledges should be represented in the Wikipedia entry. The conflicting points of view overload the discursive capacity of the contributors. The community reacts by marginalizing opposing knowledge and protecting or immunizing the article against these disparate views. This is achieved by rigorously excluding knowledge which is not verified by external expert authorities. Therefore, in this case, lay participation did not lead to a 'democratization' of knowledge production, but rather re-enacted established hierarchies. 0 0
What leads students to adopt information from Wikipedia? An empirical investigation into the role of trust and information usefulness What leads students to adopt information from Wikipedia? An empirical investigation into the role of trust and information usefulness Shen X.-L.
Cheung C.M.K.
Lee M.K.O.
British Journal of Educational Technology English 2013 With the prevalence of the Internet, it has become increasingly easy and common for students to seek information from various online sources. Wikipedia is one of the largest and most popular reference websites that university students may heavily rely on in completing their assignments and other course-related projects. Based on the information adoption model, this study empirically examines the effects of trust and information usefulness on Hong Kong students' information adoption from Wikipedia. We conducted an online survey and analysed the responses using partial least squares. Overall, the model explained 69.4% of the variance in information adoption, 59.1% of the variance in trust and 62.7% of the variance in information usefulness. Interestingly, deviating significantly from the information adoption model, trust played a major role in determining information adoption and fully mediated the relationship between information usefulness and information adoption. The implications of this study will provide important insights to both researchers and practitioners. © 2012 The Authors. British Journal of Educational Technology 0 0
When the levee breaks: Without bots, what happens to wikipedia's quality control processes? When the levee breaks: Without bots, what happens to wikipedia's quality control processes? Geiger R.S.
Aaron Halfaker
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 In the first half of 2011, ClueBot NG - one of the most prolific counter-vandalism bots in the English-language Wikipedia - went down for four distinct periods, each period of downtime lasting from days to weeks. In this paper, we use these periods of breakdown as naturalistic experiments to study Wikipedia's heterogeneous quality control network, which we analyze as a multi-tiered system in which distinct classes of reviewers use various reviewing technologies to patrol for different kinds of damage at staggered time periods. Our analysis showed that the overall time to revert edits almost doubled when this software agent was down. Yet while a significantly smaller proportion of the edits made during the bot's downtime were reverted, we found that those edits were eventually reverted later. This suggests that other agents in Wikipedia took over this quality control work, but performed it at a far slower rate. 0 0
Where shall we go today? Planning touristic tours with TripBuilder Where shall we go today? Planning touristic tours with TripBuilder Brilhante I.
Macedo J.A.
Nardini F.M.
Perego R.
Renso C.
International Conference on Information and Knowledge Management, Proceedings English 2013 In this paper we propose TripBuilder, a new framework for personalized touristic tour planning. We mine from Flickr the information about the actual itineraries followed by a multitude of different tourists, and we match these itineraries against the touristic Points of Interest available from Wikipedia. The task of planning personalized touristic tours is then modeled as an instance of the Generalized Maximum Coverage problem. Wisdom-of-the-crowds information allows us to derive touristic plans that maximize a measure of interest for the tourist given her preferences and visiting time budget. Experimental results on three different touristic cities show that our approach is effective and outperforms strong baselines. Copyright 2013 ACM. 0 0
Wicked wikipedia? communities of practice, the production of knowledge and australian sport history Wicked wikipedia? communities of practice, the production of knowledge and australian sport history Townsend S.
Osmond G.
Phillips M.G.
International Journal of the History of Sport English 2013 Academic responses to Wikipedia since its inception in 2001 have shifted from scepticism and hostility to serious critique. Wikipedia is a project driven by a community of amateur, and sometimes professional, scholars and it is this community-and associated rules and practices-that shapes the site's publicly viewable content. Despite the centrality of egalitarianism and communal wisdom to the Wikipedian ethos, the encyclopaedia is not filled equitably with historical knowledge or topics. This article addresses the role of Wikipedia in the production of knowledge in a sport history context. An analysis of 115 Wikipedia articles written about notable Australian sportspeople revealed a disproportionately large group of high-quality cricket biographies. Further investigation revealed that a small group of Wikipedians were responsible for writing these articles. The work of this community of practice is indicative of the influence that dedicated special interest groups can have over the production of knowledge on Wikipedia and raises broader questions about the production of knowledge in sport history. 0 0
Wiki3C: Exploiting wikipedia for context-aware concept categorization Wiki3C: Exploiting wikipedia for context-aware concept categorization Jiang P.
Hou H.
Long Chen
Shun-ling Chen
Conglei Yao
Chenliang Li
Wang M.
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 Wikipedia is an important human generated knowledge base containing over 21 million articles organized by millions of categories. In this paper, we exploit Wikipedia for a new task of text mining: Context-aware Concept Categorization. In the task, we focus on categorizing concepts according to their context. We exploit article link feature and category structure in Wikipedia, followed by introducing Wiki3C, an unsupervised and domain independent concept categorization approach based on context. In the approach, we investigate two strategies to select and filter Wikipedia articles for the category representation. Besides, a probabilistic model is employed to compute the semantic relatedness between two concepts in Wikipedia. Experimental evaluation using manually labeled ground truth shows that our proposed Wiki3C can achieve a noticeable improvement over the baselines without considering contextual information. 0 0
WikiBilim (Wiki Knowledge): Salvation of the Kazakh language on the internet WikiBilim (Wiki Knowledge): Salvation of the Kazakh language on the internet Sapargaliyev D. 2013 International Conference on Interactive Collaborative Learning, ICL 2013 English 2013 The Kazakh language was under an unofficial ban in the Soviet period. In recent years, the Kazakh-language segment of the Internet has been developing rapidly, and the development of the Kazakh language has become a top priority of the government. The most successful and large-scale project for the revival of the Kazakh language was the creation of the WikiBilim fund. The main objective of the fund is to increase the number of Kazakh-language articles on Wikipedia. The fund is also actively assisting in the creation of a digital library, an online translator and a dictionary. Will WikiBilim be the salvation of the Kazakh language on the Internet? 0 0
WikiDetect: Automatic vandalism detection for Wikipedia using linguistic features WikiDetect: Automatic vandalism detection for Wikipedia using linguistic features Cioiu D.
Rebedea T.
Lecture Notes in Computer Science English 2013 Vandalism of content has always been one of the greatest problems for Wikipedia, yet only a few completely automatic solutions for solving it have been developed so far. Volunteers still spend large amounts of time correcting vandalized page edits, instead of using this time to improve the quality of the content of articles. The purpose of this paper is to introduce a new vandalism detection system that uses only natural language processing and machine learning techniques. The system has been evaluated on a corpus of real vandalized data in order to test its performance and justify the design choices. The same expert-annotated wikitext, extracted from the encyclopedia's database, is used to evaluate different vandalism detection algorithms. The paper presents a critical analysis of the obtained results, comparing them to existing solutions, and suggests different statistical classification methods that bring several improvements to the task at hand. 0 0
Wikipedia and encyclopedic production Wikipedia and encyclopedic production Loveland J.
Reagle J.
New Media and Society English 2013 Wikipedia is often presented within a foreshortened or idealized history of encyclopedia-making. Here we challenge this viewpoint by contextualizing Wikipedia and its modes of production on a broad temporal scale. Drawing on examples from Roman antiquity onward, but focusing on the years since 1700, we identify three forms of encyclopedic production: compulsive collection, stigmergic accumulation, and corporate production. While each could be characterized as a discrete period, we point out the existence of significant overlaps in time as well as with the production of Wikipedia today. Our analysis explores the relation of editors, their collaborators, and their modes of composition with respect to changing notions of authorship and originality. Ultimately, we hope our contribution will help scholars avoid ahistorical claims about Wikipedia, identify historical cases germane to the social scientist's concerns, and show that contemporary questions about Wikipedia have a lifespan exceeding the past decade. 0 0
Wikipedia articles representation with matrix'u Wikipedia articles representation with matrix'u Szymanski J. Lecture Notes in Computer Science English 2013 In the article we evaluate different text representation methods used for the task of Wikipedia article categorization. We present the Matrix'u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used for reconstructing human-made categories. 0 0
Wikipedia as a tool for active learning. Experience gained within the framework of the wikifabricación project Wikipedia as a tool for active learning. Experience gained within the framework of the wikifabricación project Abellan-Nebot J.V.
Bruscas G.M.
Serrano J.
Romero F.
Materials Science Forum English 2013 Wikipedia, the Free Encyclopedia, is one of the most visited websites on the Internet and it is a tool which students often use in their assignments, although they do not usually understand the basics underlying it. To overcome this limitation and promote the active learning approach in our courses, last year an educational innovation project was carried out that was aimed mainly at improving students' skills in technical writing as well as their ability to review the technical contents of the Wikipedias. Additionally, it sought to explore new opportunities that these tools can offer both teachers and students. This paper describes the experiment carried out in a second-year undergraduate engineering course, the results of which show that introducing activities such as editing and reviewing within Wikipedia is an interesting way to enhance transversal competencies as well as others related to the main contents of the course. 0 0
Wikipedia as an SMT training corpus Wikipedia as an SMT training corpus Tufis D.
Ion R.
Dumitrescu S.D.
Stefanescu D.
International Conference Recent Advances in Natural Language Processing, RANLP English 2013 This article reports on mass experiments supporting the idea that data extracted from strongly comparable corpora may successfully be used to build statistical machine translation systems of reasonable translation quality for in-domain new texts. The experiments were performed for three language pairs: Spanish-English, German-English and Romanian-English, based on large bilingual corpora of similar sentence pairs extracted from the entire dumps of Wikipedia as of June 2012. Our experiments and comparison with similar work show that indiscriminately adding more data to a training corpus is not necessarily a good thing in SMT. 0 0
Wikipedia based semantic smoothing for twitter sentiment classification Wikipedia based semantic smoothing for twitter sentiment classification Torunoglu D.
Telseren G.
Sagturk O.
Ganiz M.C.
2013 IEEE International Symposium on Innovations in Intelligent Systems and Applications, IEEE INISTA 2013 English 2013 Sentiment classification is one of the important and popular application areas for text classification, in which texts are labeled as positive or negative. Naïve Bayes (NB) is one of the most widely used algorithms in this area. Although NB has several advantages, such as lower complexity and a simpler training procedure, it suffers from sparsity. Smoothing can be a solution to this problem; Laplace smoothing is most commonly used, but in this paper we propose a Wikipedia-based semantic smoothing approach. In our study we extend the semantic approach by using, as topic signatures, the Wikipedia article titles that occur in the training documents, together with the categories and redirects of these articles. Results of extensive experiments show that our approach improves the performance of NB and can even exceed the accuracy of SVM on the Twitter Sentiment 140 dataset. 0 0
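A minimal sketch contrasting Laplace smoothing with one common formulation of semantic smoothing for Naïve Bayes, in which the unigram estimate is interpolated with a topic-signature model; here the topics are assumed to be Wikipedia article titles found in the training documents (plus their categories and redirects), and the mixture weight lam is an illustrative value, not the paper's tuned parameter.

    from collections import Counter

    def laplace_prob(word, class_counts, vocab_size):
        """Standard Laplace-smoothed unigram estimate P(word | class);
        class_counts is a Counter over all tokens of that class."""
        total = sum(class_counts.values())
        return (class_counts[word] + 1) / (total + vocab_size)

    def semantic_smoothed_prob(word, class_counts, vocab_size,
                               topic_given_class, word_given_topic, lam=0.4):
        """Interpolation P(w|c) = (1 - lam) * P_laplace(w|c) + lam * sum_t P(w|t) P(t|c),
        where the topics t stand for Wikipedia-derived topic signatures (an assumption)."""
        topic_term = sum(p_tc * word_given_topic.get(t, {}).get(word, 0.0)
                         for t, p_tc in topic_given_class.items())
        return (1 - lam) * laplace_prob(word, class_counts, vocab_size) + lam * topic_term

    positive_counts = Counter("good great love good".split())
    print(laplace_prob("good", positive_counts, vocab_size=10))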
Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning Bing L.
Lam W.
Wong T.-L.
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 We develop a new framework to achieve the goal of Wikipedia entity expansion and attribute extraction from the Web. Our framework takes a few existing entities that are automatically collected from a particular Wikipedia category as seed input and explores their attribute infoboxes to obtain clues for the discovery of more entities for this category and the attribute content of the newly discovered entities. One characteristic of our framework is to conduct discovery and extraction from desirable semi-structured data record sets which are automatically collected from the Web. A semi-supervised learning model with Conditional Random Fields is developed to deal with the issues of extraction learning and limited number of labeled examples derived from the seed entities. We make use of a proximate record graph to guide the semi-supervised learning process. The graph captures alignment similarity among data records. Then the semi-supervised learning process can leverage the unlabeled data in the record set by controlling the label regularization under the guidance of the proximate record graph. Extensive experiments on different domains have been conducted to demonstrate its superiority for discovering new entities and extracting attribute content. 0 0
Wikipedia-based WSD for multilingual frame annotation Wikipedia-based WSD for multilingual frame annotation Tonelli S.
Claudio Giuliano
Kateryna Tymoshenko
Artificial Intelligence English 2013 Many applications in the context of natural language processing have been proven to achieve a significant performance when exploiting semantic information extracted from high-quality annotated resources. However, the practical use of such resources is often biased by their limited coverage. Furthermore, they are generally available only for English and few other languages. We propose a novel methodology that, starting from the mapping between FrameNet lexical units and Wikipedia pages, automatically leverages from Wikipedia new lexical units and example sentences. The goal is to build a reference data set for the semi-automatic development of new FrameNets. In addition, this methodology can be adapted to perform frame identification in any language available in Wikipedia. Our approach relies on a state-of-the-art word sense disambiguation system that is first trained on English Wikipedia to assign a page to the lexical units in a frame. Then, this mapping is further exploited to perform frame identification in English or in any other language available in Wikipedia. Our approach shows a high potential in multilingual settings, because it can be applied to languages for which other lexical resources such as WordNet or thesauri are not available. © 2012 Elsevier B.V. All rights reserved. 0 0
Wikipedia-based semantic query enrichment Wikipedia-based semantic query enrichment Al Masri M.
Berrut C.
Chevallet J.-P.
International Conference on Information and Knowledge Management, Proceedings English 2013 In this paper we deal with the problem of short queries (containing one or two words). Short queries do not contain sufficient information to express their semantics unambiguously. The pseudo-relevance feedback (PRF) approach to query expansion is useful in many Information Retrieval (IR) tasks; however, it does not work well in the case of very short queries. Therefore, instead of PRF, we present a semantic query enrichment method based on Wikipedia. This method expands short queries with semantically related terms extracted from Wikipedia. Our experiments on cultural heritage corpora show significant improvement in retrieval performance. 0 0
Wikipedians from mars: Female students' perceptions toward wikipedia Wikipedians from mars: Female students' perceptions toward wikipedia Jihie Kim Proceedings of the ASIST Annual Meeting English 2013 This paper presents the results of a preliminary study on non-contributing behaviors among college-aged Wikipedia users. By focusing on female freshmen, I investigated how college-aged female students start establishing certain attitudes toward Wikipedia, how previous experience influences these perceptions, and how a new social environment has an impact on their perceptions. Based on in-depth interviews with 9 female college freshmen and 4 non-freshmen students, the results suggest that experience, subjective norms, and their social position as a college student influenced female freshmen to establish negative perceptions and constrain them from contributing to Wikipedia. The present study is significant in that it provides multifaceted aspects of perceptions that discourage contributions from young female students to Wikipedia, and design implications for promoting their participation. 0 0
Wisdom in the social crowd: An analysis of Quora Wisdom in the social crowd: An analysis of Quora Gang Wang
Gill K.
Mohanlal M.
Hua Zheng
Zhao B.Y.
WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web English 2013 Efforts such as Wikipedia have shown the ability of user communities to collect, organize and curate information on the Internet. Recently, a number of question and answer (Q&A) sites have successfully built large growing knowledge repositories, each driven by a wide range of questions and answers from its user community. While sites like Yahoo Answers have stalled and begun to shrink, one site still going strong is Quora, a rapidly growing service that augments a regular Q&A system with social links between users. Despite its success, however, little is known about what drives Quora's growth, and how it continues to connect visitors and experts to the right questions as it grows. In this paper, we present results of a detailed measurement-based analysis of Quora. We shed light on the impact of three different connection networks (or graphs) inside Quora: a graph connecting topics to users, a social graph connecting users, and a graph connecting related questions. Our results show that heterogeneity in the user graph and in the question graph are significant contributors to the quality of Quora's knowledge base: one drives the attention and activity of users, and the other directs them to a small set of popular and interesting questions. Copyright is held by the International World Wide Web Conference Committee (IW3C2). 0 0
Work-to-rule: The emergence of algorithmic governance in wikipedia Work-to-rule: The emergence of algorithmic governance in wikipedia Claudia Muller-Birn
Dobusch L.
Herbsleb J.D.
ACM International Conference Proceeding Series English 2013 Research has shown the importance of a functioning governance system for the success of peer production communities. It particularly highlights the role of human coordination and communication within the governance regime. In this article, we extend this line of research by differentiating two categories of governance mechanisms. The first category is based primarily on communication, in which social norms emerge that are often formalized by written rules and guidelines. The second category refers to the technical infrastructure that enables users to access artifacts, and that allows the community to communicate and coordinate their collective actions to create those artifacts. We collected qualitative and quantitative data from Wikipedia in order to show how a community's consensus gradually converts social mechanisms into algorithmic mechanisms. In detail, we analyze algorithmic governance mechanisms in two embedded cases: The software extension "flagged revisions" and the bot "xqbot". Our insights point towards a growing relevance of algorithmic governance in the realm of governing large-scale peer production communities. This extends previous research, in which algorithmic governance is almost absent. Further research is needed to unfold, understand, and also modify existing interdependencies between social and algorithmic governance mechanisms. 0 0
Writing for Wikipedia: Co-constructing knowledge and writing for a public audience Writing for Wikipedia: Co-constructing knowledge and writing for a public audience Britt L.L. The Plugged-In Professor: Tips and Techniques for Teaching with Social Media English 2013 This assignment allows students to research topics in depth and become skilled at communicating academic knowledge for a public audience. The assignment draws attention to the collaborative construction of knowledge and the forces that shape what counts as knowledge and what gets disseminated. It also encourages students to consider how to organize information to be useful and illuminating to others, and how to consider connections between topics and concepts. The assignment engages students in critique, as they are more willing to critique and revise their writing when that writing will be accessible to the public. The assignment also exposes students to a social media information-sharing medium, Wikipedia, and encourages their critical consideration of the strengths and limitations of this online encyclopedic resource. © 2013 Woodhead Publishing Limited. All rights reserved. 0 0
YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia Johannes Hoffart
Suchanek F.M.
Berberich K.
Gerhard Weikum
Artificial Intelligence English 2013 We present YAGO2, an extension of the YAGO knowledge base, in which entities, facts, and events are anchored in both time and space. YAGO2 is built automatically from Wikipedia, GeoNames, and WordNet. It contains 447 million facts about 9.8 million entities. Human evaluation confirmed an accuracy of 95% of the facts in YAGO2. In this paper, we present the extraction methodology, the integration of the spatio-temporal dimension, and our knowledge representation SPOTL, an extension of the original SPO-triple model to time and space. © 2012 Elsevier B.V. All rights reserved. 0 0
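A tiny sketch of the SPOTL idea named in the YAGO2 entry above: the classic subject-predicate-object triple extended with a time span and a location. The field names, the string-based interval representation and the example fact are assumptions for illustration only.

    from collections import namedtuple

    # Subject-Predicate-Object plus Time and Location, as described above.
    SpotlFact = namedtuple("SpotlFact",
                           "subject predicate obj time_begin time_end location")

    fact = SpotlFact(
        subject="Albert_Einstein",
        predicate="wasBornIn",
        obj="Ulm",
        time_begin="1879-03-14",
        time_end="1879-03-14",
        location="Ulm",
    )
    print(fact.predicate, fact.time_begin, fact.location)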
Estado del arte de la investigación sobre wikis Estado del arte de la investigación sobre wikis Emilio J. Rodríguez-Posada
Juan Manuel Dodero-Beardo
University of Cádiz Spanish December 2012 Researchers' interest in wikis, especially Wikipedia, has grown steadily in recent years. The first edition of [[WikiSym]], a symposium on wikis, was held in 2005, and since then a multitude of congresses, workshops, conferences and competitions have appeared in this area. The study of wikis is an emerging and prolific field. There have been several attempts, with little success, to compile all the literature on wikis. Sometimes the approach or the tool used was too limited; at other times, given the size of the task, the project was abandoned and the bibliographic metadata were soon lost. In this work we present [[WikiPapers]], a collaborative project to compile all the literature on wikis. It uses MediaWiki and its semantic extension, both well known to researchers in this field. Up to November 2012, more than 1,700 publications and their metadata have been compiled, together with documentation on related tools and datasets. The metadata can be exported in BibTeX, RDF, CSV and JSON formats. The complete wiki histories are available for download to facilitate their preservation. The project is open to participation by everyone. The rest of the work is organised as follows. In section 2 we motivate this work by reviewing the different approaches used so far to compile the literature on wikis, noting their advantages and drawbacks. In section 3 we detail the objectives. In section 4 we define some terms that help to better understand the content. In section 5 we present WikiPapers, how it works and what steps have been taken. In section 6 we survey the state of the art using WikiPapers. In section 7 we review the questions that remain open or have received little attention so far. Finally, in section 8, we close with conclusions and future work. 28 0
Mass Collaboration or Mass Amateurism? A comparative study on the quality of scientific information produced using Wiki tools and concepts Mass Collaboration or Mass Amateurism? A comparative study on the quality of scientific information produced using Wiki tools and concepts Fernando Rodrigues Universidade Évora Portuguese December 2012 With this PhD dissertation, we intend to contribute to a better understanding of the Wiki phenomenon as a knowledge management system which aggregates private knowledge. We also wish to check to what extent information generated through anonymous and freely bestowed mass collaboration is reliable compared with the traditional approach. In order to achieve that goal, we develop a comparative study between Wikipedia and Encyclopaedia Britannica with regard to the accuracy, depth and detail of the information in both, in order to compare the quality of the knowledge repositories they produce. That allows us to reach a conclusion about the efficacy of the business models behind them. We use a representative random sample composed of articles that appear in both encyclopedias. Each pair of articles was reformatted and then graded by an expert in its subject area. At the same time, we collected a small convenience sample containing only Management articles; each of these pairs was graded by several experts in order to determine the uncertainty associated with having diverse gradings of the same article and to apply it to the evaluations carried out by a single expert. The conclusion was that the average quality of the Wikipedia articles analysed was superior to that of their counterparts, and that this difference was statistically significant. A survey conducted within academia showed that traditional information sources were used by only a minority as a first approach to seeking information, and that trust in these sources was considerably higher than trust in information obtained through Wikipedia. This quality perception, contrasted with the diametrically opposed results of the blind-test evaluation, reinforces the impartiality of the evaluating panel. However representative the chosen sample may be of the universe under study, the results depend on the evaluators' personal opinions and chosen criteria, which means that the reproducibility of this study's conclusions with a different grading panel cannot be guaranteed. Nevertheless, this is not a sufficient reason to reject the results obtained through more than five hundred evaluations. This thesis is thus an attempt to help clarify this topic and to contribute to a better perception of the quality of a tool used daily by millions of people, of the mass collaboration which feeds it and of the collaborative software that supports it. 0 0
Wikipédia, espace fluide, espace à parcourir Wikipédia, espace fluide, espace à parcourir Rémi Mathis La Revue de la BNU French September 2012 Wikipedia is a fundamentally decentred space: it exists in more than 280 languages, its authors number in the hundreds of thousands, and it evolves constantly to keep up with the latest state of knowledge. To ease navigation, entry points are created and tools make it possible to structure this space. The idea, however, is not to impose a set route but, on the contrary, to encourage fluid reading, through itineraries endlessly reinvented by readers, tending to enrich their experience of discovery and lead them towards articles they would not have looked for on their own. 2 0
Assessing the accuracy and quality of Wikipedia entries compared to popular online encyclopaedias Assessing the accuracy and quality of Wikipedia entries compared to popular online encyclopaedias Imogen Casebourne
Chris Davies
Michelle Fernandes
Naomi Norman
English 2 August 2012 8 0
Citation needed: The dynamics of referencing in Wikipedia Citation needed: The dynamics of referencing in Wikipedia Chih-Chun Chen
Camille Roth
WikiSym English August 2012 The extent to which a Wikipedia article refers to external sources to substantiate its content can be seen as a measure of its externally invoked authority. We introduce a protocol for characterising the referencing process in the context of general article editing. With a sample of relatively mature articles, we show that referencing does not occur regularly through an article’s lifetime but is associated with periods of more substantial editing, when the article has reached a certain level of maturity (in terms of the number of times it has been revised and its length). References also tend to be contributed by editors who have contributed more frequently and more substantially to an article, suggesting that a subset of more qualified or committed editors may exist for each article. 13 1
Drawing a Data-Driven Portrait of Wikipedia Editors Drawing a Data-Driven Portrait of Wikipedia Editors Robert West
Ingmar Weber
Carlos Castillo
WikiSym English August 2012 While there has been a substantial amount of research into the editorial and organizational processes within Wikipedia, little is known about how Wikipedia editors (Wikipedians) relate to the online world in general. We attempt to shed light on this issue by using aggregated log data from Yahoo!’s browser toolbar in order to analyze Wikipedians’ editing behavior in the context of their online lives beyond Wikipedia. We broadly characterize editors by investigating how their online behavior differs from that of other users; e.g., we find that Wikipedia editors search more, read more news, play more games, and, perhaps surprisingly, are more immersed in popular culture. Then we inspect how editors’ general interests relate to the articles to which they contribute; e.g., we confirm the intuition that editors are more familiar with their active domains than average users. Finally, we analyze the data from a temporal perspective; e.g., we demonstrate that a user’s interest in the edited topic peaks immediately before the edit. Our results are relevant as they illuminate novel aspects of what has become many Web users’ prevalent source of information. 0 0
Etiquette in Wikipedia: Weening New Editors into Productive Ones Etiquette in Wikipedia: Weening New Editors into Productive Ones Ryan Faulkner
Steven Walling
Maryana Pinchuk
WikiSym English August 2012 Currently, the greatest challenge faced by the Wikipedia community involves reversing the decline of active editors on the site – in other words, ensuring that the encyclopedia’s contributors remain sufficiently numerous to fill the roles that keep it relevant. Due to the natural drop-off of old contributors, newcomers must constantly be socialized, trained and retained. However recent research has shown the Wikipedia community is failing to retain a large proportion of productive new contributors and implicates Wikipedia’s semi-automated quality control mechanisms and their interactions with these newcomers as an exacerbating factor. This paper evaluates the effectiveness of minor changes to the normative warning messages sent to newcomers from one of the most prolific of these quality control tools (Huggle) in preserving their rate of contribution. The experimental results suggest that substantial gains in newcomer participation can be attained through inexpensive changes to the wording of the first normative message that new contributors receive. 0 1
Identifying controversial articles in Wikipedia: A comparative study Identifying controversial articles in Wikipedia: A comparative study Hoda Sepehri Rad
Denilson Barbosa
WikiSym English August 2012 Wikipedia articles are the result of the collaborative editing of a diverse group of anonymous volunteer editors, who are passionate and knowledgeable about specific topics. One can argue that this plurality of perspectives leads to broader coverage of the topic, thus benefitting the reader. On the other hand, differences among editors on polarizing topics can lead to controversial or questionable content, where facts and arguments are presented and discussed to support a particular point of view. Controversial articles are manually tagged by Wikipedia editors, and span many interesting and popular topics, such as religion, history, and politics, to name a few. Recent works have been proposed on automatically identifying controversy within unmarked articles. However, to date, no systematic comparison of these efforts has been made. This is in part because the various methods are evaluated using different criteria and on different sets of articles by different authors, making it hard for anyone to verify the efficacy and compare all alternatives. We provide a first attempt at bridging this gap. We compare five different methods for modelling and identifying controversy, and discuss some of the unique difficulties and opportunities inherent to the way Wikipedia is produced. 0 0
In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-Language Link Network In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-Language Link Network Morten Warncke-Wang
Anuradha Uduwage
Zhenhua Dong
John Riedl
WikiSym English August 2012 Wikipedia has become one of the primary encyclopaedic information repositories on the World Wide Web. It started in 2001 with a single edition in the English language and has since expanded to more than 20 million articles in 283 languages. Criss-crossing between the Wikipedias is an interlanguage link network, connecting the articles of one edition of Wikipedia to another. We describe characteristics of articles covered by nearly all Wikipedias and those covered by only a single language edition, we use the network to understand how we can judge the similarity between Wikipedias based on concept coverage, and we investigate the flow of translation between a selection of the larger Wikipedias. Our findings indicate that the relationships between Wikipedia editions follow Tobler's first law of geography: similarity decreases with increasing distance. The number of articles in a Wikipedia edition is found to be the strongest predictor of similarity, while language similarity also appears to have an influence. The English Wikipedia edition is by far the primary source of translations. We discuss the impact of these results for Wikipedia as well as user-generated content communities in general. 0 0
Manypedia: Comparing Language Points of View of Wikipedia Communities Manypedia: Comparing Language Points of View of Wikipedia Communities Paolo Massa
Federico Scrinzi
WikiSym English August 2012 The 4 million articles of the English Wikipedia have been written in a collaborative fashion by more than 16 million volunteer editors. On each article, the community of editors strive to reach a neutral point of view, representing all significant views fairly, proportionately, and without biases. However, beside the English one, there are more than 280 editions of Wikipedia in different languages and their relatively isolated communities of editors are not forced by the platform to discuss and negotiate their points of view. So the empirical question is: do communities on different language Wikipedias develop their own diverse Linguistic Points of View (LPOV)? To answer this question we created and released as open source [[Manypedia]], a web tool whose aim is to facilitate cross-cultural analysis of Wikipedia language communities by providing an easy way to compare automatically translated versions of their different representations of the same topic. 0 0
Mutual Evaluation of Editors and Texts for Assessing Quality of Wikipedia Articles Mutual Evaluation of Editors and Texts for Assessing Quality of Wikipedia Articles Yu Suzuki
Masatoshi Yoshikawa
WikiSym English August 2012 In this paper, we propose a method to identify good quality Wikipedia articles by mutually evaluating editors and texts. A major approach to assessing article quality is based on text survival ratios: when a text survives multiple edits, the text is assessed as good quality. This approach assumes that poor-quality texts are highly likely to be deleted by editors. However, vandals frequently delete good-quality texts, so the survival ratios of good-quality texts are improperly decreased, and many good-quality texts are unfairly assessed as poor quality. In our method, we take editor quality into account when calculating text quality, and reduce the impact that low-quality vandals have on text quality. This should improve the accuracy of the text quality scores. An inherent problem of this idea, however, is that editor quality is in turn calculated from text quality. To solve this problem, we mutually calculate the editor and text qualities until they converge. In our experimental evaluation, we confirmed that the proposed method can accurately assess text quality. 0 0
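A minimal, assumption-laden sketch of the mutual-reinforcement idea described in the entry above: text quality is a survival ratio in which each keep or delete is weighted by the quality of the editor who performed it, editor quality is the mean quality of the texts the editor wrote, and the two scores are recomputed alternately until they converge. The data layout and the unweighted averages are illustrative simplifications, not the paper's exact model.

    def mutual_quality(authored, retained_by, deleted_by, iters=100, tol=1e-6):
        """authored[e]    -> iterable of text ids originally written by editor e
        retained_by[t] -> editors whose later revisions kept text t
        deleted_by[t]  -> editors who deleted text t"""
        if not authored:
            return {}, {}
        editor_q = {e: 1.0 for e in authored}
        text_q = {}
        for _ in range(iters):
            # Editor-weighted survival ratio: deletions by low-quality editors count less.
            for t, keepers in retained_by.items():
                keep = sum(editor_q.get(e, 1.0) for e in keepers)
                drop = sum(editor_q.get(e, 1.0) for e in deleted_by.get(t, ()))
                text_q[t] = keep / (keep + drop) if keep + drop else 0.5
            # Editor quality: mean quality of the texts the editor contributed.
            new_q = {e: sum(text_q.get(t, 0.5) for t in ts) / len(ts) if ts else editor_q[e]
                     for e, ts in authored.items()}
            converged = max(abs(new_q[e] - editor_q[e]) for e in new_q) < tol
            editor_q = new_q
            if converged:
                break
        return editor_q, text_q

Starting from uniform editor quality, the first pass reduces to the plain survival ratio; subsequent passes down-weight deletions made by editors whose own contributions rarely survive, which is the correction for vandals that the abstract describes.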
Staying in the Loop: Structure and Dynamics of Wikipedia's Breaking News Collaborations Staying in the Loop: Structure and Dynamics of Wikipedia's Breaking News Collaborations Brian Keegan
Darren Gergle
Noshir Contractor
WikiSym English August 2012 Despite the fact that Wikipedia articles about current events are more popular and attract more contributions than typical articles, canonical studies of Wikipedia have only analyzed articles about pre-existing information. We expect the co-authoring of articles about breaking news incidents to exhibit high-tempo coordination dynamics which are not found in articles about historical events and information. Using 1.03 million revisions made by 158,384 users to 3,233 English Wikipedia articles about disasters, catastrophes, and conflicts since 1990, we construct “article trajectories” of editor interactions as they coauthor an article. Examining a subset of this corpus, our analysis demonstrates that articles about current events exhibit structures and dynamics distinct from those observed among articles about non-breaking events. These findings have implications for how collective intelligence systems can be leveraged to process and make sense of complex information. 0 0
Wikipédia. Une somme originale de copies Wikipédia. Une somme originale de copies Rémi Mathis Médium French August 2012 How Wikipedia can reflect the knowledge of its era while rejecting copying. The question of copying in relation to Wikipedia is addressed on three levels: 1/ Wikipedia is a synthesis of knowledge, but its licence obliges it to be fundamentally original; 2/ Wikipedia as a copy of earlier encyclopaedias or as a new model; 3/ Wikipedia as a source of texts ready to be copied. 8 0
Writing up rather than writing down: Becoming Wikipedia Literate Writing up rather than writing down: Becoming Wikipedia Literate Heather Ford
R. Stuart Geiger
WikiSym English August 2012 Editing Wikipedia is certainly not as simple as learning the MediaWiki syntax and knowing where the “edit” bar is, but how do we conceptualize the cultural and organizational understandings that make an effective contributor? We draw on work of literacy practitioner and theorist Richard Darville to advocate a multi-faceted theory of literacy that sheds light on what new knowledges and organizational forms are required to improve participation in Wikipedia’s communities. We outline what Darville refers to as the “background knowledges” required to be an empowered, literate member and apply this to the Wikipedia community. Using a series of examples drawn from interviews with new editors and qualitative studies of controversies in Wikipedia, we identify and outline several different literacy asymmetries. 0 1
Wikipédia, un projet hors normes ? Wikipédia, un projet hors normes ? Rémi Bachelet
Alexandre Moatti
Responsabilité & Environnement (Annales des Mines) French 24 July 2012 Wikipedia and the ISO both represent a crystallisation of knowledge, whether know-how (ISO) or encyclopaedic knowledge (Wikipedia). Both are based on the search for consensus and on collaboration in the form of written texts. From the outset Wikipedia adopted rules, with its five founding principles. Its growth has led to the development of a meta space (e.g. talk pages) whose operation has required codification. 2 0
Wikipedia de la A a la W Wikipedia de la A a la W Tomás Saorín-Pérez Editorial UOC Spanish July 2012 Wikipedia is a reality that works, even though in theory it might seem an unattainable dream. A handful of enthusiasts has redefined the classical concept of the encyclopedia from scratch and built the most widely used reference source in history. Is its quality good enough? The answer is yes, and to justify it one must look into the mechanisms it is equipped with, which allow it to reach whatever level of quality is desired by combining the effort of thousands of self-organised volunteer editors. Wikipedia is at once content and people. It is time to get to know it from the inside and to strengthen its commitment to open knowledge from cultural, scientific and educational institutions. Participating in Wikipedia is a way of learning from this incredible global laboratory of the social construction of organised information. 0 0
Who Deletes Wikipedia? Who Deletes Wikipedia? English 6 June 2012 0 0
Reverts Revisited: Accurate Revert Detection in Wikipedia Reverts Revisited: Accurate Revert Detection in Wikipedia Fabian Flöck
Denny Vrandečić
Elena Simperl
Hypertext and Social Media 2012 English June 2012 Wikipedia is commonly used as a proving ground for research in collaborative systems. This is likely due to its popularity and scale, but also to the fact that large amounts of data about its formation and evolution are freely available to inform and validate theories and models of online collaboration. As part of the development of such approaches, revert detection is often performed as an important pre-processing step in tasks as diverse as the extraction of implicit networks of editors, the analysis of edit or editor features and the removal of noise when analyzing the emergence of the content of an article. The current state of the art in revert detection is based on a rather naïve approach, which identifies revision duplicates based on MD5 hash values. This is an efficient, but not very precise technique that forms the basis for the majority of research based on revert relations in Wikipedia. In this paper we prove that this method has a number of important drawbacks - it only detects a limited number of reverts, while simultaneously misclassifying too many edits as reverts, and not distinguishing between complete and partial reverts. This is very likely to hamper the accurate interpretation of the findings of revert-related research. We introduce an improved algorithm for the detection of reverts based on word tokens added or deleted to address these drawbacks. We report on the results of a user study and other tests demonstrating the considerable gains in accuracy and coverage by our method, and argue for a positive trade-off, in certain research scenarios, between these improvements and our algorithm's increased runtime. 0 0
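For reference, a minimal sketch of the MD5-based identity revert detection that the entry above identifies as the current state of the art (and criticizes): a revision whose full text hashes to the same value as an earlier revision is treated as a revert back to that revision, so partial reverts are invisible and coincidentally identical states can be misclassified. The improved, word-token-based algorithm proposed in the paper is not reproduced here.

    import hashlib

    def detect_identity_reverts(revision_texts):
        """revision_texts: full article texts, oldest first.
        Returns (reverting_index, reverted_to_index) pairs under the
        hash-identity definition of a revert."""
        seen = {}       # md5 digest -> index of earliest revision with that exact text
        reverts = []
        for i, text in enumerate(revision_texts):
            digest = hashlib.md5(text.encode("utf-8")).hexdigest()
            # A match with revision i-1 would be a null edit, not a revert,
            # hence the "< i - 1" condition.
            if digest in seen and seen[digest] < i - 1:
                reverts.append((i, seen[digest]))
            seen.setdefault(digest, i)
        return reverts

    history = ["intro", "intro + spam", "intro"]
    print(detect_identity_reverts(history))  # [(2, 0)]: revision 2 reverts back to revision 0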
Wikipédia et les bibliothèques : dix ans après Wikipédia et les bibliothèques : dix ans après Rémi Mathis Bibliothèques 2.0 : à l'heure des médias sociaux French June 2012 An overview of the relationship between libraries and Wikipedia as of 2012. 1 0
What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s) What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s) Nicolas Jullien Social Science Research Network English 7 May 2012 This article proposes a review of the literature analyzing Wikipedia as a collective system for producing knowledge. 279 2
Panorama of the wikimediasphere Panorama of the wikimediasphere David Gómez-Fontanills Digithum English
Catalan
May 2012 The term wikimediasphere is proposed to refer to the group of WikiProjects, communities of editors, guidelines and organisations structured around the Wikimedia movement to generate free knowledge that is available to everyone. A description is made of the wikimediasphere, presenting the main projects and their characteristics, and its community, technological, regulatory, social and institutional dimensions are outlined. The wikimediasphere is placed in context and reference is made to its blurred boundaries. An explanation is provided of the role of the communities of editors of each project and their autonomy with respect to each other and to the Wikimedia Foundation. The author concludes by offering a panoramic view of the wikimediasphere. 10 0
The Truth of Wikipedia The Truth of Wikipedia Nathaniel Tkacz Digithum English
Catalan
May 2012 What does it mean to assert that Wikipedia has a relation to truth? That there is, despite regular claims to the contrary, an entire apparatus of truth in Wikipedia? In this article, I show that Wikipedia has in fact two distinct relations to truth: one which is well known and forms the basis of existing popular and scholarly commentaries, and another which refers to equally well-known aspects of Wikipedia, but has not been understood in terms of truth. I demonstrate Wikipedia’s dual relation to truth through a close analysis of the Neutral Point of View core content policy (and one of the project’s “Five Pillars”). I conclude by indicating what is at stake in the assertion that Wikipedia has a regime of truth and what bearing this has on existing commentaries. 7 0
Wikipedia's Role in Reputation Management: An Analysis of the Best and Worst Companies in the USA Wikipedia's Role in Reputation Management: An Analysis of the Best and Worst Companies in the USA Marcia W. DiStaso
Marcus Messner
Digithum English
Catalan
May 2012 Being considered one of the best companies in the USA is a great honor, but this reputation does not exempt businesses from negativity in the collaboratively edited online encyclopedia Wikipedia. Content analysis of corporate Wikipedia articles for companies with the best and worst reputations in the USA revealed that negative content outweighed positive content irrespective of reputation. It was found that both the best and the worst companies had more negative than positive content in Wikipedia. This is an important issue because Wikipedia is not only one of the most popular websites in the world, but is also often the first place people look when seeking corporate information. Although there was more content on corporate social responsibility in the entries for the ten companies with the best reputations, this was still overshadowed by content referring to legal issues or scandals. Ultimately, public relations professionals need to regularly monitor and request updates to their corporate Wikipedia articles regardless of what kind of company they work for. 4 0
Edição colaborativa na Wikipédia: desafios e possibilidades Edição colaborativa na Wikipédia: desafios e possibilidades Carlos Frederico de Brito d’Andréa Educação científica e cidadania: abordagens teóricas e metodológicas para a formação de pesquisadores juvenis Portuguese March 2012 14 1
Valorisation du bénévolat sur Wikipédia Valorisation du bénévolat sur Wikipédia Vincent Juhel French February 2012 Wikipedia operates in an atypical way, and research on it has mostly focused on the quality of articles that can potentially be written by anyone. In this professional thesis I have sought to give a quantitative and qualitative account of the real value this project brings to readers, editors and donors, and of what it would have represented had it been a conventional company. The first objective was to assess the value of the work of these volunteers, which, although unpaid, produces real wealth; defining this wealth better also means being more convincing to donors and carrying more weight with partners. The second objective was to outline a strategy for maximising the value produced by a largely self-managed community of volunteers: better understanding the value produced in order to better orient and motivate contributors' work. 0 1
"Askwiki": Shallow semantic processing to query Wikipedia "Askwiki": Shallow semantic processing to query Wikipedia Burkhardt F.
Jia Zhou
European Signal Processing Conference English 2012 We describe an application for querying Wikipedia through a voice interface on a mobile device, i.e. a smartphone or tablet computer. The aim was to develop an app that installs easily on an Android phone and does not require large vocabularies. It can either answer questions directly, when the information is contained in a table or matches a keyword syntax (such as "birth place"), or give access to an article's subsections. An evaluation with 25 test users showed the feasibility of the approach. 0 0
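To illustrate the kind of shallow keyword-syntax matching the abstract above alludes to (its "birth place" example), here is a minimal Python sketch. The question patterns and the hard-coded infobox data are hypothetical stand-ins for illustration only; the paper's actual grammar and data source are not reproduced here.

import re

# Hypothetical stand-in for parsed Wikipedia infobox data; the paper does not
# specify its actual data source or schema.
INFOBOX = {
    "albert einstein": {"birth place": "Ulm, German Empire"},
}

# Toy keyword-syntax patterns in the spirit of the "birth place" example
# mentioned in the abstract; the real app's question grammar is assumed here.
PATTERNS = [
    (re.compile(r"(?:birth ?place of|where was) (.+?)(?: born)?\??$", re.I),
     "birth place"),
]

def answer(question: str) -> str:
    # Try each pattern; on a match, look the entity up in the infobox data.
    for pattern, attribute in PATTERNS:
        match = pattern.search(question.strip())
        if match:
            entity = match.group(1).strip().lower()
            return INFOBOX.get(entity, {}).get(attribute, "No answer found.")
    return "Question form not recognized."

if __name__ == "__main__":
    print(answer("Where was Albert Einstein born?"))  # Ulm, German Empire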
A Breakdown of Quality Flaws in Wikipedia A Breakdown of Quality Flaws in Wikipedia Maik Anderka
Benno Stein
2nd Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 12) English 2012 The online encyclopedia Wikipedia is a successful example of the increasing popularity of user generated content on the Web. Despite its success, Wikipedia is often criticized for containing low-quality information, which is mainly attributed to its core policy of being open for editing by everyone. The identification of low-quality information is an important task since Wikipedia has become the primary source of knowledge for a huge number of people around the world. Previous research on quality assessment in Wikipedia either investigates only small samples of articles, or else focuses on single quality aspects, like accuracy or formality. This paper targets the investigation of quality flaws, and presents the first complete breakdown of Wikipedia's quality flaw structure. We conduct an extensive exploratory analysis, which reveals (1) the quality flaws that actually exist, (2) the distribution of flaws in Wikipedia, and (3) the extent of flawed content. An important finding is that more than one in four English Wikipedia articles contains at least one quality flaw, 70% of which concern article verifiability. 0 0
A Cross-Lingual Dictionary for English Wikipedia Concepts A Cross-Lingual Dictionary for English Wikipedia Concepts Valentin I. Spitkovsky
Angel X. Chang
Proceedings of the Eighth International Conference on Language Resources and Evaluation English 2012 We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal interoperability, we release our resource as a set of flat line-based text files, lexicographically sorted and encoded with UTF-8. These files capture joint probability distributions underlying concepts (we use the terms article, concept and Wikipedia URL interchangeably) and associated snippets of text, as well as other features that can come in handy when working with Wikipedia articles and related information. 5 0
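As a rough illustration of how such a flat, line-based string-to-concept resource could be consumed, the following Python sketch loads a simplified dictionary file and returns the most probable Wikipedia URL for a surface string. The tab-separated column layout assumed here (surface string, probability, URL) is an assumption for illustration, not the published file format.

from collections import defaultdict

# Minimal sketch, assuming a simplified tab-separated layout per line:
# surface_string <TAB> probability <TAB> wikipedia_url
def load_dictionary(path):
    index = defaultdict(list)
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip lines that do not fit the assumed layout
            surface, prob, url = parts
            index[surface].append((url, float(prob)))
    return index

def best_concept(index, surface):
    # Return the (url, probability) pair with the highest probability, if any.
    candidates = index.get(surface, [])
    return max(candidates, key=lambda pair: pair[1], default=None)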
A Jester's Promenade: Citations to Wikipedia in Law Reviews, 2002-2008 A Jester's Promenade: Citations to Wikipedia in Law Reviews, 2002-2008 Daniel J. Baker I/S: A Journal of Law and Policy for the Information Society 2012 Due to its perceived omniscience and ease-of-use, reliance on the online encyclopedia Wikipedia as a source for information has become pervasive. As a result, scholars and commentators have begun turning their attentions toward this resource and its uses. The main focus of previous writers, however, has been on the use of Wikipedia in the judicial process, whether by litigants relying on Wikipedia in their pleadings or judges relying on it in their decisions. No one, until now, has examined the use of Wikipedia in the legal scholarship context. This article intends to shine a light on the citation aspect of the Wikipedia-as-authority phenomenon by providing detailed statistics on the scope of its use and critiquing or building on the arguments of other commentators. Part II provides an overview of the debate regarding the citation of Wikipedia, beginning with a general discussion on the purposes of citation. In this Part, this article examines why some authors choose to cite to Wikipedia and explains why such citation is nonetheless problematic despite its perceived advantages. A citation analysis performed on works published by nearly 500 American law reviews between 2002 and 2008 is the focus of Part III, from a description of the methodology to an examination of the results of the analysis and any trends that may be discerned from the statistics. Finally, Part IV examines the propriety of citing to Wikipedia, culminating in a call for tighter editorial standards in law reviews. 0 0
A Linked Data platform for mining software repositories A Linked Data platform for mining software repositories Keivanloo I.
Forbes C.
Hmood A.
Erfani M.
Neal C.
Peristerakis G.
Rilling J.
IEEE International Working Conference on Mining Software Repositories English 2012 The mining of software repositories involves extracting both basic and value-added information from existing software repositories. These repositories are mined by different stakeholders (e.g. researchers, managers) to extract facts for various purposes. To avoid unnecessary pre-processing and analysis steps, the sharing and integration of both basic and value-added facts are needed. In this research, we introduce SeCold, an open and collaborative platform for sharing software datasets. SeCold provides the first online software ecosystem Linked Data platform that supports data extraction and on-the-fly inter-dataset integration from major version control, issue tracking, and quality evaluation systems. In its first release, the dataset contains about two billion facts, such as source code statements, software licenses, and code clones from 18,000 software projects. In its second release, the SeCold project will contain additional facts mined from issue trackers and versioning systems. Our approach is based on the same fundamental principle as Wikipedia: researchers and tool developers share the analysis results produced by their tools by publishing them as part of the SeCold portal, thereby making them an integrated part of the global knowledge domain. The SeCold project is an official member of the Linked Data dataset cloud and is currently the eighth-largest online dataset available on the Web. 0 0
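A Linked Data dataset of this kind is typically consumed over SPARQL. The Python sketch below, using the SPARQLWrapper library, shows roughly what such a query could look like; the endpoint URL and the DOAP vocabulary used here are illustrative assumptions, not the documented SeCold endpoint or schema.

from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint: the real SeCold SPARQL endpoint is not given here.
ENDPOINT = "http://example.org/secold/sparql"

# Illustrative query using the generic DOAP vocabulary; SeCold's actual
# ontology may differ.
QUERY = """
PREFIX doap: <http://usefulinc.com/ns/doap#>
SELECT ?project ?name WHERE {
  ?project a doap:Project ;
           doap:name ?name .
} LIMIT 10
"""

def list_projects():
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [row["name"]["value"] for row in results["results"]["bindings"]]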
A Wikipedia-based corpus reference tool A Wikipedia-based corpus reference tool Jason Ginsburg HCCE English 2012 This paper describes a dictionary-like reference tool that presents the kind of information one would find when looking up a word in a dictionary, except that the information is extracted automatically from large corpora. For a particular vocabulary item, a user can view frequency information, part-of-speech distribution, word forms, definitions, example paragraphs and collocations. All of this information is extracted automatically from corpora, most of it from Wikipedia. Since Wikipedia is a massive corpus covering a diverse range of general topics, this information is likely to be very representative of how target words are used in general. The project has applications for English language teachers and learners, as well as for language researchers. 0 0
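The following Python sketch illustrates, in a toy way, the kind of corpus statistics such a reference tool aggregates for a target word (raw frequency and immediate right-hand collocates). The tool's actual extraction pipeline over Wikipedia is not reproduced here.

import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; a deliberately simple tokenizer for illustration.
    return re.findall(r"[a-z']+", text.lower())

def corpus_profile(text, target):
    tokens = tokenize(text)
    frequency = Counter(tokens)[target]
    # Count the word immediately following each occurrence of the target.
    collocates = Counter(
        tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == target
    )
    return {"frequency": frequency, "right_collocates": collocates.most_common(5)}

if __name__ == "__main__":
    sample = "The free encyclopedia is a free resource. Free access matters."
    print(corpus_profile(sample, "free"))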