This is a list of 258 publications published in 2014.
|A case study of contributor behavior in Q&A site and tags: The importance of prominent profiles in community productivity||Furtado A.
|Data mining and machine learning
Studies of Wikipedia web
|Journal of the Brazilian Computer Society||Background: Question-and-answer (Q&A) sites have been shown to be a valuable resource for helping people solve their everyday problems. These sites currently enable a large number of contributors to exchange expertise in different ways (creating questions, answers, or comments, and voting on them), and it is noticeable that they contribute in diverse amounts and create content of varying quality. Methods: Concerned with this diversity of behaviors, this paper advances present knowledge about Q&A sites by performing a cluster analysis with a multifaceted view of contributors that accounts for their motivations and abilities, in order to identify the most common behavioral profiles on these sites. Results: By examining all contributors' activity on a large site named Super User, we unveil nine behavioral profiles that group users according to the quality and quantity of their contributions. Based on these profiles, we analyze the community composition and the importance of each profile to the site's productivity. Moreover, we also investigate seven tag communities from Super User, aiming to test the generality of our results. In this context, the same nine profiles were found, and we also observed a remarkable similarity between the composition and productivity of the communities defined by the seven tags and those of the site itself. Conclusions: The profiles uncovered enhance the overall understanding of how Q&A sites work, and knowing these profiles can support the site's management. Furthermore, an analysis of particularities in the comparison of tag communities relates variation in behavior to the typical behavior of each tag community studied, which also has implications for creating administrative strategies. © 2014 Furtado; licensee Springer.||0||0|
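The nine-profile analysis described in this abstract rests on clustering contributors by the quantity and quality of their contributions. As a rough illustration only (not the authors' method; the contributor data, feature choice, and cluster count below are all hypothetical), a minimal k-means pass over two such features might look like:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means over 2-D feature vectors (quantity, quality)."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Recompute centroids; keep the old one if a cluster went empty.
        centroids = [(sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
                     if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, clusters

# Hypothetical contributors as (contribution count, mean contribution score).
contributors = [(1, 0.2), (2, 0.1), (150, 4.5), (160, 4.0), (80, 0.5), (90, 0.4)]
centroids, clusters = kmeans(contributors, k=3)
```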
|A cloud-based real-time mobile collaboration wiki system||Wang W.H.
|Applied Mechanics and Materials||English||The wiki system is an important application of wiki technology for knowledge sharing on the Internet today. Existing wiki systems have been developed with distributed collaboration capabilities, but most of them cannot support mobile, real-time collaboration. In this paper, a novel cloud-based real-time mobile collaboration wiki system is presented. First, the real-time requirements of user groups in a cloud-based mobile collaboration wiki system are discussed, and a group pattern for the mobile collaboration wiki system (GPMCW) is constructed in the mobile cloud environment. After that, a multi-layered Web architecture oriented toward the mobile cloud environment is proposed. Then, an instance of the cloud-based real-time mobile collaboration wiki system (RMCWS) is given. To demonstrate the feasibility of the system, a prototype named the mobile group collaboration supporting platform (MGCSP) has been constructed based on it. Practice shows that RMCWS is robust and efficient in supporting real-time mobile group collaboration, and provides good support for idea sharing and knowledge communication.||0||0|
|A composite kernel approach for dialog topic tracking with structured domain knowledge from Wikipedia||Soo-Hwan Kim
|52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference||English||Dialog topic tracking aims at analyzing and maintaining topic transitions in ongoing dialogs. This paper proposes a composite kernel approach for dialog topic tracking to utilize various types of domain knowledge obtained from Wikipedia. Two kernels are defined based on history sequences and context trees constructed based on the extracted features. The experimental results show that our composite kernel approach can significantly improve the performances of topic tracking in mixed-initiative human-human dialogs.||0||0|
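The composite kernel this abstract describes combines a history-sequence kernel with a context-tree kernel. A standard way to compose kernels is a weighted sum, since a weighted sum of valid kernels is itself a valid kernel. The toy base kernels and dialog features below are hypothetical stand-ins, not the paper's definitions:

```python
def history_kernel(a, b):
    """Toy sequence kernel: length of the shared prefix of two topic histories."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return float(n)

def tree_kernel(a, b):
    """Toy context kernel: size of the overlap of two context-feature sets."""
    return float(len(set(a) & set(b)))

def composite_kernel(h1, h2, t1, t2, alpha=0.5):
    # A weighted sum of valid kernels is itself a valid kernel.
    return alpha * history_kernel(h1, h2) + (1 - alpha) * tree_kernel(t1, t2)

k = composite_kernel(["greeting", "food"], ["greeting", "hotel"],
                     {"price", "menu"}, {"menu", "location"})
```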
|A correlation-based semantic model for text search||Sun J.
|Lecture Notes in Computer Science||English||With the exponential growth of texts on the Internet, text search is considered a crucial problem in many fields. Most traditional text search approaches are based on a "bag of words" text representation built on frequency statistics. However, these approaches ignore the semantic correlation of words in the text, which may lead to inaccurate ranking of the search results. In this paper, we propose a new Wikipedia-based similar-text search approach in which the words of the indexed texts and of the query text can be semantically correlated through Wikipedia. We propose a new text representation model and a new text similarity metric. Finally, experiments on a real dataset demonstrate the high precision, recall, and efficiency of our approach.||0||0|
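One common way to realize the "semantic correlation of words" idea in this abstract is a soft-cosine similarity, where a term-term correlation matrix lets related words partially match even when they never co-occur. This is a generic sketch, not the paper's metric, and the correlation values below are hypothetical rather than mined from Wikipedia:

```python
import math

# Hypothetical term-term correlations (the diagonal is implicitly 1.0).
CORR = {("car", "auto"): 0.9, ("auto", "car"): 0.9}

def c(t1, t2):
    if t1 == t2:
        return 1.0
    return CORR.get((t1, t2), 0.0)

def _qform(u, v, terms):
    # Quadratic form u^T C v over the shared vocabulary.
    return sum(u[i] * c(terms[i], terms[j]) * v[j]
               for i in range(len(terms)) for j in range(len(terms)))

def soft_cosine(q, d):
    """Cosine similarity in which correlated terms partially match."""
    terms = sorted(set(q) | set(d))
    qv = [q.count(t) for t in terms]
    dv = [d.count(t) for t in terms]
    return _qform(qv, dv, terms) / math.sqrt(_qform(qv, qv, terms) * _qform(dv, dv, terms))
```

A plain cosine would score `["car"]` against `["auto"]` as 0; the soft cosine scores it 0.9 via the correlation table.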
|A cross-cultural comparison on contributors' motivations to online knowledge sharing: Chinese vs. Germans||Zhu B.
|Lecture Notes in Computer Science||English||Wikipedia is the most popular online knowledge-sharing platform in Western countries. However, it is not widely accepted in Eastern countries, which indicates that culture plays a key role in determining users' acceptance of online knowledge-sharing platforms. The purpose of this study is to investigate the cultural differences between Chinese and Germans in motivations for sharing knowledge, and further to examine the impacts of these motives on actual behavior across the two cultures. A questionnaire was developed to explore the motivation factors and actual behavior of contributors. 100 valid responses were received from the Chinese participants and 34 from the German participants. The results showed that the motivations were significantly different between Chinese and Germans: the Chinese had more consideration for others and cared more about receiving rewards and strengthening relationships, whereas the Germans had more concerns about losing competitiveness. The impact of the motives on actual behavior also differed between Chinese and Germans.||0||0|
|A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi||Moreland R.T.
|Customized Web portal
|BMC Genomics||English||Background: Mnemiopsis leidyi is a ctenophore native to the coastal waters of the western Atlantic Ocean. A number of studies on Mnemiopsis have led to a better understanding of many key biological processes, and these studies have contributed to the emergence of Mnemiopsis as an important model for evolutionary and developmental studies. Recently, we sequenced, assembled, annotated, and performed a preliminary analysis on the 150-megabase genome of the ctenophore, Mnemiopsis. This sequencing effort has produced the first set of whole-genome sequencing data on any ctenophore species and is amongst the first wave of projects to sequence an animal genome de novo solely using next-generation sequencing technologies. Description: The Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis/) is intended both as a resource for obtaining genomic information on Mnemiopsis through an intuitive and easy-to-use interface and as a model for developing customized Web portals that enable access to genomic data. The scope of data available through this Portal goes well beyond the sequence data available through GenBank, providing key biological information not available elsewhere, such as pathway and protein domain analyses; it also features a customized genome browser for data visualization. Conclusions: We expect that the availability of these data will allow investigators to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. The overall approach taken in the development of this Web site can serve as a viable model for disseminating data from whole-genome sequencing projects, framed in a way that best serves the specific needs of the scientific community. © 2014 Moreland et al.; licensee BioMed Central Ltd.||0||0|
|A framework for automated construction of resource space based on background knowledge||Yu X.
|Latent Dirichlet allocation
Resource space model
|Future Generation Computer Systems||English||The Resource Space Model is a kind of data model that can effectively and flexibly manage the digital resources in a cyber-physical system from multidimensional and hierarchical perspectives. This paper focuses on constructing resource spaces automatically. We propose a framework that organizes a set of digital resources according to different semantic dimensions, combining human background knowledge from WordNet and Wikipedia. The construction process includes four steps: extracting candidate keywords, building semantic graphs, detecting semantic communities, and generating the resource space. An unsupervised statistical topic model (Latent Dirichlet Allocation) is applied to extract candidate keywords for the facets. To better interpret the meanings of the facets found by LDA, we map the keywords to Wikipedia concepts, calculate word relatedness using WordNet's noun synsets, and construct the corresponding semantic graphs. Moreover, semantic communities are identified with the GN algorithm. After extracting candidate axes based on the Wikipedia concept hierarchy, the final axes of the resource space are ranked and selected through three different ranking strategies. The experimental results demonstrate that the proposed framework can organize resources automatically and effectively. © 2013 Published by Elsevier Ltd. All rights reserved.||0||0|
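The middle two steps of the pipeline above (building semantic graphs from pairwise relatedness, then detecting communities) can be sketched as follows. The relatedness scores are hypothetical, and connected components over a thresholded graph stand in for the GN algorithm the paper actually uses:

```python
from collections import defaultdict

# Hypothetical pairwise relatedness between candidate keywords
# (stand-ins for WordNet/Wikipedia-derived scores).
related = {
    ("cpu", "memory"): 0.8, ("cpu", "cache"): 0.7,
    ("memory", "cache"): 0.6, ("tcp", "ip"): 0.9,
}

def semantic_graph(scores, threshold=0.5):
    """Keep only edges whose relatedness clears the threshold."""
    g = defaultdict(set)
    for (a, b), w in scores.items():
        if w >= threshold:
            g[a].add(b)
            g[b].add(a)
    return g

def communities(g):
    """Connected components; a cheap stand-in for the GN algorithm."""
    seen, out = set(), []
    for node in list(g):
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(g[n] - comp)
        seen |= comp
        out.append(comp)
    return out

comms = communities(semantic_graph(related))
```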
|A generic framework and methodology for extracting semantics from co-occurrences||Rachakonda A.R.
|Data and Knowledge Engineering||English||Extracting semantic associations from text corpora is an important problem with several applications. It is well understood that semantic associations from text can be discerned by observing patterns of co-occurrences of terms. However, much of the work in this direction has been piecemeal, addressing specific kinds of semantic associations. In this work, we propose a generic framework with which several kinds of semantic associations can be mined. The framework comprises a co-occurrence graph of terms, along with a set of graph operators. A methodology for using this framework is also proposed, where the properties of a given semantic association can be hypothesized and tested over the framework. To show the generic nature of the proposed model, four different semantic associations are mined over a corpus comprising Wikipedia articles. The design of the proposed framework is inspired by cognitive science - specifically the interplay between semantic and episodic memory in humans. © 2014 Elsevier B.V. All rights reserved.||0||0|
|A latent variable model for discourse-aware concept and entity disambiguation||Angela Fahrni
|14th Conference of the European Chapter of the Association for Computational Linguistics 2014, EACL 2014||English||This paper takes a discourse-oriented perspective for disambiguating common and proper noun mentions with respect to Wikipedia. Our novel approach models the relationship between disambiguation and aspects of cohesion using Markov Logic Networks with latent variables. Considering cohesive aspects consistently improves the disambiguation results on various commonly used data sets.||0||0|
|A method for refining a taxonomy by using annotated suffix trees and Wikipedia resources||Chernyak E.
|Procedia Computer Science||English||A two-step approach to taxonomy construction is presented. In the first step, the frame of the taxonomy is built manually according to some representative educational materials. In the second step, the frame is refined using the Wikipedia category tree and articles. Since the structure of Wikipedia is rather noisy, a procedure to clean the Wikipedia category tree is suggested. A string-to-text relevance score, based on annotated suffix trees, is used several times: 1) to clean the Wikipedia data of noise; 2) to assign Wikipedia categories to taxonomy topics; and 3) to choose whether a category should be assigned to a taxonomy topic or stay at an intermediate level. The resulting taxonomy consists of three parts: the manually set upper levels, the adopted Wikipedia category tree, and the Wikipedia articles as leaves. Also, a set of so-called descriptors is assigned to every leaf; these are phrases explaining aspects of the leaf topic. The method is illustrated by its application to two domains: a) Probability theory and mathematical statistics, and b) "Numerical analysis" (both in Russian). © 2014 Published by Elsevier B.V.||0||0|
|A methodology based on commonsense knowledge and ontologies for the automatic classification of legal cases||Capuano N.
De Maio C.
|ACM International Conference Proceeding Series||English||We describe a methodology for the automatic classification of legal cases expressed in natural language, which relies on existing legal ontologies and a commonsense knowledge base. This methodology is founded on a process consisting of three phases: an enrichment of a given legal ontology by associating its terms with topics retrieved from the Wikipedia knowledge base; an extraction of relevant concepts from a given textual legal case; and a matching between the enriched ontological terms and the extracted concepts. Such a process has been successfully implemented in a corresponding tool that is part of a larger framework for self-litigation and legal support for the Italian law.||0||0|
|A novel system for the semi automatic annotation of event images||McParlane P.J.
|Photo tag recommendation
|SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval||English||With the rise in popularity of smart phones, taking and sharing photographs has never been more openly accessible. Further, photo sharing websites, such as Flickr, have made the distribution of photographs easy, resulting in an increase of visual content uploaded online. Due to the laborious nature of annotating images, however, a large percentage of these images are unannotated, making their organisation and retrieval difficult. Therefore, there has been a recent research focus on the automatic and semi-automatic annotation of these images. Despite the progress made in this field, however, annotating images automatically based on their visual appearance often results in unsatisfactory suggestions, and as a result these models have not been adopted by photo sharing websites. Many methods have therefore looked to exploit new sources of evidence for annotation purposes, such as image context. In this demonstration, we instead explore the scenario of annotating images taken at large-scale events, where evidence can be extracted from a wealth of online textual resources. Specifically, we present a novel tag recommendation system for images taken at a popular music festival which allows the user to select relevant tags from related Tweets and Wikipedia content, thus reducing the workload involved in the annotation process. Copyright 2014 ACM.||0||0|
|A perspective-aware approach to search: Visualizing perspectives in news search results||Qureshi M.A.
|SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval||English||The result set from a search engine for any user's query may exhibit an inherent perspective due to issues with the search engine or issues with the underlying collection. This demonstration paper presents a system that allows users to specify at query time a perspective together with their query. The system then presents results from well-known search engines with a visualization of the results which allows the users to quickly surmise the presence of the perspective in the returned set.||0||0|
|A piece of my mind: A sentiment analysis approach for online dispute detection||Lei Wang
|52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference||English||We investigate the novel task of online dispute detection and propose a sentiment analysis solution to the problem: we aim to identify the sequence of sentence-level sentiments expressed during a discussion and to use them as features in a classifier that predicts the DISPUTE/NON-DISPUTE label for the discussion as a whole. We evaluate dispute detection approaches on a newly created corpus of Wikipedia Talk page disputes and find that classifiers that rely on our sentiment tagging features outperform those that do not. The best model achieves a very promising F1 score of 0.78 and an accuracy of 0.80.||0||0|
|A scalable Gibbs sampler for probabilistic entity linking||Houlsby N.
|Lecture Notes in Computer Science||English||Entity linking involves labeling phrases in text with their referent entities, such as Wikipedia or Freebase entries. This task is challenging due to the large number of possible entities, in the millions, and heavy-tailed mention ambiguity. We formulate the problem in terms of probabilistic inference within a topic model, where each topic is associated with a Wikipedia article. To deal with the large number of topics we propose a novel efficient Gibbs sampling scheme which can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity-linking on the Aida-CoNLL dataset.||0||0|
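The core of a Gibbs sampler for entity linking is resampling each mention's entity conditioned on the other mentions' current assignments, so that topically coherent assignments reinforce each other. The toy version below is a generic sketch, not the paper's topic model; the candidate entities, compatibility scores, and coherence table are all hypothetical:

```python
import random

# Hypothetical candidates per mention, with mention-entity compatibility scores,
# and a pairwise coherence table (values > 1 reward topically coherent pairs).
candidates = {
    "jaguar": {"Jaguar_Cars": 0.6, "Jaguar_(animal)": 0.4},
    "xj": {"Jaguar_XJ": 1.0},
}
coherence = {frozenset(["Jaguar_Cars", "Jaguar_XJ"]): 2.0}

def gibbs(mentions, sweeps=200, seed=1):
    rnd = random.Random(seed)
    # Random initial assignment for every mention.
    state = {m: rnd.choice(list(candidates[m])) for m in mentions}
    for _ in range(sweeps):
        for m in mentions:
            others = [state[o] for o in mentions if o != m]
            opts = list(candidates[m])
            weights = []
            for e in opts:
                w = candidates[m][e]
                for o in others:
                    w *= coherence.get(frozenset([e, o]), 1.0)
                weights.append(w)
            # Resample the assignment proportional to its weight.
            r, acc = rnd.random() * sum(weights), 0.0
            for e, w in zip(opts, weights):
                acc += w
                if r <= acc:
                    state[m] = e
                    break
    return state

state = gibbs(["jaguar", "xj"])
```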
|A seed based method for dictionary translation||Krajewski R.
|Lecture Notes in Computer Science||English||This paper addresses the topic of automatic machine translation. The proposed method enables translating a dictionary by mining repositories in the source and target languages, without any directly given relationships connecting the two languages. It consists of two stages: (1) translation by lexical similarity, where words are compared graphically, and (2) translation by semantic similarity, where contexts are compared. The Polish and English versions of Wikipedia were used as multilingual corpora. The method and its stages are thoroughly analyzed. The results allow implementing this method in human-in-the-middle systems.||0||0|
|A social network system for sharing construction safety and health knowledge||Le Q.T.
|Automation in Construction||English||Due to complicated and complex working environments, construction sites still present high accident rates, causing serious project delays and cost overruns. Abundant studies have focused on the causes and effects of fatalities, safety training systems, and so on. Most work on this issue has emphasized the necessity and utilization of information, rather than how to exchange, share, and transfer safety data efficiently in the construction industry. In this regard, this paper proposes the Social Network System for Sharing Construction Safety & Health Knowledge (SNSS), which utilizes state-of-the-art semantic wiki and ontology construction technologies for better communication and representation of construction safety information. The SNSS is developed on the basis of a safety semantic wiki template (SSWT) and consists of the following three modules: 1) a safety information module (SIM), which uploads common accident and hazard information for sharing; 2) a safety knowledge module (SKM), where the safety information is refined, confirmed, and transformed into safety knowledge; and 3) a safety dissemination module (SDM), which allows users to monitor, manage, and retrieve safety information and knowledge easily. The SNSS is tested with a scenario using falling-accident information, through which the potentials and limitations of the system are addressed. The study emphasizes the potential applicability and benefits of a social network system that could be utilized to enhance communication among participants in the construction industry. © 2014 Elsevier B.V.||0||0|
|A student perception related to the implementation of virtual courses||Chilian A.
|Access to information
|Lecture Notes in Computer Science||English||This paper aims to characterize students' points of view regarding virtual courses in education; in particular, the study is based on the experience gained by students in the Designing TEL course, organized in the frame of the CoCreat project. It was noticed that a very important role in the development of the virtual courses was played by the Wiki and Moodle platforms. Even though there are still some problems in implementing virtual courses using these platforms, the Designing TEL course can be considered a successful one.||0||0|
|A study of the wikipedia knowledge recommendation service for satisfaction of ePortfolio Users||Huang C.-H.
|Lecture Notes in Electrical Engineering||English||This study extended a conventional ePortfolio with a proposed Wikipedia knowledge recommendation service (WKRS). Participants included 100 students taking courses at National Central University, divided into an experimental group and a control group. The experimental group students created their learning portfolios using the ePortfolio with WKRS, while the control group students used the conventional ePortfolio without WKRS. The data for this study was collected over 3 months. The experimental results show that the experimental group students made significantly greater progress than the control group students in learner satisfaction, system use, system quality, and information/knowledge quality.||0||0|
|A user centred approach to represent expert knowledge: A case study at STMicroelectronics||Brichni M.
|Proceedings - International Conference on Research Challenges in Information Science||English||The rapid growth of companies, the departure of employees, the complexity of new technology, and the rapid proliferation of information are reasons why companies seek to capitalize on their expert knowledge. STMicroelectronics has opted for a Wiki to effectively capitalize and share some of its knowledge. However, to accomplish its objective, the Wiki content must correspond to users' needs. Therefore, we propose a user-centred approach for the definition of knowledge characteristics and their integration in the Wiki. Our knowledge representation is based on three facets: "What, Why and How". In this paper, the approach is applied to the Reporting activity at STMicroelectronics, which is considered a knowledge-intensive activity.||0||0|
|Academic opinions of Wikipedia and open access publishing||Xiao L.
Open Access publishing
|Online Information Review||English||Purpose - The purpose of this paper is to examine academics' awareness of and attitudes towards Wikipedia and Open Access journals for academic publishing to better understand the perceived benefits and challenges of these models. Design/methodology/approach - Bases for analysis include comparison of the models, enumeration of their advantages and disadvantages, and investigation of Wikipedia's web structure in terms of potential for academic publishing. A web survey was administered via department-based invitations and listservs. Findings - The survey results show that: Wikipedia has perceived advantages and challenges in comparison to the Open Access model; the academic researchers' increased familiarity is associated with increased comfort with these models; and the academic researchers' attitudes towards these models are associated with their familiarity, academic environment, and professional status. Research limitations/implications - The major limitation of the study is sample size. The result of a power analysis with GPower shows that authors could only detect big effects in this study at statistical power 0.95. The authors call for larger sample studies that look further into this topic. Originality/value - This study contributes to the increasing interest in adjusting methods of creating and disseminating academic knowledge by providing empirical evidence of the academics' experiences and attitudes towards the Open Access and Wikipedia publishing models. This paper provides a resource for researchers interested in scholarly communication and academic publishing, for research librarians, and for the academic community in general. Copyright © 2014 Emerald Group Publishing Limited. All rights reserved.||0||0|
|Acquisition des traductions de requêtes à partir de wikipédia pour la recherche d'information translingue||Chakour H.
|Vision 2020: Sustainable Growth, Economic Development, and Global Competitiveness - Proceedings of the 23rd International Business Information Management Association Conference, IBIMA 2014||French||The multilingual encyclopedia Wikipedia has become a very useful resource for the construction and enrichment of linguistic resources, such as dictionaries and ontologies. In this study, we are interested in the exploitation of Wikipedia for query translation in Cross-Language Information Retrieval. An application is built for the Arabic-English language pair. All possible translation candidates are extracted from the titles of Wikipedia articles based on the interlanguage links between Arabic and English, which is considered direct translation. Furthermore, other links, such as Arabic to French and French to English, are exploited for transitive translation. Light stemming and segmentation of the query into multiple tokens are applied if no translation can be found for the entire query. Assessments of the monolingual and cross-lingual systems were conducted using three weighting schemes of the Lucene search engine (default, Tf-Idf, and BM25). In addition, the performance of the proposed translation method was compared with those of Google Translate and MyMemory.||0||0|
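The direct and transitive translation strategies described here reduce to chained lookups over interlanguage-link tables: try the Arabic-English link first, then fall back to Arabic-French followed by French-English. The tables below are tiny hypothetical stand-ins for the real Wikipedia link data:

```python
# Hypothetical interlanguage-link tables (Wikipedia article titles).
ar_en = {"حاسوب": "Computer"}
ar_fr = {"خوارزمية": "Algorithme"}
fr_en = {"Algorithme": "Algorithm"}

def translate(term):
    """Direct Arabic->English link first, then transitive Arabic->French->English."""
    if term in ar_en:
        return ar_en[term]
    fr = ar_fr.get(term)
    if fr is not None and fr in fr_en:
        return fr_en[fr]
    return None
```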
|An automatic sameAs link discovery from Wikipedia||Kagawa K.
|Lecture Notes in Computer Science||English||Spelling variants of words and word-sense ambiguity impose significant costs in processes such as data integration, information searching, data pre-processing for data mining, and so on. To meet these demands, it is useful to construct relations between a word or phrase and a representative name of the entity. To reduce the costs, this paper discusses how to automatically discover "sameAs" and "meaningOf" links from the Japanese Wikipedia. In order to do so, we gathered relevant features such as IDF, string similarity, number of hypernyms, and so on. We identified a link-based score on salient features based on SVM results with 960,000 anchor link pairs. These case studies show that our link discovery method achieves more than 70% precision/recall.||0||0|
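One of the features mentioned above, string similarity between a surface form and a representative name, is often computed as a Dice coefficient over character bigrams. The paper does not specify its measure, so this is an assumed, illustrative choice:

```python
def bigrams(s):
    """Set of character bigrams of a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def string_similarity(a, b):
    """Dice coefficient over character bigrams: 2|A∩B| / (|A|+|B|)."""
    ba, bb = bigrams(a.lower()), bigrams(b.lower())
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

sim = string_similarity("colour", "color")
```

Such a score would be one column in the feature vectors fed to the SVM, alongside IDF and hypernym counts.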
|An evaluation framework for cross-lingual link discovery||Tang L.-X.
Cross-lingual link discovery
|Information Processing and Management||English||Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or the topical coverage in each language is different; such is the case with Wikipedia. Techniques for identifying new and topically relevant cross-lingual links are a current topic of interest at NTCIR where the CrossLink task has been running since the 2011 NTCIR-9. This paper presents the evaluation framework for benchmarking algorithms for cross-lingual link discovery evaluated in the context of NTCIR-9. This framework includes topics, document collections, assessments, metrics, and a toolkit for pooling, assessment, and evaluation. The assessments are further divided into two separate sets: manual assessments performed by human assessors; and automatic assessments based on links extracted from Wikipedia itself. Using this framework we show that manual assessment is more robust than automatic assessment in the context of cross-lingual link discovery.||0||0|
|An information retrieval expansion model based on Wikipedia||Gan L.X.
|Advanced Materials Research||English||Query expansion is one of the key technologies for improving precision and recall in information retrieval. In order to overcome the limitations of a single corpus, in this paper the semantic characteristics of the Wikipedia corpus are combined with the standard corpus to extract a richer set of relationships between terms for the construction of a steady Markov semantic network. Information from the entity pages and disambiguation pages in Wikipedia is comprehensively utilized to classify query terms and improve query classification accuracy. Related candidates of high quality can then be used for query expansion according to semantic pruning. The proposed approach helps to improve retrieval performance and to save the computational cost of search.||0||0|
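Ranking expansion candidates with a Markov semantic network can be sketched as scoring terms by their one-step transition mass from the query terms. This is a generic illustration of the idea, not the paper's construction; the transition probabilities below are hypothetical:

```python
# Hypothetical term-transition probabilities of a Markov semantic network.
transitions = {
    "python": {"programming": 0.5, "snake": 0.3, "monty": 0.2},
    "programming": {"language": 0.7, "python": 0.3},
}

def expand(query_terms, top_k=2):
    """Score candidate expansion terms by one-step transition mass from the query."""
    scores = {}
    for t in query_terms:
        for cand, p in transitions.get(t, {}).items():
            if cand not in query_terms:  # never re-suggest a query term
                scores[cand] = scores.get(cand, 0.0) + p
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

expanded = expand(["python", "programming"])
```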
|Analysing the duration of trending topics in Twitter using Wikipedia||Thanh Tran
|WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference||English||The analysis of trending topics in Twitter is a goldmine for a variety of studies and applications. However, the contents of topics vary greatly from daily routines to major public events, enduring from a few hours to weeks or months. It is thus helpful to distinguish trending topics related to real-world events from those originated within virtual communities. In this paper, we analyse trending topics in Twitter using Wikipedia as a reference for studying the provenance of trending topics. We show that among different factors, the duration of a trending topic characterizes exogenous Twitter trending topics better than endogenous ones.||0||0|
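The paper's key signal, trend duration, suggests a simple rule: label long-lived trends exogenous (tied to real-world events) and short-lived ones endogenous. The sketch below uses hypothetical durations and an assumed 24-hour threshold; the paper itself does not report a specific cut-off:

```python
# Hypothetical trending-topic durations in hours.
durations = {"#election": 96, "#worldcup": 120, "#followfriday": 4, "#mondaymotivation": 3}

def classify(durations, threshold_hours=24):
    """Label long-lived trends exogenous (real-world), short-lived ones endogenous."""
    return {t: ("exogenous" if h >= threshold_hours else "endogenous")
            for t, h in durations.items()}

labels = classify(durations)
```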
|Analysis of the accuracy and readability of herbal supplement information on Wikipedia||Phillips J.
|Journal of the American Pharmacists Association||English||Objective: To determine the completeness and readability of information found in Wikipedia for leading dietary supplements and assess the accuracy of this information with regard to safety (including use during pregnancy/lactation), contraindications, drug interactions, therapeutic uses, and dosing. Design: Cross-sectional analysis of Wikipedia articles. Interventions: The contents of Wikipedia articles for the 19 top-selling herbal supplements were retrieved on July 24, 2012, and evaluated for organization, content, accuracy (as compared with information in two leading dietary supplement references) and readability. Main Outcome Measures: Accuracy of Wikipedia articles. Results: No consistency was noted in how much information was included in each Wikipedia article, how the information was organized, what major categories were used, and where safety and therapeutic information was located in the article. All articles in Wikipedia contained information on therapeutic uses and adverse effects but several lacked information on drug interactions, pregnancy, and contraindications. Wikipedia articles had 26%-75% of therapeutic uses and 76%-100% of adverse effects listed in the Natural Medicines Comprehensive Database and/or Natural Standard. Overall, articles were written at a 13.5-grade level, and all were at a ninth-grade level or above. Conclusion: Articles in Wikipedia in mid-2012 for the 19 top-selling herbal supplements were frequently incomplete, of variable quality, and sometimes inconsistent with reputable sources of information on these products. Safety information was particularly inconsistent among the articles. Patients and health professionals should not rely solely on Wikipedia for information on these herbal supplements when treatment decisions are being made.||0||0|
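Grade-level figures like the 13.5 quoted above are typically computed with the Flesch-Kincaid formula, 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59. The study's exact readability tooling is not specified, so this is an assumed, minimal sketch with a crude syllable counter:

```python
def count_syllables(word):
    """Crude vowel-group counter; adequate for a rough readability estimate."""
    word = word.lower()
    groups, prev = 0, False
    for ch in word:
        vowel = ch in "aeiouy"
        if vowel and not prev:
            groups += 1
        prev = vowel
    return max(groups, 1)

def fk_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(text.count(".") + text.count("!") + text.count("?"), 1)
    words = text.split()
    syllables = sum(count_syllables(w.strip(".,!?;:")) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

grade = fk_grade("Contraindications for this supplement include pregnancy.")
```

Dense, polysyllabic supplement prose scores far above the ninth-grade threshold the study discusses, while short simple sentences score much lower.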
|Apply wiki for improving intellectual capital and effectiveness of project management at Cideco company||Misra S.
|Lecture Notes in Computer Science||English||Today, knowledge is considered the only source for creating the competitive advantages of modern organizations. However, managing intellectual capital is challenging, especially for SMEs in developing countries like Vietnam. In order to help SMEs build KMS and stimulate their intellectual capital, a suitable technical platform for collaboration is needed. Wiki is a cheap technology for improving both intellectual capital and the effectiveness of project management. However, there is a lack of evidence about the real benefits of applying wikis in Vietnamese SMEs. Cideco Company, a Vietnamese SME in the construction design & consulting industry, is seeking a solution to manage its intellectual capital and improve the effectiveness of project management. In this research, a wiki is applied and tested to check whether it can be a suitable technology for Cideco to stimulate its intellectual capital and improve the effectiveness of project management activities. In addition, a demo wiki was implemented for 2 pilot projects to evaluate its real benefits. Analysis results showed that the wiki can help to increase both intellectual capital and the effectiveness of project management at Cideco.||0||0|
|Approach for building high-quality domain ontology based on the Chinese Wikipedia||Wu T.
Quality of article
|ICIC Express Letters||English||In this paper, we propose a new approach for building a high-quality domain ontology based on the Chinese Wikipedia. In contrast to traditional Wikipedia ontologies, such as DBpedia and YAGO, the domain ontology built in this paper consists of high-quality articles. We use the C4.5 algorithm to identify high-quality articles from a specific domain in Wikipedia, and a domain ontology is built accordingly.||0||0|
|Arabic text categorization based on arabic wikipedia||Yahya A.
|Arabic natural language processing
|ACM Transactions on Asian Language Information Processing||English||This article describes an algorithm for categorizing Arabic text, relying on highly categorized corpus-based datasets obtained from the Arabic Wikipedia by using manual and automated processes to build and customize categories. The categorization algorithm was built by adopting a simple categorization idea then moving forward to more complex ones. We applied tests and filtration criteria to reach the best and most efficient results that our algorithm can achieve. The categorization depends on the statistical relations between the input (test) text and the reference (training) data supported by well-defined Wikipedia-based categories. Our algorithm supports two levels for categorizing Arabic text; categories are grouped into a hierarchy of main categories and subcategories. This introduces a challenge due to the correlation between certain subcategories and overlap between main categories. We argue that our algorithm achieved good performance compared to other methods reported in the literature.||0||0|
|Architecture description leveraging model driven engineering and semantic wikis||Baroni A.
|Proceedings - Working IEEE/IFIP Conference on Software Architecture 2014, WICSA 2014||English||A previous study, run by some of the authors in collaboration with practitioners, has emphasized the need to improve architectural languages in order to (i) make them simple and intuitive enough to communicate effectively with project stakeholders, and (ii) enable formality and rigour to allow analysis and other automated tasks. Although a multitude of languages have been created by researchers and practitioners, they rarely address both of these needs. In order to reconcile these divergent needs, this paper presents an approach that (i) combines the rigorous foundations of model-driven engineering with the usability of semantic wikis, and (ii) enables continuous synchronization between them. This allows software architects to simultaneously use wiki pages for communication and models for model-based analysis and manipulation. In this paper we explain how we applied the approach to an industry-inspired case study using the Semantic MediaWiki engine and a model-driven architecture description implemented within the Eclipse Modeling Framework. We also discuss how our approach can be generalized to other wiki-based and model-driven technologies.||0||0|
|Are we all online content creators now? Web 2.0 and digital divides||Brake D.R.||Citizen journalism
|Journal of Computer-Mediated Communication||English||Despite considerable interest in online content creation there has been comparatively little academic analysis of the distribution of such practices, both globally and among social groups within countries. Drawing on theoretical frameworks used in digital divide studies, I outline differences in motivation, access, skills, and usage that appear to underlie and perpetuate differences in online content creation practices between social groups. This paper brings together existing studies and new analyses of existing survey datasets. Together they suggest online content creators tend to be from relatively privileged groups and the content of online services based on their contributions may be biased towards what is most interesting or relevant to them. Some implications of these findings for policymakers and researchers are considered.||0||0|
|Assessing Article Quality in Wikipedia Using Machine Learning Algorithms||0||0|
|Assessing the quality of Thai Wikipedia articles using concept and statistical features||Saengthongpattana K.
Quality of Thai Wikipedia articles
|Advances in Intelligent Systems and Computing||English||The quality evaluation of Thai Wikipedia articles has relied on users' judgment. As the number of articles increases every day, an automatic evaluation method is needed. Components of Wikipedia articles such as headers, pictures, references, and links are useful indicators of article quality. However, readers need complete content that covers all of the concepts in an article, so concept features are investigated in this work. The aim of this research is to classify Thai Wikipedia articles into two classes, namely high-quality and low-quality. Three article domains (Biography, Animal, and Place) are tested with decision tree and Naïve Bayes classifiers. We found that Naïve Bayes achieves a higher TP rate than the decision tree in every domain. Moreover, we found that the concept features play an important role in the quality classification of Thai Wikipedia articles.||0||0|
|Augmenting concept definition in gloss vector semantic relatedness measure using wikipedia articles||Pesaranghader A.
Biomedical Text Mining
Natural Language Processing
|Lecture Notes in Electrical Engineering||English||Semantic relatedness measures are widely used in text mining and information retrieval applications. Considering these automated measures, in this research paper we attempt to improve Gloss Vector relatedness measure for more accurate estimation of relatedness between two given concepts. Generally, this measure, by constructing concepts definitions (Glosses) from a thesaurus, tries to find the angle between the concepts' gloss vectors for the calculation of relatedness. Nonetheless, this definition construction task is challenging as thesauruses do not provide full coverage of expressive definitions for the particularly specialized concepts. By employing Wikipedia articles and other external resources, we aim at augmenting these concepts' definitions. Applying both definition types to the biomedical domain, using MEDLINE as corpus, UMLS as the default thesaurus, and a reference standard of 68 concept pairs manually rated for relatedness, we show exploiting available resources on the Web would have positive impact on final measurement of semantic relatedness.||0||0|
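The Gloss Vector measure summarized in this abstract reduces to comparing bag-of-words vectors built from concept definitions; augmenting a sparse thesaurus gloss with Wikipedia text enlarges that vector. A minimal sketch of the idea (the definitions below are invented examples, not UMLS or MEDLINE data):

```python
from collections import Counter
from math import sqrt

def gloss_vector(definition: str) -> Counter:
    """Build a simple bag-of-words gloss vector from a concept definition."""
    return Counter(definition.lower().split())

def relatedness(def_a: str, def_b: str) -> float:
    """Cosine of the angle between two gloss vectors, as in the Gloss Vector
    measure: 1.0 for identical definitions, 0.0 for disjoint vocabularies."""
    a, b = gloss_vector(def_a), gloss_vector(def_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A sparse thesaurus gloss vs. the same gloss augmented with extra
# (hypothetical) Wikipedia text; both compared against a third definition.
short_gloss = "inflammation of the liver"
augmented = "inflammation of the liver caused by viral infection of liver cells"
other = "viral infection causing inflammation of liver tissue"
```

Here the augmented definition overlaps more with the comparison gloss than the sparse thesaurus entry does, which is the effect the paper exploits to improve relatedness estimates.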
|Automatic extraction of property norm-like data from large text corpora||Kelly C.
Natural Language Processing
Pointwise mutual information
|Cognitive Science||English||Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car-petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.||0||0|
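The reweighting step this abstract describes combines raw triple frequency with statistical association metrics, among them pointwise mutual information. A toy sketch of PMI over extracted concept-relation-feature triples (the counts are invented, and the paper's actual weighting is a linear combination of frequency and four metrics, not PMI alone):

```python
from collections import Counter
from math import log2

# Hypothetical (concept, relation, feature) triples with raw corpus counts.
triples = Counter({
    ("car", "require", "petrol"): 40,
    ("car", "be", "fast"): 25,
    ("car", "be", "the"): 60,
    ("dog", "be", "fast"): 15,
    ("dog", "require", "food"): 30,
})

def pmi(concept: str, feature: str) -> float:
    """Pointwise mutual information between a concept and a feature,
    estimated from the triple counts; high PMI promotes pairs that
    co-occur more often than their marginal frequencies predict."""
    total = sum(triples.values())
    p_cf = sum(n for (c, _, f), n in triples.items()
               if c == concept and f == feature) / total
    p_c = sum(n for (c, _, _), n in triples.items() if c == concept) / total
    p_f = sum(n for (_, _, f), n in triples.items() if f == feature) / total
    return log2(p_cf / (p_c * p_f)) if p_cf else float("-inf")
```

With these counts, "dog require food" is rewarded over "dog be fast" because "fast" is spread across several concepts while "food" is specific to one.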
|Automatic theory generation from analyst text files using coherence networks||Shaffer S.C.||Coherence
|Proceedings of SPIE - The International Society for Optical Engineering||English||This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value subnetworks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.||0||0|
|Automatically detecting corresponding edit-turn-pairs in Wikipedia||Daxenberger J.
|52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference||English||In this study, we analyze links between edits in Wikipedia articles and turns from their discussion pages. Our motivation is to better understand implicit details about the writing process and knowledge flow in collaboratively created resources. Based on properties of the involved edit and turn, we have defined constraints for corresponding edit-turn-pairs. We manually annotated a corpus of 636 corresponding and non-corresponding edit-turn-pairs. Furthermore, we show how our data can be used to automatically identify corresponding edit-turn-pairs. With the help of supervised machine learning, we achieve an accuracy of 0.87 for this task.||0||0|
|Behavioral aspects in the interaction between wikipedia and its users||Reinoso A.J.
|Studies in Computational Intelligence||English||Wikipedia continues to be the most well-known online encyclopedia and receives visits from millions of users on a daily basis. Its contents correspond to almost all knowledge areas and are altruistically contributed by individuals and organizations. In addition, users are encouraged to add their own contributions according to Wikipedia's supporting paradigm. Its progression to a mass phenomenon has prompted many studies and research initiatives, and topics such as the quality of the published contents or the authorship of its contributions have been widely studied. However, very little attention has been paid to the behavioral aspects characterizing the interaction between Wikipedia and its users. Hence, this chapter aims to determine the habits exhibited by users when browsing Wikipedia pages. In particular, we focus on visits and contributions, as they constitute the two most common forms of interaction. Our study is based on a sample of the requests submitted to Wikipedia, and its results are twofold: on the one hand, it provides different metrics concerning users' behavior and, on the other, it presents comparisons among different Wikipedia editions.||0||0|
|Beyond the encyclopedia: Collective memories in Wikipedia||Michela Ferron
|Memory Studies||English||Collective memory processes have been studied from many different perspectives. For example, while psychology has investigated collaborative recall in small groups, other research traditions have focused on flashbulb memories or on the cultural processes involved in the formation of collective memories of entire nations. In this article, considering the online encyclopedia Wikipedia as a global memory place, we analyze online commemoration patterns of traumatic events. We extracted 88 articles and talk pages related to traumatic events, and using logistic regression, we analyzed their edit activity comparing it with more than 370,000 other Wikipedia pages. Results show that the relative amount of edits during anniversaries can significantly distinguish between pages related to traumatic events and other pages. The logistic regression results, together with the transcription of a group of messages exchanged by the users during the anniversaries of the September 11 attacks and the Virginia Tech massacre, suggest that commemoration activities take place in Wikipedia, opening the way to the quantitative study of online collective memory building processes on a large scale.||0||0|
|Bipartite editing prediction in wikipedia||Chang Y.-J.
|Journal of Information Science and Engineering||English||Link prediction problems aim to project future interactions among members of a social network who have not communicated with each other in the past. Classical approaches for link prediction usually use local information, which considers the similarity of two nodes, or structural information such as the immediate neighborhood. However, when a bipartite graph is used to represent activity, there is no straightforward similarity measurement between two linking nodes: because the nodes are of different types, they will not have any common neighbors, so the local model must be adjusted if the goal is to predict bipartite relations. In addition to local similarity information, when dealing with link prediction in a social network it is natural to employ community information to improve prediction accuracy. In this paper, we address the link prediction problem in the bipartite editing graph of Wikipedia and also examine the community structure of this edit graph. As Wikipedia is one of the most successful member-maintained online communities, extracting the community information and solving its bipartite link prediction problem sheds light on the process of content creation. In addition, to the best of our knowledge, the problem of using community information in bipartite graphs for predicting link occurrence has not been clearly addressed. Hence we have designed and integrated two bipartite-specific approaches to predict link occurrence: first, a supervised learning approach built around the adjusted features of a local model and, second, a community-awareness approach that utilizes community information. Experiments conducted on the Wikipedia collection show that, in terms of F1-measure, our approaches generate an 11% improvement over general methods based on the K-Nearest Neighbor. We also investigate the structure of communities in the editing network and suggest a different approach to examining the communities involved in Wikipedia.||0||0|
|Boosting terminology extraction through crosslingual resources||Cajal S.
|Procesamiento de Lenguaje Natural||English||Terminology Extraction is an important Natural Language Processing task with multiple applications in many areas. The task has been approached from different points of view using different techniques, and language- and domain-independent systems have been proposed as well. Our contribution in this paper focuses on improving Terminology Extraction using crosslingual resources, specifically Wikipedia, and on the use of a variant of PageRank for scoring the candidate terms.||0||0|
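Scoring candidate terms with a PageRank variant, as this abstract mentions, can be illustrated by running plain PageRank over a term co-occurrence graph (the graph, damping factor, and iteration count below are illustrative; the paper uses its own variant and real corpus data):

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Plain power-iteration PageRank over an adjacency-list graph;
    dangling nodes redistribute their rank uniformly."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for n in nodes:
                    new[n] += damping * rank[node] / len(nodes)
        rank = new
    return rank

# Hypothetical co-occurrence links between candidate terms: a term that
# many other candidates point to should receive the highest score.
cooc = {
    "neural network": ["deep learning", "backpropagation"],
    "deep learning": ["neural network"],
    "backpropagation": ["neural network"],
}
scores = pagerank(cooc)
```

Terms central to the candidate graph accumulate rank, which is the intuition behind graph-based term scoring.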
|Bootstrapping Wikipedia to answer ambiguous person name queries||Gruetze T.
|Proceedings - International Conference on Data Engineering||English||Some of the main ranking features of today's search engines reflect result popularity and are based on ranking models, such as PageRank, implicit feedback aggregation, and more. While such features yield satisfactory results for a wide range of queries, they aggravate the problem of searching for ambiguous entities: searching for a person yields satisfactory results only if the person in question is represented by a high-ranked Web page and all required information is contained in this page. Otherwise, the user has to either reformulate/refine the query or manually inspect low-ranked results to find the person in question. A possible approach to this problem is to cluster the results so that each cluster represents one of the persons occurring in the answer set. However, clustering search results has proven to be a difficult endeavor by itself, and the clusters are typically of moderate quality. A wealth of useful information about persons occurs in Web 2.0 platforms, such as Wikipedia, LinkedIn, Facebook, etc. Being human-generated, the information on these platforms is clean, focused, and already disambiguated. We show that when searching with ambiguous person names the information from Wikipedia can be bootstrapped to group the results according to the individuals occurring in them. We have evaluated our methods on a hand-labeled dataset of around 5,000 Web pages retrieved from Google queries on 50 ambiguous person names.||0||0|
|Bots, bespoke, code and the materiality of software platforms||Geiger R.S.||Algorithms
|Information Communication and Society||English||This article introduces and discusses the role of bespoke code in Wikipedia, which is code that runs alongside a platform or system, rather than being integrated into server-side codebases by individuals with privileged access to the server. Bespoke code complicates the common metaphors of platforms and sovereignty that we typically use to discuss the governance and regulation of software systems through code. Specifically, the work of automated software agents (bots) in the operation and administration of Wikipedia is examined, with a focus on the materiality of code. As bots extend and modify the functionality of sites like Wikipedia, but must be continuously operated on computers that are independent from the servers hosting the site, they involve alternative relations of power and code. Instead of taking for granted the pre-existing stability of Wikipedia as a platform, bots and other bespoke code require that we examine not only the software code itself, but also the concrete, historically contingent material conditions under which this code is run. To this end, this article weaves a series of autobiographical vignettes about the author's experiences as a bot developer alongside more traditional academic discourse.||0||0|
|Brede tools and federating online neuroinformatics databases||Finn Årup Nielsen||Data federation
|Neuroinformatics||English||As open-science neuroinformatics databases, the Brede Database and Brede Wiki seek to make distribution and federation of their content as easy and transparent as possible. The databases rely on simple formats and allow other online tools to reuse their content. This paper describes the possible interconnections on different levels between the Brede tools and other databases.||0||0|
|Bridging temporal context gaps using time-aware re-contextualization||Ceroni A.
|SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval||English||Understanding a text, which was written some time ago, can be compared to translating a text from another language. Complete interpretation requires a mapping, in this case, a kind of time-travel translation between present context knowledge and context knowledge at time of text creation. In this paper, we study time-aware re-contextualization, the challenging problem of retrieving concise and complementing information in order to bridge this temporal context gap. We propose an approach based on learning to rank techniques using sentence-level context information extracted from Wikipedia. The employed ranking combines relevance, complementarity and time-awareness. The effectiveness of the approach is evaluated by contextualizing articles from a news archive collection using more than 7,000 manually judged relevance pairs. To this end, we show that our approach is able to retrieve a significant number of relevant context information for a given news article. Copyright 2014 ACM.||0||0|
|Building distant supervised relation extractors||Nunes T.
|Proceedings - 2014 IEEE International Conference on Semantic Computing, ICSC 2014||English||A well-known drawback in building machine learning semantic relation detectors for natural language is the lack of a large number of qualified training instances for the target relations in multiple languages. Even when good results are achieved, the datasets used by the state-of-the-art approaches are rarely published. In order to address these problems, this work presents an automatic approach to build multilingual semantic relation detectors through distant supervision, combining two of the largest resources of structured and unstructured content available on the Web, DBpedia and Wikipedia. We map the DBpedia ontology back to the Wikipedia text to extract more than 100,000 training instances for more than 90 DBpedia relations for the English and Portuguese languages without human intervention. First, we mine the Wikipedia articles to find candidate instances for relations described in the DBpedia ontology. Second, we preprocess and normalize the data, filtering out irrelevant instances. Finally, we use the normalized data to construct regularized logistic regression detectors that achieve more than 80% F-measure for both English and Portuguese. In this paper, we also compare the impact of different types of features on the accuracy of the trained detector, demonstrating significant performance improvements when combining lexical, syntactic and semantic features. Both the datasets and the code used in this research are available online.||0||0|
|Building sentiment lexicons for all major languages||Yirong Chen
|English||Sentiment analysis in a multilingual world remains a challenging problem, because developing language-specific sentiment lexicons is an extremely resource-intensive process. Such lexicons remain a scarce resource for most languages. In this paper, we address this lexicon gap by building high-quality sentiment lexicons for 136 major languages. We integrate a variety of linguistic resources to produce an immense knowledge graph. By appropriately propagating from seed words, we construct sentiment lexicons for each component language of our graph. Our lexicons have a polarity agreement of 95.7% with published lexicons, while achieving an overall coverage of 45.2%. We demonstrate the performance of our lexicons in an extrinsic analysis of 2,000 distinct historical figures' Wikipedia articles across 30 languages. Despite cultural differences and the intended neutrality of Wikipedia articles, our lexicons show an average sentiment correlation of 0.28 across all language pairs.||0||0|
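The seed-propagation step this abstract describes can be sketched as breadth-first label propagation over a multilingual word graph: each word inherits the polarity of the nearest seed. All words and edges below are invented toy data; the paper's graph integrates many linguistic resources and a more elaborate propagation scheme.

```python
from collections import deque

# Toy synonym/translation graph linking words that share meaning.
graph = {
    "good": ["bueno", "great"],
    "bueno": ["good", "bom"],
    "great": ["good"],
    "bom": ["bueno"],
    "bad": ["malo"],
    "malo": ["bad", "mau"],
    "mau": ["malo"],
}

def propagate(seeds: dict[str, int]) -> dict[str, int]:
    """Breadth-first propagation of seed polarities (+1/-1) across the
    graph; each reachable word keeps the polarity of its nearest seed."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour in graph.get(word, []):
            if neighbour not in polarity:
                polarity[neighbour] = polarity[word]
                queue.append(neighbour)
    return polarity

lexicon = propagate({"good": 1, "bad": -1})
```

Starting from two English seeds, the propagation labels the Spanish and Portuguese neighbours as well, which is how a small seed set can yield lexicons for every language connected to the graph.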
|Capturing scholar's knowledge from heterogeneous resources for profiling in recommender systems||Amini B.
|Expert Systems with Applications||In scholars' recommender systems, knowledge acquisition for constructing profiles is crucial because profiles provide fundamental information for accurate recommendation. Despite the availability of various knowledge resources, identifying and collecting extensive knowledge in an unobtrusive manner is not straightforward. In order to capture scholars' knowledge, some questions must be answered: which knowledge resource is appropriate for profiling, how knowledge items can be unobtrusively captured, and how heterogeneity among different knowledge resources should be resolved. To address these issues, we first model the scholars' academic behavior and extract different knowledge items diffused over the Web, including mediated profiles in digital libraries, and then integrate those heterogeneous knowledge items by means of Wikipedia. Additionally, we analyze the correlation between knowledge items and partition the scholars' research areas for multi-disciplinary profiling. Compared to the state of the art, the results of an empirical evaluation show the efficiency of our approach in terms of completeness and accuracy. © 2014 Elsevier Ltd. All rights reserved.||0||0|
|Changes in college students' perceptions of use of web-based resources for academic tasks with Wikipedia projects: A preliminary exploration||Traphagan T.
Neavel Dickens L.
|Interactive Learning Environments||English||Motivated by the need to facilitate Net Generation students' information literacy (IL), or more specifically, to promote student understanding of legitimate, effective use of Web-based resources, this exploratory study investigated how analyzing, writing, posting, and monitoring Wikipedia entries might help students develop critical perspectives related to the legitimacy of Wikipedia and other publicly accessible Web-based resources for academic tasks. Results of survey and interview data analyses from two undergraduate courses indicated that undergraduate students typically prefer using publicly accessible Web-based resources to traditional academic resources, such as scholarly journal articles and books both in print and digital form; furthermore, they view the former as helpful academic tools with various utilities. Results also suggest that the Wikipedia activity, integrated into regular course curriculum, led students to gain knowledge about processes of Web-based information creation, become more critical of information on the Web, and evaluate the use of publicly accessible Web-based resources for academic purposes. Such changes appear more conspicuous with first year than with upper division students. The findings suggest that experiential opportunities to grapple with the validity of publicly accessible Web-based resources may prepare students better for their college and professional careers. The study results also indicate the need for integrating multiple existing frameworks for IL into one comprehensive framework to better understand various aspects of students' knowledge, use, and production of information from cognitive and technical perspectives and for a variety of purposes.||0||0|
|Cheap talk and editorial control||Newton J.||Cheap talk
|B.E. Journal of Theoretical Economics||English||This paper analyzes simple models of editorial control. Starting from the framework developed by Krishna and Morgan (2001a), we analyze two-sender models of cheap talk where one or more of the senders has the power to veto messages before they reach the receiver. A characterization of the most informative equilibria of such models is given. It is shown that editorial control never aids communication and that for small biases in the senders' preferences relative to those of the receiver, necessary and sufficient conditions for information transmission to be adversely affected are (i) that the senders have opposed preferences relative to the receiver and (ii) that both senders have powers of editorial control. It is shown that the addition of further senders beyond two weakly decreases information transmission when senders exercising editorial control are anonymous, and weakly increases information transmission when senders exercising editorial control are observed.||0||0|
|Chinese and Korean cross-lingual issue news detection based on translation knowledge of Wikipedia||Zhao S.
|Cross-Lingual link discovery
Issue news detection
|Lecture Notes in Electrical Engineering||English||Detecting cross-lingual issue news and analyzing the news content is an important and challenging task. The core of cross-lingual research is the process of translation. In this paper, we focus on extracting cross-lingual issue news from Twitter data in Chinese and Korean. We propose a translation knowledge method for Wikipedia concepts, as well as the Chinese and Korean cross-lingual inter-Wikipedia link relations. The relevance relations are extracted from the categories and page titles of Wikipedia. The evaluation achieved a performance of 83% average precision for the top 10 extracted issue news. The results indicate that our method is effective for cross-lingual issue news detection.||0||0|
|Citing wikipedia: Don't do it-Wikipedians wouldn't||Rasberry L.||BMJ (Online)||English||[No abstract available]||0||0|
|Classification and indexing of complex digital objects with CIDOC CRM||Enge J.
|Archiving 2014 - Final Program and Proceedings||English||CIDOC-CRM provides an ontology-based description for the documentation of cultural heritage. Originally meant to support the documentation practice of cultural heritage institutions and to enable inter-institutional exchange, it defines a formal structure for the description of implicit and explicit relations between entities. In order to demonstrate the benefits of the model in a semantic web environment like "Semantic MediaWiki", the paper presents two practical examples. Both originate in the digital domain and are complex by nature: "Sintel" (2010) by Colin Levy is examined as an example of a completely synthetically generated HD video, and Olia Lialina's "Summer" (2013) is described as an instance of distributed internet-based art and culture. The examples demonstrate to what extent the semantic structure of the digital extension of CIDOC CRM, CRMdig, clarifies the objects' nature (understanding) and thus supports the planning and documentation process of dedicated collections. To this end, a dedicated system, called CRM-Wiki, was implemented.||0||0|
|Collaborative development for setup, execution, sharing and analytics of complex NMR experiments||Irvine A.G.
NMR experiment database
Pulse program optimisation
Spin dynamics analysis
|Journal of Magnetic Resonance||English||Factory settings of NMR pulse sequences are rarely ideal for every scenario in which they are utilised. The optimisation of NMR experiments has for many years been performed locally, with implementations often specific to an individual spectrometer. Furthermore, these optimised experiments are normally retained solely for the use of an individual laboratory, spectrometer or even single user. Here we introduce a web-based service that provides a database for the deposition, annotation and optimisation of NMR experiments. The application uses a Wiki environment to enable the collaborative development of pulse sequences. It also provides a flexible mechanism to automatically generate NMR experiments from deposited sequences. Multidimensional NMR experiments of proteins and other macromolecules consume significant resources, in terms of both spectrometer time and the effort required to analyse the results. Systematic analysis of simulated experiments can enable optimal allocation of NMR resources for structural analysis of proteins. Our web-based application (http://nmrplus.org) provides all the necessary information, including the auxiliaries (waveforms, decoupling sequences, etc.), for analysis of experiments by accurate numerical simulation of multidimensional NMR experiments. The online database of the NMR experiments, together with a systematic evaluation of their sensitivity, provides a framework for selection of the most efficient pulse sequences. The development of such a framework provides a basis for the collaborative optimisation of pulse sequences by the NMR community, with the benefits of this collective effort being available to the whole community. © 2013 Elsevier Inc. All rights reserved.||0||0|
|Collaborative projects (social media application): About Wikipedia, the free encyclopedia||Kaplan A.
Wisdom of crowds
|Business Horizons||English||Collaborative projects-defined herein as social media applications that enable the joint and simultaneous creation of knowledge-related content by many end-users-have only recently received interest among a larger group of academics. This is surprising since applications such as wikis, social bookmarking sites, online forums, and review sites are probably the most democratic form of social media and reflect well the idea of user-generated content. The purpose of this article is to provide insight regarding collaborative projects; the concept of wisdom of crowds, an essential condition for their functioning; and the motivation of readers and contributors. Specifically, we provide advice on how firms can leverage collaborative projects as an essential element of their online presence to communicate both externally with stakeholders and internally among employees. We also discuss how to address situations in which negative information posted on collaborative projects can become a threat and PR crisis for firms.||0||0|
|Collaborative tools in the primary classroom: Teachers' thoughts on wikis||Agesilaou A.
|Lecture Notes in Computer Science||English||The purpose of this work-in-progress study is to examine the attitudes of primary school teachers in Cyprus on the use of wikis as a means to promote collaborative learning in the classroom. A survey investigation was undertaken using 20 questionnaires and 3 semi-structured interviews. The survey results indicate that teachers in Cyprus hold a positive attitude toward integrating wikis in primary education to promote cooperation, and that collaborative learning activities among pupils are accordingly encouraged.||0||0|
|Collective memory in Poland: A reflection in street names||Radoslaw Nielek
|Lecture Notes in Computer Science||English||Our article starts with an observation that street names fall into two general types: generic and historically inspired. We analyse the distributions of street names of the second type as a window onto nation-level collective memory in Poland. The process of selecting street names is determined socially, as the selections reflect the symbols considered important to the nation-level society, but has strong historical motivations and determinants. In the article, we look for these relationships in the available data sources. We use Wikipedia articles to match street names with their textual descriptions and assign them to points in time. We then apply selected text mining and statistical techniques to reach quantitative conclusions. We also present a case study, the geographical distribution of two particular street names in Poland, to demonstrate the binding between history and the political orientation of regions.||0||0|
|Comparative analysis of text representation methods using classification||Szymanski J.||Documents categorization
|Cybernetics and Systems||English||In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article - evaluation of approaches to text representation for machine learning tasks - indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot be compensated for even by sophisticated machine learning algorithms. It confirms the thesis that proper data representation is a prerequisite for achieving high-quality results of data analysis. Evaluation of the text representations was performed within the Wikipedia repository by examination of classification parameters observed during automatic reconstruction of human-made categories. For that purpose, we use a classifier based on a support vector machines method, extended with multilabel and multiclass functionalities. During classifier construction we observed parameters such as learning time, representation size, and classification quality that allow us to draw conclusions about text representations. For the experiments presented in the article, we use data sets created from Wikipedia dumps. We describe our software, called Matrixu, which allows a user to build computational representations of Wikipedia articles. The software is the second contribution of our research, because it is a universal tool for converting Wikipedia from a human-readable form to a form that can be processed by a machine. Results generated using Matrixu can be used in a wide range of applications that involve usage of Wikipedia data.||0||0|
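The representation comparison described above lends itself to a small illustration. The sketch below builds raw term-frequency and tf-idf vectors, two common choices among the kinds of text representations such studies evaluate; the toy documents are hypothetical and the Matrixu tool itself is not reproduced.

```python
import math
from collections import Counter

def build_representations(docs):
    """Build raw term-frequency and tf-idf vectors for each document.

    A toy illustration of two representation choices; real pipelines
    would add tokenization, stop-word removal, and normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted(set(t for toks in tokenized for t in toks))
    n_docs = len(docs)
    # document frequency of each term
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    tf_vectors, tfidf_vectors = [], []
    for toks in tokenized:
        counts = Counter(toks)
        tf = [counts[t] for t in vocab]
        tfidf = [counts[t] * math.log(n_docs / df[t]) for t in vocab]
        tf_vectors.append(tf)
        tfidf_vectors.append(tfidf)
    return vocab, tf_vectors, tfidf_vectors

docs = ["wikipedia is an encyclopedia", "wikipedia articles form categories"]
vocab, tf, tfidf = build_representations(docs)
```

Note how a term occurring in every document ("wikipedia") receives a zero tf-idf weight, which is one reason the choice of representation shapes downstream classification quality.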
|Comparative evaluation of link-based approaches for candidate ranking in link-to-Wikipedia systems||Garcia N.F.
|Journal of Artificial Intelligence Research||English||In recent years, the task of automatically linking pieces of text (anchors) mentioned in a document to Wikipedia articles that represent the meaning of these anchors has received extensive research attention. Typically, link-to-Wikipedia systems try to find a set of Wikipedia articles that are candidates to represent the meaning of the anchor and, later, rank these candidates to select the most appropriate one. In this ranking process the systems rely on context information obtained from the document where the anchor is mentioned and/or from Wikipedia. In this paper we center our attention on the use of Wikipedia links as context information. In particular, we offer a review of several candidate ranking approaches in the state of the art that rely on Wikipedia link information. In addition, we provide a comparative empirical evaluation of the different approaches on five different corpora: the TAC 2010 corpus and four corpora built from actual Wikipedia articles and news items. © 2014 AI Access Foundation. All rights reserved.||0||0|
|Comparing the pulses of categorical hot events in Twitter and Weibo||Shuai X.
|Click log mining
|HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media||English||The fragility and interconnectivity of the planet argue compellingly for a greater understanding of how different communities make sense of their world. One such critical demand involves comparing the Chinese community with the rest of the world (e.g., Americans), where communities' ideological and cultural backgrounds can be significantly different. While traditional studies aim to learn the similarities and differences between these communities via high-cost user studies, in this paper we propose a much more efficient method to compare different communities by utilizing social media. Specifically, Weibo and Twitter, the two largest microblogging systems, are employed to represent the target communities, i.e. China and the Western world (mainly the United States), respectively. Meanwhile, through the analysis of the Wikipedia page-click log, we identify a set of categorical 'hot events' for one month in 2012 and search for those hot events in the Weibo and Twitter corpora along with timestamps via information retrieval methods. We further quantitatively and qualitatively compare users' responses to those events in Twitter and Weibo in terms of three aspects: popularity, temporal dynamics, and information diffusion. The comparative results show that although the popularity rankings of those events are very similar, the patterns of temporal dynamics and information diffusion can be quite different.||0||0|
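One simple way to quantify how similar two platforms' popularity rankings are, as the abstract above reports for Twitter and Weibo, is a rank correlation. A minimal Spearman sketch over hypothetical mention counts (no tie correction):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation between two score lists (no tie correction)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# hypothetical mention counts for the same five events on each platform
twitter = [900, 450, 300, 120, 80]
weibo = [800, 500, 280, 150, 60]
rho = spearman_rho(twitter, weibo)
```

Identical orderings give rho = 1, reversed orderings give -1; values near 1 would match the paper's observation that the two platforms rank hot events very similarly.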
|Computer-supported collaborative accounts of major depression: Digital rhetoric on Quora and Wikipedia||Rughinis C.
|Computer supported collaborative knowledge making
|Iberian Conference on Information Systems and Technologies, CISTI||English||We analyze digital rhetoric in two computer-supported collaborative settings of writing and learning, focusing on major depression: Wikipedia and Quora. We examine the procedural rhetoric of access to and interaction with information, and the textual rhetoric of individual and aggregated entries. Through their different organization of authorship, publication and reading, the two settings create divergent accounts of depression. Key points of difference include: focus on symptoms and causes vs. experiences and advice, use of lists vs. metaphors and narratives, a/temporal structure, and personal and relational knowledge.||0||0|
|Conceptual clustering||Boubacar A.
|Lecture Notes in Electrical Engineering||English||Traditional clustering methods are unable to describe the clusters they generate. Conceptual clustering is an important and active research area that aims to efficiently cluster and explain the data. Previous conceptual clustering approaches provide descriptions that are not based on human-comprehensible knowledge. This paper presents an algorithm that uses Wikipedia concepts in the clustering method. The generated clusters overlap each other and serve as a basis for an information retrieval system. The method has been implemented in order to improve the performance of the system, and it reduces the computation cost.||0||0|
|Continuous temporal Top-K query over versioned documents||Lan C.
|Lecture Notes in Computer Science||English||The management of versioned documents has attracted researchers' attention in recent years. Based on the observation that decision-makers are often interested in finding the set of objects that have continuous behavior over time, we study the problem of the continuous temporal top-k query. Given a query, continuous temporal top-k search finds the documents that frequently rank in the top-k during a time period, taking the weights of different time intervals into account. Existing work on querying versioned documents has focused on adding time constraints, but has failed to consider the continuous ranking of objects and the weights of time intervals. We propose a new interval window-based method to address this problem. Our method retrieves the continuous temporal top-k results while using interval windows to support time and weight constraints simultaneously. We use data from Wikipedia to evaluate our method.||0||0|
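The weighted notion of "frequently ranking in the top-k" can be sketched as follows. The interval-window index the paper describes is omitted; the document ids, per-interval rankings, and weights below are hypothetical:

```python
def continuous_topk(interval_rankings, weights, k, m):
    """Score each document by the weighted number of intervals in which it
    ranks in the top-k, then return the m best-scoring documents.

    interval_rankings: one ranked list of doc ids per time interval.
    weights: one weight per interval (e.g. recent intervals weigh more).
    A simplified scoring sketch, not the paper's indexed algorithm.
    """
    scores = {}
    for ranking, w in zip(interval_rankings, weights):
        for doc in ranking[:k]:
            scores[doc] = scores.get(doc, 0.0) + w
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:m]

# three time intervals, later intervals weighted more heavily
rankings = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
result = continuous_topk(rankings, weights=[0.2, 0.3, 0.5], k=2, m=2)
```

Here "b" wins because it appears in the top-2 of every interval, including the heavily weighted recent one.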
|Counter narratives and controversial crimes: The Wikipedia article for the 'Murder of Meredith Kercher'||Page R.||Counter narratives
|Language and Literature||English||Narrative theorists have long recognised that narrative is a selective mode of representation. There is always more than one way to tell a story, which may alter according to its teller, audience and the social or historical context in which the story is told. But multiple versions of the 'same' events are not always valued in the same way: some versions may become established as dominant accounts, whilst others may be marginalised or resist hegemony as counter narratives (Bamberg and Andrews, 2004). This essay explores the potential of Wikipedia as a site for positioning counter and dominant narratives. Through the analysis of linearity and tellership (Ochs and Capps, 2001) as exemplified through revisions of a particular article ('Murder of Meredith Kercher'), I show how structural choices (open versus closed sequences) and tellership (single versus multiple narrators) function as mechanisms to prioritise different dominant narratives over time and across different cultural contexts. The case study points to the dynamic and relative nature of dominant and counter narratives. In the 'Murder of Meredith Kercher' article the counter narratives of the suspects' guilt or innocence and their position as villains or victims depended on national context, and changed over time. The changes in the macro-social narratives are charted in the micro-linguistic analysis of structure, citations and quoted speech in four selected versions of the article, taken from the English and Italian Wikipedias.||0||0|
|Creating a phrase similarity graph from wikipedia||Stanchev L.||Semantic search
|Proceedings - 2014 IEEE International Conference on Semantic Computing, ICSC 2014||English||The paper addresses the problem of modeling the relationship between phrases in English using a similarity graph. The mathematical model stores data about the strength of the relationship between phrases expressed as a decimal number. Both structured data from Wikipedia, such as the fact that the Wikipedia page with title 'Dog' belongs to the Wikipedia category 'Domesticated animals', and textual descriptions, such as the fact that the Wikipedia page with title 'Dog' contains the word 'wolf' thirty-one times, are used in creating the graph. The quality of the graph data is validated by comparing the similarity of pairs of phrases computed by our software that uses the graph with the results of studies that were performed with human subjects. To the best of our knowledge, our software produces better correlation with the results of both the Miller and Charles study and the WordSimilarity-353 study than any other published research.||0||0|
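Validation against human-subject studies of the Miller and Charles or WordSimilarity-353 kind is typically done with a correlation coefficient between system scores and averaged human ratings. A minimal Pearson-correlation sketch over hypothetical score pairs:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between system similarity scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical scores: graph-derived similarity vs. averaged human ratings
graph_scores = [0.9, 0.7, 0.4, 0.1]
human_scores = [9.2, 7.5, 3.8, 1.0]
r = pearson(graph_scores, human_scores)
```

A correlation near 1 indicates the graph-based scores track human judgments closely, which is the comparison the paper reports.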
|Cross-language and cross-encyclopedia article linking using mixed-language topic model and hypernym translation||Wang Y.-C.
|English||Creating cross-language article links among different online encyclopedias is now an important task in the unification of multilingual knowledge bases. In this paper, we propose a cross-language article linking method using a mixed-language topic model and hypernym translation features based on an SVM model to link English Wikipedia and Chinese Baidu Baike, the most widely used Wiki-like encyclopedia in China. To evaluate our approach, we compile a data set from the top 500 Baidu Baike articles and their corresponding English Wikipedia articles. The evaluation results show that our approach achieves 80.95% in MRR and 87.46% in recall. Our method does not heavily depend on linguistic characteristics and can be easily extended to generate cross-language article links among different online encyclopedias in other languages.||0||0|
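The MRR and recall figures above can be computed as follows. The candidate lists and gold links here are hypothetical stand-ins for Wikipedia/Baidu Baike article pairs:

```python
def mrr_and_recall(ranked_candidates, gold):
    """Mean reciprocal rank and recall for article-linking output.

    ranked_candidates: for each source article, a ranked list of predicted
    target articles. gold: the correct target for each source article.
    """
    rr_sum, hits = 0.0, 0
    for preds, answer in zip(ranked_candidates, gold):
        if answer in preds:
            rr_sum += 1.0 / (preds.index(answer) + 1)  # reciprocal of rank
            hits += 1
    n = len(gold)
    return rr_sum / n, hits / n

# hypothetical predictions: correct at rank 1, rank 2, and missing
preds = [["zh1", "zh2"], ["zh4", "zh3"], ["zh9"]]
gold = ["zh1", "zh3", "zh5"]
mrr, recall = mrr_and_recall(preds, gold)
```

Recall counts how often the correct target appears anywhere in the candidate list, while MRR additionally rewards placing it near the top, which is why the paper's MRR (80.95%) is lower than its recall (87.46%).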
|Crowd-based appraisal and description of archival records at the State Archives Baden-Württemberg||Naumann K.
|Archiving 2014 - Final Program and Proceedings||English||Appraisal and description are core processes at historical archives. This article gives an account of innovative methodologies in this field using crowd-sourced information to (1st) identify which files are of interest to the public, (2nd) enable agency staff to extract and transfer exactly those files selected for permanent retention and (3rd) ease the description and cataloguing of the transferred objects. It defines the extent of outsourcing used at the State Archives (Landesarchiv Baden-Württemberg, LABW), describes case studies and touches on issues of change management. Data sources are government databases and geodatabases, commercial data on court decisions, the name tags of the German Wikipedia, and bio-bibliographical metadata of the State Libraries and the German National Library.||0||0|
|Densification: Semantic document analysis using Wikipedia||Iustin Dornescu
|Natural Language Engineering||English||This paper proposes a new method for semantic document analysis: densification, which identifies and ranks Wikipedia pages relevant to a given document. Although there are similarities with established tasks such as wikification and entity linking, the method does not aim for strict disambiguation of named entity mentions. Instead, densification uses existing links to rank additional articles that are relevant to the document, a form of explicit semantic indexing that enables higher-level semantic retrieval procedures that can be beneficial for a wide range of NLP applications. Because a gold standard for densification evaluation does not exist, a study is carried out to investigate the level of agreement achievable by humans, which calls into question the feasibility of creating an annotated data set. As a result, a semi-supervised approach is employed to develop a two-stage densification system: filtering unlikely candidate links and then ranking the remaining links. In a first evaluation experiment, Wikipedia articles are used to automatically estimate the performance in terms of recall. Results show that the proposed densification approach outperforms several wikification systems. A second experiment measures the impact of integrating the links predicted by the densification system into a semantic question answering (QA) system that relies on Wikipedia links to answer complex questions. Densification enables the QA system to find twice as many additional answers as when using a state-of-the-art wikification system.||0||0|
|Designing a trust evaluation model for open-knowledge communities||Yang X.
|British Journal of Educational Technology||English||The openness of open-knowledge communities (OKCs) leads to concerns about the knowledge quality and reliability of such communities. This confidence crisis has become a major factor limiting the healthy development of OKCs. Earlier studies on trust evaluation for Wikipedia suffered from disadvantages such as an inadequate set of influencing factors and the separate treatment of trustworthiness for users and resources. A new trust evaluation model for OKCs - the two-way interactive feedback model - is developed in this study. The model has two core components: resource trustworthiness (RT) and user trustworthiness (UT). The model is based on richer interaction data, considers the interrelation between RT and UT, and better represents the features of interpersonal trust in reality. Experimental simulation and a trial operation with the Learning Cell System, a novel open-knowledge community developed for ubiquitous learning, show that the model accurately evaluates RT and UT in this example OKC environment.||0||0|
|Designing information savvy societies: An introduction to assessability||Andrea Forte
|Conference on Human Factors in Computing Systems - Proceedings||English||This paper provides first steps toward an empirically grounded design vocabulary for assessable design as an HCI response to the global need for better information literacy skills. We present a framework for synthesizing literatures called the Interdisciplinary Literacy Framework and use it to highlight gaps in our understanding of information literacy that HCI as a field is particularly well suited to fill. We report on two studies that lay a foundation for developing guidelines for assessable information system design. The first is a study of Wikipedians', librarians', and laypersons' information assessment practices from which we derive two important features of assessable designs: Information provenance and stewardship. The second is an experimental study in which we operationalize these concepts in designs and test them using Amazon Mechanical Turk (MTurk).||0||0|
|Developing creativity competency of engineers||Waychal P.K.||Active learning
Index of learning styles (ils)
|ASEE Annual Conference and Exposition, Conference Proceedings||English||The complete agreement of all stakeholders on the importance of developing the creativity competency of engineering graduates motivated us to undertake this study. We chose a senior-level course in Software Testing and Quality Assurance, which offered an excellent platform for the experiment, as both testing and quality assurance activities can be executed using either routine, mechanical methods or highly creative ones. The earlier attempts reported in the literature to develop the creativity competency do not appear to be systematic, i.e. they do not follow the measurement -> action plan -> measurement cycle. The measurements, wherever done, are based on the Torrance Tests of Creative Thinking (TTCT) and the Myers-Briggs Type Indicator (MBTI). We found these tests costly and decided to search for an appropriate alternative, which led us to the Felder-Solomon Index of Learning Styles (ILS). The Sensing/Intuition dimension of the ILS, like the MBTI, originates in Carl Jung's Theory of Psychological Types. Since a number of MBTI studies have used the dimension for assessing creativity, we posited that the same ILS dimension could be used to measure the competency. We carried out a pre-ILS assessment, designed and delivered the course with a variety of activities that could potentially enhance creativity, and carried out a course-end post-ILS assessment. Although major changes would not normally be expected after a one-semester course, a hypothesis in the study was that a shift from sensing toward intuition on learning style profiles would be observed, and indeed it was. A paired t-test indicated that the pre-post change in the average sensing/intuition preference score was statistically significant (p = 0.004). While more research and direct assessment of competency is needed to be able to draw definitive conclusions about both the use of the instrument for measuring creativity and the efficacy of the course structure and contents in developing the competency, the results suggest that the approach is worth exploring.||0||0|
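The paired t-test reported above compares each student's pre- and post-course scores. A stdlib-only sketch with hypothetical preference scores; it returns only the t statistic (df = n - 1), since the p-value would come from a t-distribution table or a routine such as scipy.stats.ttest_rel:

```python
import math

def paired_t_statistic(pre, post):
    """Paired t statistic for pre/post scores (degrees of freedom: n - 1)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# hypothetical sensing/intuition preference scores for five students
pre = [7, 5, 9, 6, 8]
post = [5, 4, 7, 6, 6]
t = paired_t_statistic(pre, post)
```

A negative t here would indicate a shift toward intuition, analogous to the shift the study hypothesized and observed.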
|Development of a semantic and syntactic model of natural language by means of non-negative matrix and tensor factorization||Anisimov A.
|Lecture Notes in Computer Science||English||A method for developing a structural model of natural language syntax and semantics is proposed. Syntactic and semantic relations between parts of a sentence are presented in the form of a recursive structure called a control space. Numerical characteristics of these data are stored in multidimensional arrays. After factorization, the arrays serve as the basis for the development of procedures for analyses of natural language semantics and syntax.||0||0|
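The abstract names non-negative matrix factorization as the workhorse for decomposing its numerical arrays. A generic Lee-Seung multiplicative-update sketch; the paper's control-space arrays are not reproduced, and V below is a toy matrix:

```python
import numpy as np

def nmf(V, rank, iters=1000, eps=1e-9):
    """Factor V into non-negative W (n x rank) and H (rank x m) with
    V approximately equal to W @ H, using Lee-Seung multiplicative updates.
    A generic sketch, not the paper's exact factorization scheme."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update right factor
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update left factor
    return W, H

# toy non-negative matrix with exact non-negative rank 2
V = np.array([[1.0, 0.0, 2.0],
              [2.0, 0.0, 4.0],
              [0.0, 3.0, 0.0]])
W, H = nmf(V, rank=2)
```

The multiplicative updates preserve non-negativity of both factors, which is what makes the learned components interpretable as additive parts.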
|Editing beyond articles: Diversity & dynamics of teamwork in open collaborations||Morgan J.T.
David W. McDonald
|English||We report a study of Wikipedia in which we use a mixed-methods approach to understand how participation in specialized workgroups called WikiProjects has changed over the life of the encyclopedia. While previous work has analyzed the work of WikiProjects in supporting the development of articles within particular subject domains, the collaborative role of WikiProjects that do not fit this conventional mold has not been empirically examined. We combine content analysis, interviews and analysis of edit logs to identify and characterize these alternative WikiProjects and the work they do. Our findings suggest that WikiProject participation reflects community concerns and shifts in the community's conception of valued work over the past six years. We discuss implications for other open collaborations that need flexible, adaptable coordination mechanisms to support a range of content creation, curation and community maintenance tasks.||0||0|
|Effective integration of wiki for collaborative learning in higher education context||Yusop F.D.
Abdul Basar S.M.M.
|World Applied Sciences Journal||English||A wiki is an asynchronous online collaborative tool that can be adapted for teaching and learning purposes. This study attempts to explore and develop a further understanding of the factors influencing students' participation and commitment in collaboration using wikis in a higher education context. The usage of wikis to support class instruction is also evaluated. Observations of the online wikis revealed positive results in terms of students' participation in their wiki pages. Six factors were identified as playing important roles in motivating and engaging students in collaborative writing practices via wikis. An interesting finding emerging from this study was that the different motivation and engagement levels between undergraduate and postgraduate students were attributed to their roles as part-time students. The findings of this study will provide instructors with an understanding of the elements that can either encourage or hinder students' motivation and participation in wiki activities.||0||0|
|Effectively detecting topic boundaries in a news video by using wikipedia||Kim J.W.
Topic boundary detection
|International Journal of Software Engineering and its Applications||English||With the development of internet technology, traditional TV news providers have started sharing their news videos on the Web. As the number of TV news videos on the Web is constantly increasing, there is a pressing need for effective mechanisms that are able to reduce the navigational overhead significantly for a given collection of TV news videos. Naturally, a TV news video contains a series of stories that are not related to each other, and thus building indexing structures based on its entire contents might be ineffective. An alternative and more promising strategy is to first find topic boundaries in a given news video based on topical coherence, and then build index structures for each coherent unit. Thus, the main goal of this paper is to develop an effective technique to detect the topic boundaries of a given news video. The topic boundaries identified by our algorithm are then used to build indexing structures in order to support effective navigation guides and searches. The proposed method leverages Wikipedia to map the original contents of a news video from the keyword space into the concept space, and finds topic boundaries by using the contents represented in the concept space. The experimental results show that the proposed technique provides significant precision gains in finding the topic boundaries of a news video.||0||0|
|Elite size and resilience impact on global system structuration in social media||Matei S.A.
|Division of labor
|2014 International Conference on Collaboration Technologies and Systems, CTS 2014||English||The paper examines the role played by the most productive members of social media systems in leading the project and influencing the degree of project structuration. The paper focuses on the findings of a large computational social science project that examines Wikipedia.||0||0|
|Encoding document semantic into binary codes space||Yu Z.
|Lecture Notes in Computer Science||English||We develop a deep neural network model to encode document semantics into compact binary codes with the elegant property that semantically similar documents have similar embedding codes. The deep learning model is constructed with three stacked auto-encoders. The input of the lowest auto-encoder is the word-count vector representation of a document, while the learned hidden features of the deepest auto-encoder are thresholded to produce binary codes that represent the document semantics. Retrieving similar documents is very efficient: we simply return the documents whose codes have small Hamming distances to that of the query document. We illustrate the effectiveness of our model on two public real-world datasets - 20NewsGroup and Wikipedia - and the experiments demonstrate that the compact binary codes sufficiently embed the semantics of documents and bring improvement in retrieval accuracy.||0||0|
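Retrieval over such binary codes reduces to Hamming-distance comparisons, which the sketch below illustrates with hypothetical 8-bit codes; real systems would use longer codes and indexing tricks rather than a linear scan:

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary codes stored as ints."""
    return bin(a ^ b).count("1")

def retrieve(query_code, doc_codes, radius):
    """Return ids of documents whose binary semantic codes lie within
    `radius` Hamming distance of the query code."""
    return [doc_id for doc_id, code in doc_codes.items()
            if hamming(query_code, code) <= radius]

# hypothetical 8-bit codes, as if thresholded from the deepest auto-encoder
codes = {"doc_a": 0b10110010, "doc_b": 0b10110011, "doc_c": 0b01001101}
matches = retrieve(0b10110010, codes, radius=1)
```

Because XOR and popcount are single cheap operations per pair, this is far faster than comparing full real-valued document vectors, which is the efficiency argument the abstract makes.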
|Engaging with a wiki related to knowledge translation: A survey of whatiskt wiki users||Mathew D.
|Journal of Medical Internet Research||English||Background: In 2008, WhatisKT wiki was launched as a collaborative platform for knowledge translation (KT) researchers and stakeholders to debate the use and definitions of KT-related terms. The wiki has definitions for over 110 terms from disciplines including health care, information technology, education, accounting, and business. WhatisKT wiki has over 115 registered users. Approximately 73,000 unique visitors have visited the wiki since 2008. Despite annual increases in visitors and regular maintenance of the wiki, no visitors have contributed content or started a discussion. Objective: We surveyed wiki users to gain an understanding of the perceived value of the website, reasons for not engaging in the wiki, and suggestions to facilitate collaboration and improve the usability of the wiki. Methods: We surveyed three cohorts: KT Canada members who were previously invited to join the wiki, registered wiki members, and unregistered visitors. The first two cohorts completed a Web-based survey that included the System Usability Scale (SUS) questionnaire to assess usability; additionally 3 participants were interviewed. Unregistered wiki visitors were surveyed with polls posted on the wiki. The study received ethics approval from the McMaster University Faculty of Health Sciences Research Ethics Board. Results: Twenty-three participants completed the Web-based and SUS surveys; 15 participants indicated that they would collaborate on the wiki. The mean SUS score of 67 (95% CI 56-77) indicated that the wiki could be considered for design improvements. Study participants indicated that the wiki could be improved by email notification regarding new terms, better grouping of terms, user friendly interface, and training for users interested in editing content. Conclusions: The findings from this survey will be used to enhance the design and content of WhatisKT wiki. 
Further feedback from participants will be used to make the wiki an ideal collaboration platform for KT researchers interested in terminology.||0||0|
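The SUS score of 67 cited above comes from the standard System Usability Scale formula: ten 1-5 Likert items, where odd-numbered items contribute (response - 1) and even-numbered items contribute (5 - response), with the sum scaled by 2.5 to give a 0-100 score. A sketch with hypothetical responses:

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 Likert responses."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        # odd items are positively worded, even items negatively worded
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

score = sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])
```

Scores around 68 are commonly treated as average usability, which is why the wiki's mean of 67 suggested room for design improvement.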
|Entity ranking based on Wikipedia for related entity finding||Jinghua Zhang
Related entity finding
|Jisuanji Yanjiu yu Fazhan/Computer Research and Development||Chinese||Entity ranking is a very important step for related entity finding (REF). Although researchers have done much work on entity ranking based on Wikipedia for REF, several issues remain: the semi-automatic acquisition of the target type, the coarse granularity of the target type, the binary judgment of entity-type relevancy, and the neglect of stop words in the calculation of entity-relation relevancy. This paper designs a framework that ranks entities through the calculation of a triple combination (including entity relevancy, entity-type relevancy and entity-relation relevancy) and identifies the best combination method through comparison of experimental results. A novel approach is proposed to calculate entity-type relevancy. It can automatically acquire the fine-grained target type and the discriminative rules of its hyponym Wikipedia categories through inductive learning, and calculate entity-type relevancy by counting the number of categories that meet the discriminative rules. This paper also proposes a 'cut stop words to rebuild relations' approach to calculate the entity-relation relevancy between a candidate entity and the source entity. Experimental results demonstrate that the proposed approaches effectively improve the entity-ranking results and reduce the computation time.||0||0|
|Entity recognition in information extraction||Hanafiah N.
|Lecture Notes in Computer Science||English||Detecting and resolving entities is an important step in information retrieval applications. Humans are able to recognize entities by context, but information extraction systems (IES) need to apply sophisticated algorithms to recognize an entity. The development and implementation of an entity recognition algorithm is described in this paper. The implemented system is integrated with an IES that derives triples from unstructured text. By doing so, the triples become more valuable in query answering because they refer to identified entities. By extracting information from the Wikipedia encyclopedia, a dictionary of entities and their contexts is built. The entity recognizer computes a score for context similarity, based on cosine similarity with a tf-idf weighting scheme, combined with string similarity. The implemented system shows good accuracy on Wikipedia articles, is domain independent, and recognizes entities of arbitrary types.||0||0|
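The context-similarity score described above rests on cosine similarity over tf-idf-weighted term vectors. A minimal sketch over sparse vectors, with hypothetical weights standing in for a mention's context and a dictionary entry's context:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts
    mapping term -> tf-idf weight."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# hypothetical tf-idf weighted contexts: a mention vs. a dictionary entry
mention_ctx = {"bank": 1.2, "river": 2.0, "water": 1.5}
entry_ctx = {"bank": 1.0, "river": 1.8, "loan": 0.0}
sim = cosine(mention_ctx, entry_ctx)
```

A full system would combine this score with a string-similarity measure between the mention surface form and the candidate entity name, as the paper describes.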
|Evaluating the helpfulness of linked entities to readers||Yamada I.
|HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media||English||When we encounter an interesting entity (e.g., a person's name or a geographic location) while reading text, we typically search for and retrieve relevant information about it. Entity linking (EL) is the task of linking entities in a text to the corresponding entries in a knowledge base, such as Wikipedia. Recently, EL has received considerable attention. EL can be used to enhance a user's text reading experience by streamlining the process of retrieving information on entities. Several EL methods have been proposed, though they tend to extract all of the entities in a document, including ones that are unnecessary for users. Excessive linking of entities can be distracting and degrade the user experience. In this paper, we propose a new method for evaluating the helpfulness of linking entities to users. We address this task using supervised machine learning with a broad set of features. Experimental results show that our method significantly outperforms baseline methods by approximately 5.7%-12% F1. In addition, we propose an application, Linkify, which enables developers to integrate EL easily into their web sites.||0||0|
|Evaluating the use of wikis for EFL: A case study of an undergraduate English writing course in China||Sun Z.
English as a foreign language
English for specific purposes
|International Journal of Information Technology and Management||English||This study aims at examining the effectiveness of applying wikis in tertiary-level English as a foreign language (EFL) classes. The use of wikis in an English for specific purposes (ESP) course, Business English Writing, by undergraduate students in a university in China was investigated through data analysis of test results as well as interviews. Performance results on the Business English Certificates (BEC) Preliminary (pre-test) and BEC Vantage (post-test) revealed that the experimental group significantly outperformed the control group in writing in the post-test. Interviews showed that students held a rather positive view towards the use of wikis in the ESP writing class and that they favoured the tool mainly for its effect of enhancing their learning motivation and its function for collaborative learning. Still, a few students were not accustomed to this type of e-learning for varied reasons. An implication of the results is that wikis can benefit EFL learners by improving their writing skills in a collaborative environment.||0||0|
|Evaluation of gastroenterology and hepatology articles on Wikipedia: Are they suitable as learning resources for medical students?||Samy A. Azer||Gastroenterology
|(Eur J Gastroenterol Hepatol. 2014 Feb;26(2):155-63) doi:10.1097/MEG.0000000000000003||BACKGROUND: With the changes introduced to medical curricula, medical students use learning resources on the Internet such as Wikipedia. However, the credibility of the medical content of Wikipedia has been questioned and there is no evidence to respond to these concerns. The aim of this paper was to critically evaluate the accuracy and reliability of the gastroenterology and hepatology information that medical students retrieve from Wikipedia. METHODS: The Wikipedia website was searched for articles on gastroenterology and hepatology on 28 May 2013. Copies of these articles were evaluated by three assessors independently using an appraisal form modified from the DISCERN instrument. The articles were scored for accuracy of content, readability, frequency of updating, and quality of references. RESULTS: A total of 39 articles were evaluated. Although the articles appeared to be well cited and reviewed regularly, several problems were identified with regard to depth of discussion of mechanisms and pathogenesis of diseases, as well as poor elaboration on different investigations. Analysis of the content showed a score ranging from 15.6±0.6 to 43.6±3.2 (mean±SD). The total number of references in all articles was 1233, and the number of references varied from 4 to 144 (mean±SD, 31.6±27.3). The number of citations from peer-reviewed journals published in the last 5 years was 242 (28%); however, several problems were identified in the list of references and citations made. The readability of articles was in the range of -8.0±55.7 to 44.4±1.4; for all articles the readability was 26±9.0 (mean±SD). The concordance between the assessors on applying the criteria had mean κ scores in the range of 0.61 to 0.79. CONCLUSION: Wikipedia is not a reliable source of information for medical students searching for gastroenterology and hepatology articles. 
Several limitations, deficiencies, and scientific errors have been identified in the articles examined.||0||0|
|Experimental Implementation of a M2M System Controlled by a Wiki Network||Takashi Yamanoue
|Applied Computing and Information Technology, Studies in Computational Intelligence||English||Experimental implementation of an M2M system, which is controlled by a wiki network, is discussed. This M2M system consists of mobile terminals at remote places and wiki servers on the Internet. A mobile terminal of the system consists of an Android terminal and may have an Arduino board with sensors and actuators. The mobile terminal can read data not only from the sensors on the Arduino board but also from wiki pages of the wiki servers. The mobile terminal can control the actuators of the Arduino board or can write sensor data to a wiki page. The mobile terminal performs such reading, writing, and controlling by periodically reading and executing commands on a wiki page, and by reading and running a program on the wiki page. In order to run the program, the mobile terminal is equipped with a data processor. After placing mobile terminals at remote places, the users of this system can control the M2M system by writing and updating such commands and programs on the wiki network without going to the places of the mobile terminals. This system realizes an open communication forum not only for people but also for machines.||3||0|
|Experimental comparison of semantic word clouds||Barth L.
|Lecture Notes in Computer Science||English||We study the problem of computing semantics-preserving word clouds in which semantically related words are close to each other. We implement three earlier algorithms for creating word clouds and three new ones. We define several metrics for quantitative evaluation of the resulting layouts. Then the algorithms are compared according to these metrics, using two data sets of documents from Wikipedia and research papers. We show that two of our new algorithms outperform all the others by placing many more pairs of related words so that their bounding boxes are adjacent. Moreover, this improvement is not achieved at the expense of significantly worsened measurements for the other metrics.||0||0|
|Explaining authors' contribution to pivotal artifacts during mass collaboration in the Wikipedia's knowledge base||Iassen Halatchliyski
|International Journal of Computer-Supported Collaborative Learning||English||This article discusses the relevance of large-scale mass collaboration for computer-supported collaborative learning (CSCL) research, adhering to a theoretical perspective that views collective knowledge both as substance and as participatory activity. In an empirical study using the German Wikipedia as a data source, we explored collective knowledge as manifested in the structure of artifacts that were created through the collaborative activity of authors with different levels of contribution experience. Wikipedia's interconnected articles were considered at the macro level as a network and analyzed using a network analysis approach. The focus of this investigation was the relation between the authors' experience and their contribution to two types of articles: central pivotal articles within the artifact network of a single knowledge domain and boundary-crossing pivotal articles within the artifact network of two adjacent knowledge domains. Both types of pivotal articles were identified by measuring the network position of artifacts based on network analysis indices of topological centrality. The results showed that authors with specialized contribution experience in one domain predominantly contributed to central pivotal articles within that domain. Authors with generalized contribution experience in two domains predominantly contributed to boundary-crossing pivotal articles between the knowledge domains. Moreover, article experience (i.e., the number of articles in both domains an author had contributed to) was positively related to the contribution to both types of pivotal articles, regardless of whether an author had specialized or generalized domain experience. We discuss the implications of our findings for future studies in the field of CSCL. © 2013 International Society of the Learning Sciences, Inc. and Springer Science+Business Media New York.||0||0|
|Exploiting Twitter and Wikipedia for the annotation of event images||McParlane P.J.
|English||With the rise in popularity of smart phones, there has been a recent increase in the number of images taken at large social (e.g. festivals) and world (e.g. natural disasters) events which are uploaded to image sharing websites such as Flickr. As with all online images, they are often poorly annotated, resulting in a difficult retrieval scenario. To overcome this problem, many photo tag recommendation methods have been introduced, however, these methods all rely on historical Flickr data which is often problematic for a number of reasons, including the time lag problem (i.e. in our collection, users upload images on average 50 days after taking them, meaning "training data" is often out of date). In this paper, we develop an image annotation model which exploits textual content from related Twitter and Wikipedia data which aims to overcome the discussed problems. The results of our experiments show and highlight the merits of exploiting social media data for annotating event images, where we are able to achieve recommendation accuracy comparable with a state-of-the-art model. Copyright 2014 ACM.||0||0|
|Exploiting Wikipedia for Evaluating Semantic Relatedness Mechanisms||Ferrara F.
|Communications in Computer and Information Science||English||The semantic relatedness between two concepts is a measure that quantifies the extent to which two concepts are semantically related. In the area of digital libraries, several mechanisms based on semantic relatedness methods have been proposed. Visualization interfaces, information extraction mechanisms, and classification approaches are just some examples of mechanisms where semantic relatedness methods can play a significant role and have been successfully integrated. Due to the growing interest of researchers in areas like Digital Libraries, Semantic Web, Information Retrieval, and NLP, various approaches have been proposed for automatically computing semantic relatedness. However, despite the growing number of proposed approaches, there are still significant critical issues in evaluating the results returned by different methods. The limitations of current evaluation mechanisms prevent effective evaluation, and several works in the literature emphasize that the approaches in use are rather inconsistent. In order to overcome this limitation, we propose a new evaluation methodology where people provide feedback about the semantic relatedness between concepts explicitly defined in digital encyclopedias. In this paper, we specifically exploit Wikipedia for generating a reliable dataset.||0||0|
|Exploiting the wisdom of the crowds for characterizing and connecting heterogeneous resources||Kawase R.
Pereira Nunes B.
|HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media||English||Heterogeneous content is an inherent problem for cross-system search, recommendation and personalization. In this paper we investigate differences in topic coverage and the impact of topics in different kinds of Web services. We use entity extraction and categorization to create fingerprints that allow for meaningful comparison. As a basis taxonomy, we use the 23 main categories of Wikipedia Category Graph, which has been assembled over the years by the wisdom of the crowds. Following a proof of concept of our approach, we analyze differences in topic coverage and topic impact. The results show many differences between Web services like Twitter, Flickr and Delicious, which reflect users' behavior and the usage of each system. The paper concludes with a user study that demonstrates the benefits of fingerprints over traditional textual methods for recommendations of heterogeneous resources.||0||0|
|Exploratory search with semantic transformations using collaborative knowledge bases||Yegin Genc||Collaborative knowledge bases
|WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining||English||Sometimes we search for simple facts. Other times we search for relationships between concepts. While existing information retrieval systems work well for simple searches, they are less satisfying for complex inquiries because of the ill-structured nature of many searches and the cognitive load involved in the search process. Search can be improved by leveraging the network of concepts that are maintained by collaborative knowledge bases such as Wikipedia. By treating exploratory search inquiries as networks of concepts, and then mapping documents to these concepts, exploratory search performance can be improved. This method is applied to an exploratory search task: given a journal abstract, abstracts are ranked based on their relevancy to the seed abstract. The results show relevancy scores comparable to state-of-the-art techniques while at the same time providing better diversity.||0||0|
|Exploring collective DSL integration in a large situated IS: Towards comprehensive language integration in information systems||Aram M.
Semantic enterprise wiki
|ACM International Conference Proceeding Series||English||In large situated information system instances, a great variety of stakeholders interact with each other via technology, constantly shaping and refining the information system. In the course of such a system's history, a range of domain-specific languages may have been incorporated. These language means are often not sufficiently integrated on an ontological level leading to syntactical and conceptual redundancies and impeding a shared understanding of the systems' functionalities. In this paper, we present our ambitions towards a language integration approach that aims at mitigating this problem. We exemplify it in the context of an existing educational information system instance.||0||0|
|Exploring online social behavior in crowdsourcing communities: A relationship management perspective||Shen X.-L.
Plural subject theory
|Computers in Human Behavior||English||With the popularity of social media, crowdsourcing innovation provides new ways to generate original and useful content. It offers a unique opportunity for online crowds to communicate and collaborate on a variety of topics of mutual interest. This study presents an initial attempt to explore and understand online social behavior in crowdsourcing communities, with the insights from both plural subject theory and commitment-trust theory. In particular, two different types of collective intention (i.e., we-mode collective intention, which refers to acting as a group member, and I-mode collective intention, which refers to acting interdependently to contribute to the group goal) were proposed. The research model was empirically examined with longitudinal data collected from 202 wiki users. Findings indicated that, although both I-mode and we-mode collective intentions significantly predicted online social behavior in wiki communities, we-mode collective intention exerted a greater effect on users' behavior. In addition, relationship-orientated factors (e.g., trust and commitment) only affected we-mode, instead of I-mode, collective intention. This study finally yields several implications for both research and practice. © 2014 Elsevier Ltd. All rights reserved.||0||0|
|Exploring the links between pre-service teachers' beliefs and video-based reflection in wikis||Cho Y.H.
|Computers in Human Behavior||English||In teacher education, video has been used frequently for the development of competencies for effective teaching. However, few empirical studies have investigated reciprocal relationships between pre-service teachers' beliefs and video-based reflection activities. The present study investigated the influences of epistemological beliefs about mathematics on video-based reflection in wikis. Elementary school pre-service teachers had carried out reflective writing and questioning activities after watching a video clip about mathematics learning or instruction in wikis for six weeks. This study also explored the relationships between video-based reflection activities and the change of mathematical beliefs for teaching (MBT). Both quantitative and qualitative data were collected to examine the links between beliefs and reflection activities. This study found that epistemological beliefs partially influenced reflective writing and questioning activities in wikis. In addition, video-based reflection activities were beneficial for the beliefs of mathematical knowledge and students. This study also identified a few reflection and question categories that were closely related to the change of MBT. Lastly, implications of this study were discussed in regard to video-based reflection practices in teacher education. © 2014 Elsevier Ltd. All rights reserved.||0||0|
|Extended cognition and the explosion of knowledge||Ludwig D.||Active Externalism
Cognitive Niche Construction
|Philosophical Psychology||English||The aim of this article is to show that externalist accounts of cognition such as Clark and Chalmers' (1998) "active externalism" lead to an explosion of knowledge that is caused by online resources such as Wikipedia and Google. I argue that externalist accounts of cognition imply that subjects who integrate mobile Internet access in their cognitive routines have millions of standing beliefs on unexpected issues such as the birth dates of Moroccan politicians or the geographical coordinates of villages in southern Indonesia. Although many externalists propose criteria for the bounds of cognition that are designed to avoid this explosion of knowledge, I argue that these criteria are flawed and that active externalism has to accept that information resources such as Wikipedia and Google constitute extended cognitive processes.||0||0|
|Extracting Ontologies from Arabic Wikipedia: A Linguistic Approach||Al-Rajebah N.I.
Semantic field theory
|Arabian Journal for Science and Engineering||English||As one of the important aspects of the semantic web, building ontological models became a driving demand for developing a variety of semantic web applications. Through the years, much research was conducted to investigate the process of generating ontologies automatically from semi-structured knowledge sources such as Wikipedia. Different ontology building techniques were investigated, e.g., NLP tools and pattern matching, infoboxes, and structured knowledge sources (Cyc and WordNet). Looking at the results of previous approaches, we can see that the vast majority of employed techniques did not consider the linguistic aspect of Wikipedia. In this article, we present our solution to extract ontologies from Wikipedia using a linguistic approach based on the semantic field theory introduced by Jost Trier. Linguistic ontologies are significant in many applications for both linguists and Web researchers. We applied the proposed approach to the Arabic version of Wikipedia. The semantic relations were extracted from infoboxes, hyperlinks within infoboxes, and the list of categories that articles belong to. Our system successfully extracted approximately 760,000 triples from the Arabic Wikipedia. We conducted three experiments to evaluate the system output, namely: validation test, crowd evaluation, and domain experts' evaluation. The system output achieved an average precision of 65%.||0||0|
|Extracting and displaying temporal and geospatial entities from articles on historical events||Chasin R.
|Geospatial entity extraction
Natural Language Processing
|Computer Journal||English||This paper discusses a system that extracts and displays temporal and geospatial entities in text. The first task involves identification of all events in a document followed by identification of important events using a classifier. The second task involves identifying named entities associated with the document. In particular, we extract geospatial named entities. We disambiguate the set of geospatial named entities and geocode them to determine the correct coordinates for each place name, often called grounding. We resolve ambiguity based on sentence and article context. Finally, we present a user with the key events and their associated people, places and organizations within a document in terms of a timeline and a map. For purposes of testing, we use Wikipedia articles about historical events, such as those describing wars, battles and invasions. We focus on extracting major events from the articles, although our ideas and tools can be easily used with articles from other sources such as news articles. We use several existing tools such as Evita, Google Maps, publicly available implementations of Support Vector Machines, Hidden Markov Model and Conditional Random Field, and the MIT SIMILE Timeline.||0||0|
|Extracting semantic concept relations from Wikipedia||Arnold P.
Natural Language Processing
|ACM International Conference Proceeding Series||English||Background knowledge as provided by repositories such as WordNet is of critical importance for linking or mapping ontologies and related tasks. Since current repositories are quite limited in their scope and currentness, we investigate how to automatically build up improved repositories by extracting semantic relations (e.g., is-a and part-of relations) from Wikipedia articles. Our approach uses a comprehensive set of semantic patterns, finite state machines and NLP-techniques to process Wikipedia definitions and to identify semantic relations between concepts. Our approach is able to extract multiple relations from a single Wikipedia article. An evaluation for different domains shows the high quality and effectiveness of the proposed approach.||0||0|
|Facilitating student engagement and collaboration in a large postgraduate course using wiki-based activities||Salaber J.||Collaborative learning
|International Journal of Management Education||English||This paper investigates the impact of wiki-based activities on student participation and collaborative learning in a large postgraduate international management course. The wiki was used in this study as a facilitator for engagement and collaboration rather than a means of online discussions. Based on both qualitative and quantitative data, we find strong evidence that the use of the wiki facilitated student engagement and collaboration, both inside and outside the classroom. Moreover, student learning had significantly improved as a result of the enhanced learning environment.||0||0|
|Fidarsi di Wikipedia||Simone Dezaiacomo||Wikipedia
Decision theory and cognitive processes
|Italian||The aim of this study is to understand the phenomena underlying users' trust in the online encyclopedia Wikipedia. To do so, it is first necessary to understand and model the organization and structure of the socio-productive processes underlying the production of Wikipedia's content, and then to empirically verify and describe its self-correction capabilities. In addition to those used in this study, the approaches and results treated in the literature will also be described, reporting the main studies that have addressed these topics over the years, albeit keeping them independent.
To understand the structure of the community of Wikipedia editors, the existence of a Core-Periphery model was hypothesized. To study this model, analyses were performed on data from a sample of pages of the Italian version of Wikipedia. The results obtained from the analysis of this information form the basis used for selecting the pages targeted for error injection, providing a method for estimating the different self-correction probabilities of each page. As for Wikipedia's resilience capabilities, the results were obtained using an empirical approach: errors were inserted into the sample of pages under specific methodological constraints, and the time and manner in which these errors were corrected were then evaluated.
A specific analysis was carried out to choose the types of errors and the variables to consider in inserting them. This analysis led to the definition of two distinct experiments, whose results lead to interesting conclusions both separately and in combination. On the basis of the results of these experiments, it was possible to discuss the self-correction capabilities of the system, a key element in studying the dynamics of trust in Wikipedia.||0||0|
|Fostering collaborative learning with wikis: Extending MediaWiki with educational features||Popescu E.
|Lecture Notes in Computer Science||English||Wikis are increasingly popular Web 2.0 tools in educational settings, being used successfully for collaborative learning. However, since they were not originally conceived as educational tools, they lack some of the functionalities useful in the instructional process (such as learner monitoring, evaluation support, student group management etc.). Therefore in this paper we propose a solution to add these educational support features, as an extension to the popular MediaWiki platform. CoLearn, as it is called, is aimed at increasing the collaboration level between students, investigating also the collaborative versus cooperative learner actions. Its functionalities and pedagogical rationale are presented, together with some technical details. A set of practical guidelines for promoting collaborative learning with wikis is also included.||0||0|
|From open-source software to Wikipedia: 'Backgrounding' trust by collective monitoring and reputation tracking||De Laat P.B.||Robot
|Ethics and Information Technology||English||Open-content communities that focus on co-creation without requirements for entry have to face the issue of institutional trust in contributors. This research investigates the various ways in which these communities manage this issue. It is shown that communities of open-source software continue to rely mainly on hierarchy (reserving write-access for higher echelons), which substitutes (the need for) trust. Encyclopedic communities, though, largely avoid this solution. In the particular case of Wikipedia, which is confronted with persistent vandalism, another arrangement has been pioneered instead. Trust (i.e. full write-access) is 'backgrounded' by means of a permanent mobilization of Wikipedians to monitor incoming edits. Computational approaches have been developed for the purpose, yielding both sophisticated monitoring tools that are used by human patrollers, and bots that operate autonomously. Measures of reputation are also under investigation within Wikipedia; their incorporation in monitoring efforts, as an indicator of the trustworthiness of editors, is envisaged. These collective monitoring efforts are interpreted as focusing on avoiding possible damage being inflicted on Wikipedian spaces, thereby being allowed to keep the discretionary powers of editing intact for all users. Further, the essential differences between backgrounding and substituting trust are elaborated. Finally it is argued that the Wikipedian monitoring of new edits, especially by its heavy reliance on computational tools, raises a number of moral questions that need to be answered urgently.||0||0|
|Fuzzy ontology alignment using background knowledge||Todorov K.
|(fuzzy) ontology alignment
|International Journal of Uncertainty, Fuzziness and Knowlege-Based Systems||English||We propose an ontology alignment framework with two core features: the use of background knowledge and the ability to handle vagueness in the matching process and the resulting concept alignments. The procedure is based on the use of a generic reference vocabulary, which is used for fuzzifying the ontologies to be matched. The choice of this vocabulary is problem-dependent in general, although Wikipedia represents a general-purpose source of knowledge that can be used in many cases, and even allows cross language matchings. In the first step of our approach, each domain concept is represented as a fuzzy set of reference concepts. In the next step, the fuzzified domain concepts are matched to one another, resulting in fuzzy descriptions of the matches of the original concepts. Based on these concept matches, we propose an algorithm that produces a merged fuzzy ontology that captures what is common to the source ontologies. The paper describes experiments in the domain of multimedia by using ontologies containing tagged images, as well as an evaluation of the approach in an information retrieval setting. The undertaken fuzzy approach has been compared to a classical crisp alignment by the help of a ground truth that was created based on human judgment.||0||0|
|Graph-based domain-specific semantic relatedness from Wikipedia||Sajadi A.||Biomedical Domain
|Lecture Notes in Computer Science||English||Human-made ontologies and lexicons are promising resources for many text mining tasks in domain-specific applications, but they do not exist for most domains. We study the suitability of Wikipedia as an alternative resource to ontologies for the semantic relatedness problem. We focus on the biomedical domain because (1) high-quality manually curated ontologies are available and (2) successful graph-based methods have been proposed for semantic relatedness in this domain. Because Wikipedia is not hierarchical and links do not convey defined semantic relationships, the same methods used on lexical resources (such as WordNet) cannot be applied here straightforwardly. Our contributions are (1) demonstrating that Wikipedia-based methods outperform state-of-the-art ontology-based methods on most of the existing ontologies in the biomedical domain, (2) adapting and evaluating the effectiveness of a group of bibliometric methods of various degrees of sophistication on Wikipedia for the first time, and (3) proposing a new graph-based method that outperforms existing methods by considering some specific features of Wikipedia's structure.||0||0|
|Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts||Ren X.
|Heterogeneous graph clustering
|WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining||English||The problem of learning user search intents has attracted intensive attention from both industry and academia. However, state-of-the-art intent learning algorithms suffer from different drawbacks when only using a single type of data source. For example, query text has difficulty in distinguishing ambiguous queries; search logs are biased toward the order of search results and users' noisy click behaviors. In this work, we for the first time leverage three types of objects, namely queries, web pages and Wikipedia concepts, collaboratively for learning generic search intents, and construct a heterogeneous graph to represent multiple types of relationships between them. A novel unsupervised method called heterogeneous graph-based soft-clustering is developed to derive an intent indicator for each object based on the constructed heterogeneous graph. With the proposed co-clustering method, one can enhance the quality of intent understanding by taking advantage of different types of data, which complement each other, and make the implicit intents easier to interpret with explicit knowledge from Wikipedia concepts. Experiments on two real-world datasets demonstrate the power of the proposed method, where it achieves a 9.25% improvement in terms of NDCG on the search ranking task and a 4.67% enhancement in terms of Rand index on the object co-clustering task compared to the best state-of-the-art method.||0||0|
|How collective intelligence emerges: Knowledge creation process in Wikipedia from microscopic viewpoint||Kangpyo Lee||Collective intelligence
|Proceedings of the Workshop on Advanced Visual Interfaces AVI||English||Wikipedia, one of the richest human knowledge repositories on the Internet, has been developed by collective intelligence. To gain insight into Wikipedia, one may ask how initial ideas emerge and develop into a concrete article through the online collaborative process. Led by this question, the author performed a microscopic observation of the knowledge creation process on the recent article, "Fukushima Daiichi nuclear disaster." The author not only collected the revision history of the article but also investigated interactions between collaborators by constructing a user-paragraph network to reveal the intellectual intervention of multiple authors. The knowledge creation process on the Wikipedia article was categorized into 4 major steps and 6 phases from the beginning to the intellectual balance point where only revisions were made. To represent this phenomenon, the author developed a visaphor (digital visual metaphor) to digitally represent the article's evolving concepts and characteristics. The author then created a dynamic digital information visualization using particle effects and network graph structures. The visaphor reveals the interaction between users and their collaborative efforts as they created and revised paragraphs and debated aspects of the article.||0||0|
|ICHPEDIA, a case study in community engagement in the safeguarding of ICH online||Park S.C.||Collective intelligence
Korea's safeguarding policy
|International Journal of Intangible Heritage||English||This article presents a new paradigm of safeguarding methods through digital platforms and technology. Since 2010, a group of researchers in Korea have been developing a new experimental methodology of inventorying intangible cultural heritage (ICH) utilising a new concept of collective intelligence and advanced information technologies. The research team established Ichpedia, a web-based ICH encyclopedia and archive. The purpose of Ichpedia is fourfold. First, it functions as the most efficient digital ICH inventorying system available using modern information technologies. It is possible to record and retain the dynamic features of ICH through the use of multi-media resources. Secondly, Ichpedia can facilitate interactivity between information providers and users so that ICH communities, groups and individuals can directly access the system as information providers or editors. Using the functions of the system, their voices can easily be disseminated to the public. Such cooperative work will encourage more awareness and identification of the fragility of ICH. Ichpedia will therefore be instrumental in improving the understanding of ICH communities and individuals and finding better safeguarding methods. Thirdly, Ichpedia can reduce the economic burden of establishing a highly efficient database system. It is easy and simple to use but offers high efficiency compared to other web-based ICH encyclopedias worldwide. Ichpedia has the advantage of being the least expensive option for the development and maintenance of such a system. Lastly, it is hoped that Ichpedia will pave the way for digital innovation in the area of ICH recording with the free and open distribution of the digital platform and technologies.||0||0|
|IScience: A computer-supported collaborative inquiry learning project for science students in secondary and tertiary science education||Anderson K.
|Gifted and talented
Pre-service teacher training
|International Journal of Innovation in Science and Mathematics Education||English||Pre-service teachers come to teacher education programs with a range of experiences and understandings about inquiry in science. The IScience project aims to assist pre-service teachers to develop an understanding of the issues and skills required to guide students through an open-inquiry process. In addition, the project provides opportunities for pre-service teachers at the beginning of their teacher training to develop their skills in mentoring high school science students in an open-ended inquiry process. In this study, wikis were used to support the interactions among pre-service teachers and school students, who were in geographically diverse locations, as they collaborated on the open-inquiry project. A mixed-method approach to data collection was used, with data sources including surveys, reflective journals, and pre- and post-tests. The impact of the project on the pre-service teachers' understanding of how to teach science by inquiry is discussed in this paper. The results of the study indicate that the pre-service teachers felt more confident in their understanding of scientific inquiry and their ability to teach inquiry.||0||0|
|Identifying the topic of queries based on domain specify ontology||ChienTa D.C.
|WIT Transactions on Information and Communication Technologies||English||To identify the topic of queries, much past research has relied on lexico-syntactic patterns and handcrafted knowledge sources in Machine Learning and Natural Language Processing (NLP). In contrast, in this paper we introduce an application system that detects the topic of queries based on a domain-specific ontology. For this system, we build the domain-specific ontology from instances automatically extracted from available resources such as Wikipedia, WordNet, and the ACM Digital Library. An experimental evaluation with many queries related to the information technology domain shows that this system considerably outperforms a baseline matching and identification approach.||0||0|
|Implementing Web 2.0 tools in organisations||Baxter G.J.||Enterprise 2.0
Web 2.0 implementation model
|Learning Organization||English||Purpose: This special issue aims to increase awareness of the organisational factors that enterprises must reflect on and address when introducing Web 2.0 technologies into their organisations. In contrast to empirical studies that review the impact of Web 2.0 technologies in organisations in terms of how they might support knowledge sharing or communities of practice, this special issue intends to identify the salient criteria that management practitioners must address to assist in the implementation of Web 2.0 technologies in the workplace. Findings: One of the principal findings that emerged from this special issue is the importance of reviewing social and cultural factors in organisations when introducing Web 2.0 technologies in the workplace. In addition to assessing technical issues that might impact on the implementation of Web 2.0 technologies in organisations, this special issue also explores subject matters such as the dilemma of whether a top-down or a bottom-up approach is more effective towards engaging staff in the adoption of Web 2.0 tools at work. 
Originality/value: The research presented in this special issue provides an important academic contribution towards an area that is, at present, under researched namely, whether there is a structured approach that can be universally applied by organisations when internally implementing Web 2.0 technologies into their work place.||0||0|
|Implementing Web 2.0 tools in organisations: Feasibility of a systematic approach||Baxter G.J.
Social networking sites
Web 2.0 best practice guidelines
Web 2.0 implementation model
|Learning Organization||English||Purpose: The aim of this paper is to examine the subject area of implementing Web 2.0 tools in organisations to identify from the literature common issues that must be addressed to assist organisations in their approach towards introducing Web 2.0 tools in their workplace. Based on the findings of the literature a Web 2.0 tools implementation model is presented. Design/methodology/approach: A general scoping review of the literature will be conducted to identify potential issues that might impact on the implementation of Web 2.0 tools in organisations to provide an overview of examples of empirical evidence that exists in this subject area with a view to examining how to advance this particular field of research. Findings: The findings of the scoping literature review indicate that while certain conceptual models and frameworks exist on how to implement Web 2.0 tools in organisations there is a lack of evidence to suggest that they have been empirically tested. The paper also notes that though organisations are unique, based on the literature common features can be found regarding "best practice" on how to introduce Web 2.0 tools in organisations. Research limitations/implications: This paper does not present any findings based on an empirical study involving the implementation of Web 2.0 tools in organisations. The paper does however provide scope for both academic and management practitioners to adopt and test the models and frameworks identified in the literature review when implementing Web 2.0 tools in their organisations. Originality/value: The contribution to knowledge that this paper provides is that it reviews an area where there is a lack of empirical evidence, namely, in the approaches that organisations can adopt when implementing Web 2.0 tools. 
Based on the findings from the literature and through the creation of a Web 2.0 tools implementation model, this paper provides practical guidance to management practitioners who might find introducing Web 2.0 tools into the workplace a challenge.||0||0|
|Improving contextual advertising matching by using Wikipedia thesaurus knowledge||GuanDong Xu
|Knowledge and Information Systems||English||As a prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant commercial ads within the content of a Web page, to provide a better user experience and as a result increase the user's ad-click rate. However, due to the intrinsic problems of homonymy and polysemy, the low intersection of keywords, and a lack of sufficient semantics, traditional keyword matching techniques are not able to effectively handle contextual matching and retrieve relevant ads for the user, resulting in an unsatisfactory performance in ad selection. In this paper, we introduce a new contextual advertising approach to overcome these problems, which uses Wikipedia thesaurus knowledge to enrich the semantic expression of a target page (or an ad). First, we map each page into a keyword vector, upon which two additional feature vectors, the Wikipedia concept and category vector derived from the Wikipedia thesaurus structure, are then constructed. Second, to determine the relevant ads for a given page, we propose a linear similarity fusion mechanism, which combines the above three feature vectors in a unified manner. Last, we validate our approach using a set of real ads, real pages along with the external Wikipedia thesaurus. The experimental results show that our approach outperforms the conventional contextual advertising matching approaches and can substantially improve the performance of ad selection. © 2014 Springer-Verlag London.||0||0|
|Improving modern art articles on Wikipedia, a partnership between Wikimédia France and Centre Georges Pompidou||Sylvain Machefert||Museum
|Préconférence IFLA 2014 - Bibliothèques d'art||French||The Centre Georges Pompidou is a structure in Paris hosting the "Musée National d'Art Moderne", largest museum for modern art in Europe. Wikimédia France is a French organization working on promoting Wikipedia and other Wikimedia projects, by organizing trainings or conducting partnerships for example. The project described in this proposal has been led by the GLAM (Galleries Libraries Archives and Museums) working group of Wikimédia France and Pompidou museum curators.||3||0|
|Improving tag-based recommendation with the collaborative value of wiki pages for knowledge sharing||Durao F.
Social and community intelligence
|Journal of Ambient Intelligence and Humanized Computing||English||This exploratory study investigates how organisations can support knowledge transfer by exploiting social and community intelligence. In particular, this work analyses the potential of wiki technology as a tool for knowledge sharing in corporate wikis. Wikis are hypertext systems that support team-oriented collaborative work; corporate wikis are especially suited to enhancing internal knowledge sharing in enterprises. This research study sought to empirically determine the value of wiki pages that emerged from such collaboration in corporate wikis. As a research challenge, we evaluate how tag-based recommendations benefit from this value in a problem-solving context. The recommendations are evaluated on their capability to transfer knowledge and help users solve tasks. To this end, we create a problem-solving scenario in which users need to use the recommendations to get their tasks solved. While we attempt to support users individually in finding their own solutions, our recommendations are intended to enhance the organisation's overall problem-solving capacity. Results from an experiment with 63 participants show that more successful recommendations can be obtained if the collaborative value of pages is considered. In essence, this work demonstrates how the value of wiki pages can provide significant support in helping individuals solve their problems and share knowledge in collaborative spaces. In addition to this evaluation, professionals from software companies were interviewed about the usefulness and adoption of the recommendation model in their corporate wikis.||0||0|
|Inferring attitude in online social networks based on quadratic correlation||Chao Wang
|Lecture Notes in Computer Science||English||The structure of an online social network in most cases cannot be described just by the links between its members. We study online social networks in which members may have a certain attitude, positive or negative, toward each other, so that the network consists of a mixture of both positive and negative relationships. Our goal is to predict the sign of a given relationship based on the evidence provided in the current snapshot of the network. More precisely, using machine learning techniques we develop a model that, after being trained on a particular network, predicts the sign of an unknown or hidden link. The model uses relationships and influences from peers as evidence for the prediction; however, the set of peers used is not predefined but rather learned during the training process. We use quadratic correlation between peer members to train the predictor. The model is tested on popular online datasets such as Epinions, Slashdot, and Wikipedia. In many cases it shows almost perfect prediction accuracy. Moreover, our model can also be efficiently updated as the underlying social network evolves.||0||0|
|Information overload and virtual institutions||Memmi D.||Information overload
|AI and Society||English||The Internet puts at our disposal an unprecedented wealth of information. Unfortunately much of this information is unreliable and its very quantity exceeds our cognitive capacity. To deal with the resulting information overload requires knowledge evaluation procedures that have traditionally been performed by social institutions, such as the press or universities. But the Internet has also given rise to a new type of social institution operating online, such as Wikipedia. We will analyze these virtual institutions to understand how they function, and to determine to what extent they can help manage the information overload. Their distributed and collaborative nature, their agility and low cost make them not only a very interesting social model, but also a rather fragile one. To be durable, virtual institutions probably need strong rules and norms, as well as an appropriate social framework.||0||0|
|Intelligent searching using delay semantic network||Dvorscak S.
|SAMI 2014 - IEEE 12th International Symposium on Applied Machine Intelligence and Informatics, Proceedings||English||This article introduces a different way to implement semantic search, using a semantic search agent over information obtained directly from the web. The paper describes the time-delay form of semantic network that we have used to provide semantic search. Using the time-delay aspect inside a semantic network has a positive impact in several ways: it provides a way to represent time-dependent knowledge via the semantic network, and also to optimize the inference process. All of this is realized for Wikipedia articles in the form of a search engine. The core is implemented as a massively multithreaded inference mechanism over a massive semantic network.||0||0|
|Interdisciplinary project-based learning: an online wiki experience in teacher education||Biasutti M.
|Technology, Pedagogy and Education||English||The current research study reports the use of wikis as an online didactic tool for applying project-based learning in higher education. The study was conducted in university teacher education programmes. During the online activities, participants developed interdisciplinary projects for the primary school, working collaboratively in small groups in a wiki virtual environment within the Moodle platform. Science was at the core of the projects and acted as an organising hub for finding links with other disciplines. A mixed-methods approach involving the collection of both quantitative and qualitative data was adopted. The authors developed three instruments to measure both processes and outcomes of the online activities: the interdisciplinary project-based learning questionnaire, the reflection questionnaire and a rubric for assessing interdisciplinary projects. The current paper focuses only on the qualitative data, which were subjected to an inductive content analysis. Results provided evidence of the processes involved in the collaborative activities and showed that online activities can develop teachers' abilities to design projects in interdisciplinary contexts. The discussion highlights the aspects of the online environment that made the collaborative work effective for learning. Future implications and suggestions for teacher education programmes are discussed. © 2014 Association for Information Technology in Teacher Education.||0||0|
|Investigation of information behavior in Wikipedia articles||Rosch B.||Eye-tracking
|Proceedings of the 5th Information Interaction in Context Symposium, IIiX 2014||English||This work aims to explore information behavior in selected Wikipedia articles. To get insights into users' interaction with pictorial and textual contents eye-tracking experiments are conducted. Spread of information within the articles and the relation between text and images are analyzed.||0||0|
|Iranian EFL learners' vocabulary development through wikipedia||Khany R.
|English Language Teaching||English||Language teaching has come a long way in search of a remedy for language learners and teachers. Countless theories, approaches, and methods have been recommended. With all these, however, more inclusive L2 theories and models ought to be considered to arrive at real classroom practices. One such crucial practice is authenticity, readily found in web-based materials in general and in Wikipedia texts and tasks in particular. Along the same line, and based on sound theoretical underpinnings, this study investigates the place of Wikipedia as a prospective tool for teaching and learning a major language component, namely vocabulary knowledge, through practical procedures. To this end, 36 intermediate Iranian EFL students assigned to control and experimental groups took part in the study. The results of the tests administered revealed that the learners in the Wikipedia group surpassed those in the control group. Hence, Wikipedia is considered an encouraging authentic resource to assist EFL learners in improving their vocabulary knowledge. Implications of the present findings and suggestions for further research are discussed.||0||0|
|Keyword extraction using multiple novel features||Yang S.
Natural Language Processing
|Journal of Computational Information Systems||English||In this paper, we propose a novel approach for keyword extraction. Different from previous keyword extraction methods, which identify keywords based on the document alone, this approach introduces Wikipedia knowledge and document genre to extract keywords from the document. Keyword extraction is accomplished by a classification model utilizing not only traditional word based features but also features based on Wikipedia knowledge and document genre. In our experiment, this novel keyword extraction approach outperforms previous models for keyword extraction in terms of precision-recall metric and breaks through the plateau previously reached in the field. © 2014 Binary Information Press.||0||0|
|Knowledge construction for wiki education applied in moodle 2.3||Suhaimi S.M.
Wiki data log
|Kondenzer: Exploration and visualization of archived social media||Alonso O.
|Proceedings - International Conference on Data Engineering||English||Modern social networks such as Twitter provide a platform for people to express their opinions on a variety of topics ranging from personal to global. While the factual part of this information and the opinions of various experts are archived by sources such as Wikipedia and reputable news articles, the opinion of the general public is drowned out in a sea of noise and 'un-interesting' information. In this demo we present Kondenzer - an offline system for condensing, archiving and visualizing social data. Specifically, we create digests of social data using a combination of filtering, duplicate removal and efficient clustering. This gives a condensed set of high quality data which is used to generate facets and create a collection that can be visualized using the PivotViewer control.||0||0|
|La connaissance est un réseau: Perspective sur l’organisation archivistique et encyclopédique||Martin Grandjean||Les Cahiers du Numérique||French||Network analysis is not revolutionizing our objects of study, it revolutionizes the perspective of the researcher on the latter. Organized as a network, information becomes relational. It makes potentially possible the creation of new information, as with an encyclopedia which links between records weave a web which can be analyzed in terms of structural characteristics or with an archive directory which sees its hierarchy fundamentally altered by an index recomposing the information exchange network within a group of people. On the basis of two examples of management, conservation and knowledge enhancement tools, the online encyclopedia Wikipedia and the archives of the Intellectual Cooperation of the League of Nations, this paper discusses the relationship between the researcher and its object understood as a whole.
[Preprint version available.]
|La négociation contre la démocratie : le cas Wikipedia||Pierre-Carl Langlais||Computer mediated community
Communautés en ligne
|Négociations||French||The first pillar of Wikipedia stresses that "Wikipedia is not a democracy". The Wikipedian communities tend to view democracy and polling as the alter ego (if not the nemesis) of negotiation and consensual thought. This article questions the validity and the motives of such a specific conception. Using the conceptual framework of Arend Lijphart, it describes the emergence of a joint system, which includes elements of majoritarian democracy within the general setting of a consensual democracy. The unconditional rejection of the democratic interpretation seems to have its own social use: it allows a pragmatic acclimation of pre-existing procedures within the static political system.||0||0|
|La participation contributive des publics et des personnels au musée : l’exemple du partenariat entre Wikimédia France et le Centre Pompidou||Céline Rabaud||French||0||1|
|Large-scale author verification: Temporal and topical influences||Van Dam M.
|English||The task of author verification is concerned with the question whether or not someone is the author of a given piece of text. Algorithms that extract writing style features from texts are used to determine how close in style different documents are. Currently, evaluations of author verification algorithms are restricted to small-scale corpora with usually less than one hundred test cases. In this work, we present a methodology to derive a large-scale author verification corpus based on Wikipedia Talkpages. We create a corpus based on English Wikipedia which is significantly larger than existing corpora. We investigate two dimensions on this corpus which so far have not received sufficient attention: the influence of topic and the influence of time on author verification accuracy. Copyright 2014 ACM.||0||0|
|Learning a lexical simplifier using Wikipedia||Horn C.
|English||In this paper we introduce a new lexical simplification approach. We extract over 30K candidate lexical simplifications by identifying aligned words in a sentence-aligned corpus of English Wikipedia with Simple English Wikipedia. To apply these rules, we learn a feature-based ranker using SVM rank trained on a set of labeled simplifications collected using Amazon's Mechanical Turk. Using human simplifications for evaluation, we achieve a precision of 76% with changes in 86% of the examples.||0||0|
|Learning from a wiki way of learning||Page K.L.
Wiki way of learning
|Studies in Higher Education||English||There is a growing need to design learning experiences in higher education that develop collaborative and mediated social writing practices. A wiki way of learning addresses these needs. This paper reports findings from a case study involving 58 postgraduate students who, in small groups, participated over eight weeks in a mediated collaborative writing project with and through wiki contexts. The project was not assessed but designed for task-based domain learning. Evaluation of the project was conducted using data drawn from multiple sources collected before, during and after the project. Findings show that participation in the project had a positive relationship with student exam performance and web familiarity. Patterns of individual and group wiki project participation, and sex differences in participation, are discussed. © 2014 Society for Research into Higher Education.||0||0|
|Learning to compute semantic relatedness using knowledge from wikipedia||Zheng C.
|Lecture Notes in Computer Science||English||Recently, Wikipedia has become a very important resource for computing semantic relatedness (SR) between entities. Several approaches have already been proposed to compute SR based on Wikipedia. Most of the existing approaches use certain kinds of information in Wikipedia (e.g. links, categories, and texts) and compute the SR by empirically designed measures. We have observed that these approaches produce very different results for the same entity pair in some cases. Therefore, how to select appropriate features and measures to best approximate the human judgment on SR becomes a challenging problem. In this paper, we propose a supervised learning approach for computing SR between entities based on Wikipedia. Given two entities, our approach first maps entities to articles in Wikipedia; then different kinds of features of the mapped articles are extracted from Wikipedia, which are then combined with different relatedness measures to produce nine raw SR values of the entity pair. A supervised learning algorithm is proposed to learn the optimal weights of different raw SR values. The final SR is computed as the weighted average of raw SRs. Experiments on benchmark datasets show that our approach outperforms baseline methods.||0||0|
|Learning to expand queries using entities||Brandao W.C.
De Moura E.S.
Da Silva A.S.
|Journal of the Association for Information Science and Technology||English||A substantial fraction of web search queries contain references to entities, such as persons, organizations, and locations. Recently, methods that exploit named entities have been shown to be more effective for query expansion than traditional pseudo-relevance feedback methods. In this article, we introduce a supervised learning approach that exploits named entities for query expansion using Wikipedia as a repository of high-quality feedback documents. In contrast with existing entity-oriented pseudo-relevance feedback approaches, we tackle query expansion as a learning-to-rank problem. As a result, not only do we select effective expansion terms but we also weigh these terms according to their predicted effectiveness. To this end, we exploit the rich structure of Wikipedia articles to devise discriminative term features, including each candidate term's proximity to the original query terms, as well as its frequency across multiple article fields and in category and infobox descriptors. Experiments on three Text REtrieval Conference web test collections attest to the effectiveness of our approach, with gains of up to 23.32% in terms of mean average precision, 19.49% in terms of precision at 10, and 7.86% in terms of normalized discounted cumulative gain compared with a state-of-the-art approach for entity-oriented query expansion.||0||0|
|Les jeunes, leurs enseignants et Wikipédia : représentations en tension autour d’un objet documentaire singulier||Sahut Gilles||Wikipedia
|(Documentaliste-Sciences de l'information. 2014 June;51(2):p. 70-79) DOI: 10.3917/docsi.512.0070||The collaborative encyclopedia Wikipedia is a heavily used resource, especially by high school and college students, whether for school work or personal reasons. However, for most teachers and information professionals, the jury is still out on the validity of its contents. Are young persons aware of its controversial reputation? What opinions, negative or positive, do they hold? How much confidence do they place in this information resource? This survey of high school and college students provides an opportunity to grasp the diversity of attitudes towards Wikipedia and also how these evolve as the students move up the grade ladder. More widely, this article studies the factors that condition the degree of acceptability of the contents of this unusual source of information.||0||0|
|Leveraging open source tools for Web mining||Pennete K.C.||Data mining
|Lecture Notes in Electrical Engineering||English||Web mining is one of the most pursued research areas and often among the most challenging. Using web mining, corporations and individuals alike seek to unravel the hidden knowledge underneath the diverse, gargantuan volumes of web data. This paper presents how a researcher can leverage the colossal knowledge available in open-access sites such as Wikipedia as a source of information, rather than subscribing to closed networks of knowledge, and use open source tools, rather than prohibitively priced commercial mining tools, to do web mining. The paper illustrates step-by-step usage of R and RapidMiner in web mining to enable a novice to understand the concepts as well as apply them in the real world.||0||0|
|Lexical speaker identification in TV shows||Roy A.
Lexical speaker identification
|Multimedia Tools and Applications||English||It is possible to use lexical information extracted from speech transcripts for speaker identification (SID), either on its own or to improve the performance of standard cepstral-based SID systems upon fusion. This was previously established, typically using isolated speech from single speakers (NIST SRE corpora, parliamentary speeches). By contrast, this work applies lexical approaches for SID to a different type of data. It uses the REPERE corpus, consisting of unsegmented multiparty conversations, mostly debates, discussions and Q&A sessions from TV shows. It is hypothesized that people give out clues to their identity when speaking in such settings, which this work aims to exploit. The impact on SID performance of the diarization front-end required to pre-process the unsegmented data is also measured. Four lexical SID approaches are studied in this work, including TFIDF, BM25 and LDA-based topic modeling. Results are analysed in terms of TV shows and speaker roles. Lexical approaches achieve low error rates for certain speaker roles such as anchors and journalists, sometimes lower than a standard cepstral-based Gaussian Supervector - Support Vector Machine (GSV-SVM) system. Also, in certain cases, the lexical system shows modest improvement over the cepstral-based system performance using score-level sum fusion. To highlight the potential of using lexical information not just to improve upon cepstral-based SID systems but as an independent approach in its own right, initial studies on cross-media SID are briefly reported. Instead of using speech data as all cepstral systems require, this approach uses Wikipedia texts to train lexical speaker models, which are then tested on speech transcripts to identify speakers. © 2014 Springer Science+Business Media New York.||0||0|
|Lightweight domain ontology learning from texts: Graph theory-based approach using Wikipedia||Ahmed K.B.
Lightweight domain ontologies
Ontology learning from texts
|International Journal of Metadata, Semantics and Ontologies||English||Ontology engineering is the backbone of the semantic web. However, the construction of formal ontologies is a tough exercise which requires time and heavy costs. Ontology learning is thus a solution for this requirement. Since texts are massively available everywhere, embodying experts' knowledge and know-how, it is of great value to capture the knowledge existing within such texts. Our approach thus answers the challenge of creating concept hierarchies from textual data, taking advantage of the Wikipedia encyclopaedia to achieve good-quality results. This paper presents a novel approach which essentially uses plain-text Wikipedia instead of its categorical system and works with a simplified algorithm to infer a domain taxonomy from a graph. © 2014 Inderscience Enterprises Ltd.||0||0|
|Little creatures that run the world: Bringing ants to a wider audience||Barr D.||Ants
|Science and Technology Libraries||English||Ants (Hymenoptera: Formicidae) are among the most ubiquitous and successful creatures on earth. They are the subject of research by ant biologists worldwide, and with over 8,800 identified species, access to quality information is important for those researchers. AntWiki (http://www.antwiki.org/) was created originally as an online listing of all ant taxonomists and their papers, and librarians have assisted the project by finding, scanning, and uploading articles and contributing to a page on Human Culture and Ants. © Dorothy Barr.||0||0|
|MIGSOM: A SOM algorithm for large scale hyperlinked documents inspired by neuronal migration||Kotaro Nakayama
|Lecture Notes in Computer Science||English||The SOM (Self Organizing Map), one of the most popular unsupervised machine learning algorithms, maps high-dimensional vectors into low-dimensional data (usually a 2-dimensional map). The SOM is widely known as a "scalable" algorithm because of its capability to handle large numbers of records. However, it is effective only when the vectors are small and dense. Although a number of studies on making the SOM scalable have been conducted, technical issues of scalability and performance for sparse high-dimensional data such as hyperlinked documents still remain. In this paper, we introduce MIGSOM, an SOM algorithm inspired by a new discovery on neuronal migration. The two major advantages of MIGSOM are its scalability for sparse high-dimensional data and its clustering visualization functionality. In this paper, we describe the algorithm and implementation in detail, and show the practicality of the algorithm in several experiments. We applied MIGSOM not only to experimental data sets but also to a large-scale real data set: Wikipedia's hyperlink data.||0||0|
|Mapping of scientific patenting: toward the development of 'J-GLOBAL foresight'||Jibu M.||Bibliometric analysis
|Technology Analysis and Strategic Management||English||The Japan Science and Technology Agency (JST) is in the process of building knowledge infrastructure by means of linking accumulated information assets to a variety of databases. It does not aim to develop knowledge data infrastructure based on a proprietary format, but on an international standard format. JST is also in the process of creating 'J-GLOBAL foresight' (accessed June 2012) in order to match up a variety of data, such as results and indices of bibliometric analysis as well as of patent analysis derived from the knowledge infrastructure, with applications like Google Maps, and to facilitate the visualisation of business information. This will help companies and institutions formulate business strategy based on the information obtained in the future. The former aims to be the bibliographic information version of Data.gov, which discloses government data from the USA, while the latter seeks to be the Data-gov wiki version, which provides a demonstration by matching up governmental data with applications such as Google Maps.||0||0|
|Massive query expansion by exploiting graph knowledge bases for image retrieval||Guisado-Gamez J.
Graph mining techniques
|ICMR 2014 - Proceedings of the ACM International Conference on Multimedia Retrieval 2014||English||Annotation-based techniques for image retrieval suffer from sparse and short image textual descriptions. Moreover, users are often not able to describe their needs with the most appropriate keywords. This situation is a breeding ground for a vocabulary mismatch problem resulting in poor results in terms of retrieval precision. In this paper, we propose a query expansion technique for queries expressed as keywords and short natural language descriptions. We present a new massive query expansion strategy that enriches queries using a graph knowledge base by identifying the query concepts and adding relevant synonyms and semantically related terms. We propose a topological graph enrichment technique that analyzes the network of relations among the concepts, and suggests semantically related terms by path and community detection analysis of the knowledge graph. We perform our expansions by using two versions of Wikipedia as knowledge base, achieving improvements of the system's precision of up to more than 27%. Copyright 2014 ACM.||0||0|
|Maturity assessment of Wikipedia medical articles||Conti R.
|Automatic quality evaluation
Multi-criteria decision making
Wikipedia Medicine Portal
|Proceedings - IEEE Symposium on Computer-Based Medical Systems||English||Recent studies report that Internet users are increasingly looking for health information through the Wikipedia Medicine Portal, a collaboratively edited multitude of articles with contents often comparable with professionally edited material. Automatic quality assessment of the Wikipedia medical articles has not received much attention from academia, and it presents distinctive open challenges. In this paper, we propose to tag the medical articles on the Wikipedia Medicine Portal, clearly stating their maturity degree, intended as a summarizing measure of several article properties. For this purpose, we adopt the Analytic Hierarchy Process, a well known methodology for decision making, and we evaluate the maturity degree of more than 24,000 Wikipedia medical articles. The obtained results show how the qualitative analysis of medical content does not always overlap with a quantitative analysis (an example of which is shown in the paper), since important properties of an article can hardly be synthesized by quantitative features. This seems particularly true when the analysis considers the concept of maturity, defined and verified in this work.||0||0|
|Mining hidden concepts: Using short text clustering and wikipedia knowledge||Yang C.-L.
|Proceedings - 2014 IEEE 28th International Conference on Advanced Information Networking and Applications Workshops, IEEE WAINA 2014||English||In recent years, there has been a rapidly increasing use of social networking platforms in the forms of short-text communication. However, due to the short-length of the texts used, the precise meaning and context of these texts are often ambiguous. To address this problem, we have devised a new community mining approach that is an adaptation and extension of text clustering, using Wikipedia as background knowledge. Based on this method, we are able to achieve a high level of precision in identifying the context of communication. Using the same methods, we are also able to efficiently identify hidden concepts in Twitter texts. Using Wikipedia as background knowledge considerably improved the performance of short text clustering.||0||0|
|Mining knowledge on relationships between objects from the web||Xiaodan Zhang
|Content-based image retrieval
Relationships between objects
|IEICE Transactions on Information and Systems||English||How do global warming and agriculture influence each other? It is possible to answer the question by searching knowledge about the relationship between global warming and agriculture. As exemplified by this question, strong demands exist for searching relationships between objects. Mining knowledge about relationships on Wikipedia has been studied. However, it is desirable to search more diverse knowledge about relationships on the Web. By utilizing the objects constituting relationships mined from Wikipedia, we propose a new method to search images with surrounding text that include knowledge about relationships on the Web. Experimental results show that our method is effective and applicable in searching knowledge about relationships. We also construct a relationship search system named "Enishi" based on the proposed new method. Enishi supplies a wealth of diverse knowledge, including images with surrounding text, to help users to understand relationships deeply, by complementarily utilizing knowledge from Wikipedia and the Web. Copyright||0||0|
|Mining the personal interests of microbloggers via exploiting wikipedia knowledge||Fan M.
|Lecture Notes in Computer Science||English||This paper focuses on an emerging research topic: mining microbloggers' personalized interest tags from the microblogs they have posted. It is based on the intuition that microblogs indicate the daily interests and concerns of microbloggers. Previous studies regarded the microblogs posted by one microblogger as a whole document and adopted traditional keyword extraction approaches to select high-weight nouns, without considering the characteristics of microblogs. Given the limited textual information of microblogs and the implicit interest expression of microbloggers, we suggest a new research framework for mining microbloggers' interests by exploiting Wikipedia, a huge online encyclopedia, to take up those challenges. Based on the semantic graph constructed via Wikipedia, the proposed semantic spreading model (SSM) can discover and leverage semantically related interest tags which do not occur in one's microblogs. Based on SSM, an interest mining system has been implemented and deployed on the biggest microblogging platform (Sina Weibo) in China. We have also specified a suite of new evaluation metrics to make up for the shortage of evaluation functions in this research topic. Experiments conducted on a real-time dataset demonstrate that our approach outperforms the state-of-the-art methods in identifying microbloggers' interests.||0||0|
|Monitoring teachers' complex thinking while engaging in philosophical inquiry with web 2.0||Agni Stylianou-Georgiou
Philosophy for children
|Lecture Notes in Computer Science||English||The purpose of this study was to examine how we can exploit new technologies to scaffold and monitor the development of teachers' complex thinking while engaging in philosophical inquiry. We set up an online learning environment using wiki and forum technologies and we organized the activity in four major steps to scaffold complex thinking for the teacher participants. In this article, we present the evolution of complex thinking of one group of teachers by studying their interactions in depth.||0||0|
|Motivating Wiki-based collaborative learning by increasing awareness of task conflict: A design science approach||Wu K.
|Lecture Notes in Computer Science||English||Wiki systems have been deployed in many collaborative learning projects. However, lack of motivation is a serious problem in the collaboration process. The wiki system was originally designed to hide authorship information. Such a design may hinder users from being aware of task conflict, resulting in undesired outcomes (e.g. reduced motivation, suppressed knowledge exchange activities). We propose to incorporate two different tools in wiki systems to motivate learners by increasing awareness of task conflict. A field test was executed in two collaborative writing projects. The results from a wide-scale survey and a focus group study confirmed the utility of the new tools and suggested that these tools can help learners develop both extrinsic and intrinsic motivations to contribute. This study has several theoretical and practical implications: it enriches the knowledge of task conflict, proposes a new way to motivate collaborative learning, and provides a low-cost resolution for managing task conflict.||0||0|
|Motivations for Contributing to Health-Related Articles on Wikipedia: An Interview Study||Farič N
Consumer health information
|Journal of Medical Internet Research||English||Background: Wikipedia is one of the most accessed sources of health information online. The current English-language Wikipedia contains more than 28,000 articles pertaining to health.
Objective: The aim was to characterize individuals’ motivations for contributing to health content on the English-language Wikipedia.
Methods: A set of health-related articles were randomly selected and recent contributors invited to complete an online questionnaire and follow-up interview (by Skype, by email, or face-to-face). Interviews were transcribed and analyzed using thematic analysis and a realist grounded theory approach.
|Results: A total of 32 Wikipedians (31 men) completed the questionnaire and 17 were interviewed. Those completing the questionnaire had a mean age of 39 (range 12-59) years; 16 had a postgraduate qualification, 10 had or were currently studying for an undergraduate qualification, 3 had no more than secondary education, and 3 were still in secondary education. In all, 15 were currently working in a health-related field (primarily clinicians). The median period for which they had been actively editing Wikipedia was 3-5 years. Of this group, 12 were in the United States, 6 were in the United Kingdom, 4 were in Canada, and the remainder were from another 8 countries. Two-thirds spoke more than 1 language and 90% (29/32) were also active contributors in domains other than health. Wikipedians in this study were identified as health professionals, professionals with specific health interests, students, and individuals with health problems. Based on the interviews, their motivations for editing health-related content were summarized in 5 strongly interrelated categories: education (learning about subjects by editing articles), help (wanting to improve and maintain Wikipedia), responsibility (responsibility, often a professional responsibility, to provide good quality health information to readers), fulfillment (editing Wikipedia as a fun, relaxing, engaging, and rewarding activity), and positive attitude to Wikipedia (belief in the value of Wikipedia). An additional factor, hostility (from other contributors), was identified that negatively affected Wikipedians’ motivations.
Conclusions: Contributions to Wikipedia’s health-related content in this study were made by both health specialists and laypeople of varying editorial skills. Their motivations for contributing stem from an inherent drive based on values, standards, and beliefs. It became apparent that the community who most actively monitor and edit health-related articles is very small.
Although some contributors correspond to a model of “knowledge philanthropists,” others were focused on maintaining articles (improving spelling and grammar, organization, and handling vandalism). There is a need for more people to be involved in Wikipedia’s health-related content.
|Multilinguals and wikipedia editing||Hale S.A.||Cross-language
Social network analysis
|WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference||English||This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller-sized editions with fewer users have a higher percentage of multilingual users than larger-sized editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present. Copyright||0||0|
|Mutual disambiguation for entity linking||Charton E.
|English||The disambiguation algorithm presented in this paper is implemented in SemLinker, an entity linking system. First, named entities are linked to candidate Wikipedia pages by a generic annotation engine. Then, the algorithm re-ranks candidate links according to mutual relations between all the named entities found in the document. The evaluation is based on experiments conducted on the test corpus of the TAC-KBP 2012 entity linking task.||0||0|
|Myths to burst about hybrid learning||Li K.C.||Hybrid learning
|Lecture Notes in Computer Science||English||Given the snowballing attention to and growing popularity of hybrid learning, some take for granted that the learning mode means more effective education delivery, while some who hold a skeptical view expect researchers to inform them whether hybrid learning leads to better learning effectiveness. Though diversified, both beliefs are like myths about the hybrid mode. By reporting findings concerning the use of wikis in a major project on hybrid courses piloted at a university in Hong Kong, this paper highlights the complexity concerning the effectiveness of a hybrid learning mode and the problems of a reductionistic view of its effectiveness. Means for e-learning were blended with conventional distance learning components in four undergraduate courses. Findings show that a broad variety of factors, including subject matter, instructors' pedagogical knowledge of the teaching means, students' readiness for the new learning mode and the implementation methods, play a key role in deciding learning effectiveness, rather than just the delivery mode per se.||0||0|
|Named entity evolution analysis on wikipedia||Holzmann H.
|Named entity evolution
|WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference||English||Accessing Web archives raises a number of issues caused by their temporal characteristics. Additional knowledge is needed to find and understand older texts. Especially entities mentioned in texts are subject to change. Most severe in terms of information retrieval are name changes. In order to find entities that have changed their name over time, search engines need to be aware of this evolution. We tackle this problem by analyzing Wikipedia in terms of entity evolutions mentioned in articles. We present statistical data on excerpts covering name changes, which will be used to discover similar text passages and extract evolution knowledge in future work. Copyright||0||0|
|No praise without effort: Experimental evidence on how rewards affect Wikipedia's contributor community||Restivo M.
Van de Rijt A.
|Information Communication and Society||English||The successful provision of public goods through mass volunteering over the Internet poses a puzzle to classic social science theories of human cooperation. A solution suggested by recent studies proposes that informal rewards (e.g. a thumbs-up, a badge, an editing award, etc.) can motivate participants by raising their status in the community, which acts as a selective incentive to continue contributing. Indeed, a recent study of Wikipedia found that receiving a reward had a large positive effect on the subsequent contribution levels of highly-active contributors. While these findings are suggestive, they only pertained to already highly-active contributors. Can informal rewards also serve as a mechanism to increase participation among less-active contributors by initiating a virtuous cycle of work and reward? We conduct a field experiment on the online encyclopedia Wikipedia in which we bestowed rewards on randomly selected editors of varying productivity levels. Analysis of post-treatment activity shows that despite greater room for less-active contributors to increase their productive efforts, rewards yielded increases in work only among already highly-productive editors. On the other hand, rewards were associated with lower retention of less-active contributors. These findings suggest that the incentive structure in peer production is broadly meritocratic, as highly-active contributors accumulate the most rewards. However, this may also contribute to the divide between the stable core of highly-prodigious producers and a peripheral population of less-active contributors with shorter volunteer tenures.||0||0|
|Okinawa in Japanese and English Wikipedia||Hale S.A.||Cross-language
|Conference on Human Factors in Computing Systems - Proceedings||English||This research analyzes edits by foreign-language users in Wikipedia articles about Okinawa, Japan, in the Japanese and English editions of the encyclopedia. Okinawa, home to both English and Japanese speaking users, provides a good case to look at content differences and cross-language editing in a small geographic area on Wikipedia. Consistent with prior work, this research finds large differences in the representations of Okinawa in the content of the two editions. The number of users crossing the language boundary to edit both editions is also extremely small. When users do edit in a non-primary language, they most frequently edit articles that have cross-language (interwiki) links, articles that are edited more by other users, and articles that have more images. Finally, the possible value of edits from foreign-language users and design possibilities to motivate wider contributions from foreign-language users are discussed.||0||0|
|On Measuring Malayalam Wikipedia||Vasudevan T V||Wikipedia
|International Journal of Emerging Engineering Research and Technology||English||Wikipedia is a popular, multilingual, free internet encyclopedia. Anyone can edit articles in it. This paper presents an overview of research in the Malayalam edition of Wikipedia. The history of Malayalam Wikipedia is explained first. Different research lines related to Wikipedia are explored next. This is followed by an analysis of Malayalam Wikipedia’s fundamental components such as Articles, Authors and Edits, along with Growth and Quality. General trends are measured in comparison with Wikipedias in other languages.
|On the influence propagation of web videos||Liu J.
|Unified virtual community space
Video influence estimation
Video origin estimation
|IEEE Transactions on Knowledge and Data Engineering||English||We propose a novel approach to analyze how a popular video is propagated in the cyberspace, to identify if it originated from a certain sharing site, and to identify how it reached its current popularity in its propagation. In addition, we also estimate their influences across different websites outside the major hosting website. Web video is gaining significance due to its rich and eye-ball grabbing content. This phenomenon is evidently amplified and accelerated by the advance of Web 2.0. When a video receives some degree of popularity, it tends to appear on various websites including not only video-sharing websites but also news websites, social networks or even Wikipedia. Numerous video-sharing websites have hosted videos that reached a phenomenal level of visibility and popularity in the entire cyberspace. As a result, it is becoming more difficult to determine how the propagation took place: was the video a piece of original work that was intentionally uploaded to its major hosting site by the authors, did the video originate from some small site and then reach the sharing site after already getting a good level of popularity, or did it originate from other places in the cyberspace but the sharing site made it popular? Existing studies regarding this flow of influence are lacking. Literature that discusses the problem of estimating a video's influence in the whole cyberspace also remains rare. In this article we introduce a novel framework to identify the propagation of popular videos from the major hosting site's perspective, and to estimate their influence. We define a Unified Virtual Community Space (UVCS) to model the propagation and influence of a video, and devise a novel learning method called Noise-reductive Local-and-Global Learning (NLGL) to effectively estimate a video's origin and influence.
Without loss of generality, we conduct experiments on an annotated dataset collected from a major video-sharing site to evaluate the effectiveness of the framework. Surrounding the collected videos and their ranks, some interesting discussions regarding the propagation and influence of videos as well as user behavior are also presented.||0||0|
|Ontology construction using multiple concept lattices||Wang W.C.
|Advanced Materials Research||English||The paper proposes an ontology construction approach that combines Fuzzy Formal Concept Analysis, Wikipedia and WordNet in a process that constructs multiple concept lattices for sub-domains. These sub-domains are divided from the target domain. The multiple concept lattices approach can mine concepts and determine relations between concepts automatically, and construct domain ontology accordingly. This approach is suitable for large or complex domains which contain obvious sub-domains.||0||0|
|Open collaboration for innovation: Principles and performance||Levine S.S.
|Organization Science||English||The principles of open collaboration for innovation (and production), once distinctive to open source software, are now found in many other ventures. Some of these ventures are Internet based: for example, Wikipedia and online communities. Others are off-line: they are found in medicine, science, and everyday life. Such ventures have been affecting traditional firms and may represent a new organizational form. Despite the impact of such ventures, their operating principles and performance are not well understood. Here we define open collaboration (OC), the underlying set of principles, and propose that it is a robust engine for innovation and production. First, we review multiple OC ventures and identify four defining principles. In all instances, participants create goods and services of economic value, they exchange and reuse each other's work, they labor purposefully with just loose coordination, and they permit anyone to contribute and consume. These principles distinguish OC from other organizational forms, such as firms or cooperatives. Next, we turn to performance. To understand the performance of OC, we develop a computational model, combining innovation theory with recent evidence on human cooperation. We identify and investigate three elements that affect performance: the cooperativeness of participants, the diversity of their needs, and the degree to which the goods are rival (subtractable). Through computational experiments, we find that OC performs well even in seemingly harsh environments: when cooperators are a minority, free riders are present, diversity is lacking, or goods are rival. We conclude that OC is viable and likely to expand into new domains. The findings also inform the discussion on new organizational forms, collaborative and communal.||0||0|
|Open domain question answering using Wikipedia-based knowledge model||Ryu P.-M.
|Information Processing and Management||English||This paper describes the use of Wikipedia as a rich knowledge source for a question answering (QA) system. We suggest multiple answer matching modules based on different types of semi-structured knowledge sources of Wikipedia, including article content, infoboxes, article structure, category structure, and definitions. These semi-structured knowledge sources each have their unique strengths in finding answers for specific question types, such as infoboxes for factoid questions, category structure for list questions, and definitions for descriptive questions. The answers extracted from multiple modules are merged using an answer merging strategy that reflects the specialized nature of the answer matching modules. Through an experiment, our system showed promising results, with a precision of 87.1%, a recall of 52.7%, and an F-measure of 65.6%, all of which are much higher than the results of a simple text analysis based system. © 2014 Elsevier Ltd. All rights reserved.||0||0|
|Opportunities for using Wiki technologies in building digital library models||Mammadov E.C.O.||Digital libraries
|Library Hi Tech News||English||Purpose: The purpose of this article is to research the open-access and encyclopedia-structured methodology of building digital libraries. In Azerbaijan libraries, one of the most challenging topics is organizing digital resources (books, audio-video materials, etc.). Wiki technologies introduce easy, collaborative and open tools, making it possible to implement them in digital library building. Design/methodology/approach: This paper looks at current practices, and the ways of organizing information resources to make them more systematized, open and accessible. These activities are valuable for rural libraries, which are smaller and less well funded than main and central libraries in cities. Findings: The main finding of this article is how to organize digital resource management in libraries using the Wiki ideology. Originality/value: Wiki technologies determine ways of building digital library network models which are structurally different from already known models, as well as new directions in forming the information society and solving the problems encountered.||0||0|
|Pautas para la implementación de wikis para el desarrollo colaborativo de sistemas de buenas prácticas||Jesús Tramullas
Ana I. Sánchez-Casabón
|Estudios de Información, Documentación y archivos. Homenaje a la profesora Pilar Gay Molins||Spanish||This paper reviews the basic principles for developing collections of best practices through a wiki tool. It details several previous works and proposes a pattern for the generation and development of these types of resources.||0||0|
|Preferences in Wikipedia abstracts: Empirical findings and implications for automatic entity summarization||Xu D.
|Information Processing and Management||English||The volume of entity-centric structured data grows rapidly on the Web. The description of an entity, composed of property-value pairs (a.k.a. features), has become very large in many applications. To avoid information overload, efforts have been made to automatically select a limited number of features to be shown to the user based on certain criteria, which is called automatic entity summarization. However, to the best of our knowledge, there is a lack of extensive studies on how humans rank and select features in practice, which can provide empirical support and inspire future research. In this article, we present a large-scale statistical analysis of the descriptions of entities provided by DBpedia and the abstracts of their corresponding Wikipedia articles, to empirically study, along several different dimensions, which kinds of features are preferable when humans summarize. Implications for automatic entity summarization are drawn from the findings. © 2013 Elsevier Ltd. All rights reserved.||0||0|
|Promoting collaborative writing through wikis: a new approach for advancing innovative and active learning in an ESP context||Wang Y.-C.||Collaborative authoring
|Computer Assisted Language Learning||English||New approaches to language teaching have emerged as a result of increasing advances in technology. Over the past decade, web-based social networking platforms have been widely adopted as collaborative tools for facilitating foreign language learning. This study focused on a novel way of enabling ESP learners to profit from writing, which is collaboration through wikis. The aim was to improve Taiwanese students' English writing skills for business. The instruments used in this study included two writing tests and a survey questionnaire. Findings indicate that students who were engaged in the collaborative writing tasks gained mastery in business writing and enjoyed the challenge of this new learning experience. The results also suggest that wikis promote students' interest in language learning, boost the development of their writing competencies and enhance the collaboration skills needed for success in the workplace.||0||0|
|QuoDocs: Improving developer engagement in software documentation through gamification||Sukale R.
|Conference on Human Factors in Computing Systems - Proceedings||English||Open source projects are created and maintained by developers who are distributed across the globe. As projects become larger, a developer's knowledge of a project's conceptual model becomes specialized. When new members join a project, it is difficult for them to understand the reasoning behind the structure and organization of the project since they do not have access to earlier discussions. We interviewed and surveyed developers from a popular open source project hosting website to find out how they maintain documentation and communicate the project details with new members. We found that documentation is largely out of sync with code and that developers do not find maintaining it to be an engaging activity. In this paper, we propose a new system - QuoDocs - and take a human-centered approach to introduce competitiveness and personalization to engage software developers in documenting their projects.||0||0|
|REQcollect: Requirements collection, project matching and technology transition||Goldrich L.
|Proceedings of the Annual Hawaii International Conference on System Sciences||English||This paper describes the evolution of REQcollect (REQuirements Collection). REQcollect was developed through several iterations of agile development and the transition of other projects. Multiple federal agencies have sponsored the work and transitioned the technologies into use. The parents of REQcollect are REQdb (REQuirements Database) and DART3 (Department of Homeland Security Assistant for R&D Tracking and Technology Transfer). DART3 was developed from three other projects: TPAM (Transition Planning and Assessment Model), GNOSIS (Global Network Operations Survey and Information Sharing) [3,4], and Aqueduct, a semantic MediaWiki extension. REQcollect combines the best components of these previous systems: a requirements elicitation and collection tool and a Google-like matching algorithm to identify potential transitions of R&D projects that match requirements.||0||0|
|Ranking Wikipedia article's data quality by learning dimension distributions||Jangwhan Han
Multivariate Gaussian distribution
|International Journal of Information Quality||English||As the largest free user-generated knowledge repository, the data quality of Wikipedia has attracted great attention in recent years. Automatic assessment of a Wikipedia article's data quality is a pressing concern. We observe that every Wikipedia quality class exhibits its specific characteristics along different first-class quality dimensions, including accuracy, completeness, consistency and minimality. We propose to extract quality dimension values from an article's content and editing history using dynamic Bayesian network (DBN) and information extraction techniques. Next, we employ multivariate Gaussian distributions to model quality dimension distributions for each quality class, and combine multiple trained classifiers to predict an article's quality class, which can distinguish different quality classes effectively and robustly. Experiments demonstrate that our approach achieves good performance.||0||0|
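The per-class Gaussian modelling described in this abstract can be sketched compactly. The toy version below is an assumption-laden illustration rather than the paper's method: it substitutes diagonal covariances for full multivariate Gaussians, and the class names, quality-dimension values and training vectors are all invented.

```python
import math

def fit_diag_gaussian(rows):
    """Estimate per-dimension mean and variance for one quality class."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(d)]
    varis = [max(sum((r[i] - means[i]) ** 2 for r in rows) / n, 1e-6)
             for i in range(d)]
    return means, varis

def log_likelihood(x, means, varis):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, varis))

def predict(article, models):
    """Assign the class whose fitted Gaussian gives the highest likelihood."""
    return max(models, key=lambda c: log_likelihood(article, *models[c]))

# Invented dimension vectors: (accuracy, completeness, consistency, minimality)
training = {
    "featured": [(0.9, 0.95, 0.9, 0.8), (0.85, 0.9, 0.92, 0.75)],
    "stub":     [(0.4, 0.1, 0.5, 0.9), (0.35, 0.15, 0.45, 0.95)],
}
models = {cls: fit_diag_gaussian(rows) for cls, rows in training.items()}
print(predict((0.88, 0.93, 0.91, 0.78), models))
```

An unseen dimension vector is assigned to the class under whose fitted distribution it is most probable; the paper additionally combines multiple trained classifiers, which this sketch omits.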
|Reader preferences and behavior on Wikipedia||Janette Lehmann
|HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media||English||Wikipedia is a collaboratively-edited online encyclopaedia that relies on thousands of editors to both contribute articles and maintain their quality. In recent years, research has extensively investigated this group of users, while another group of Wikipedia users, the readers, and their preferences and behavior have received far less study. This paper makes this group and its activities visible and valuable to Wikipedia's editor community. We carried out a study on two datasets covering a 13-month period to obtain insights into users' preferences and reading behavior in Wikipedia. We show that the most read articles do not necessarily correspond to those frequently edited, suggesting some degree of non-alignment between user reading preferences and author editing preferences. We also identified that popular and often edited articles are read according to four main patterns, and that how an article is read may change over time. We illustrate how this information can provide valuable insights to Wikipedia's editor community.||0||0|
|Reading about explanations enhances perceptions of inevitability and foreseeability: A cross-cultural study with Wikipedia articles||Oeberst A.
Von Der Beck I.
|Cognitive Processing||English||In hindsight, people often perceive events to be more inevitable and foreseeable than in foresight. According to Causal Model Theory (Nestler et al. in J Exp Psychol Learn Mem Cogn 34: 1043-1054, 2008), causal explanations are crucial for such hindsight distortions to occur. The present study provides further empirical support for this notion but extends previous findings in several ways. First, ecologically valid materials were used. Second, the effect of causal information on hindsight distortions was investigated in the realm of previously known events. Third, cross-cultural differences in reasoning (analytic vs. holistic) were taken into account. Specifically, German and Vietnamese participants in our study were presented with Wikipedia articles about the nuclear power plant in Fukushima Daiichi, Japan. They read either the version that existed before the nuclear disaster unfolded (Version 1) or the article that existed 8 weeks after the catastrophe commenced (Version 2). Only the latter contained elaborations on causal antecedents and therefore provided an explanation for the disaster. Reading that version led participants to perceive the nuclear disaster as more inevitable and foreseeable than reading Version 1 did. Cultural background did not exert a significant effect on these perceptions. Hence, hindsight distortions were obtained for ecologically valid materials even if the event was already known. Implications and directions for future research are discussed.||0||0|
|Research on XML data mining model based on multi-level technology||Zhu J.-X.||Data mining model
World Wide Web
|Advanced Materials Research||English||The era of Web 2.0 has arrived, and more and more Web 2.0 applications, such as social networks and Wikipedia, have emerged. As an industry standard of Web 2.0, the XML technique has also attracted more and more researchers. However, mining valuable information from massive XML documents is still in its infancy. In this paper, we study the basic problem of XML data mining: the XML data mining model. We design a multi-level XML data mining model, propose a multi-level data mining method, and list some research issues in the implementation of XML data mining systems.||0||0|
|Revision graph extraction in Wikipedia based on supergram decomposition and sliding update||Wu J.
|IEICE Transactions on Information and Systems||English||As one of the popular social media that many people have turned to in recent years, the collaborative encyclopedia Wikipedia provides information in a more "Neutral Point of View" way than others. Towards this core principle, plenty of effort has been put into collaborative contribution and editing. The trajectories of how such collaboration appears across revisions are valuable for group dynamics and social media research, which suggests that we should extract the underlying derivation relationships among revisions from the chronologically-sorted revision history in a precise way. In this paper, we propose a revision graph extraction method based on supergram decomposition in a document collection of near-duplicates. The plain text of each revision is measured by its frequency distribution of supergrams, the variable-length token sequences that stay the same across revisions. We show that this method performs the task more effectively than existing methods.||0||0|
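The parent-detection idea in the abstract above can be sketched as follows. This is a hedged simplification: fixed-length token n-grams stand in for the paper's variable-length supergrams, cosine similarity over the frequency distributions stands in for its measure, and the revision texts are invented.

```python
from collections import Counter
import math

def ngram_profile(text, n=3):
    """Frequency distribution of token n-grams (a fixed-length stand-in
    for the paper's variable-length supergrams)."""
    toks = text.split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def cosine(p, q):
    """Cosine similarity between two frequency distributions."""
    dot = sum(p[k] * q[k] for k in p if k in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def extract_revision_graph(revisions):
    """Link each revision to the most similar earlier revision."""
    profiles = [ngram_profile(r) for r in revisions]
    edges = []
    for i in range(1, len(revisions)):
        parent = max(range(i), key=lambda j: cosine(profiles[i], profiles[j]))
        edges.append((parent, i))
    return edges

revs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox leaps over the lazy dog",
    "a cat sat quietly on the warm mat today here",
    "the quick brown fox leaps over the sleepy dog",
]
edges = extract_revision_graph(revs)
print(edges)
```

Revision 3 is correctly linked to revision 1 (the edit it most plausibly derives from) rather than to the chronologically previous revision 2, which is the point of reconstructing derivation relationships instead of assuming a linear history.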
|Revision history: Translation trends in Wikipedia||McDonough Dolmaya J.||Crowdsourced translation
|Translation Studies||English||Wikipedia is a well-known example of a website with content developed entirely through crowdsourcing. It has over 4 million articles in English alone, and content in 284 other language versions. While the articles in the different versions are often written directly in the respective target language, translations also take place. Given that a previous study suggested that many of English Wikipedia's translators had neither formal training in translation nor professional work experience as translators, it is worth examining the quality of the translations produced. This paper uses Mossop's taxonomy of editing and revising procedures to explore a corpus of translated Wikipedia articles to determine how often transfer and language/style problems are present in these translations and assess how these problems are addressed. © 2014 Taylor & Francis.||0||0|
|SCooL: A system for academic institution name normalization||Jacob F.
Name Entity Recognition
|2014 International Conference on Collaboration Technologies and Systems, CTS 2014||English||Named Entity Normalization involves normalizing recognized entities to a concrete, unambiguous real-world entity. Within the purview of the online job posting domain, academic institution name normalization provides a beneficial opportunity for CareerBuilder (CB). Accurate and detailed normalization of academic institutions is important for performing sophisticated labor market dynamics analysis. In this paper we present and discuss the design and implementation of sCooL, an academic institution name normalization system designed to supplant the existing manually maintained mapping system at CB. We also discuss the specific challenges that led to the design of sCooL. sCooL leverages Wikipedia to create academic institution name mappings from a school database which is created from job applicant resumes posted on our website. The mappings created are utilized to build a database which is then used for normalization. sCooL provides the flexibility to integrate mappings collected from different curated and non-curated sources. The system is able to identify malformed data and to distinguish K-12 schools from universities and colleges. We conduct an extensive comparative evaluation of the semi-automated sCooL system against the existing manual mapping implementation and show that sCooL provides better coverage with improved accuracy.||0||0|
|Scalability of assessments of wiki-based learning experiences in higher education||Manuel Palomo-Duarte
|Computer-supported collaborative learning
|Computers in Human Behavior||English||In recent years, the focus on higher education learning has shifted from knowledge to skills, with interpersonal skills likely being the most difficult to assess and work with. Wikis ease open collaboration among peers. A number of these skills can be objectively assessed by using wikis in an educational environment: collaborative writing, conflict resolution, group management, leadership, etc. However, when the number of students increases, their interactions usually increase at a higher rate. Under these circumstances, traditional assessment procedures suffer from scalability problems: manually evaluating in detail the information stored in a wiki to retrieve objective metrics becomes a complex and time-consuming task. Thus, automated tools are required to support the assessment of such processes. In this paper we compare seven case studies conducted in Computer Science courses of two Spanish universities: Cádiz and Seville. We comment on their different settings: durations, milestones, contribution sizes, weights in the final grade and, most importantly, their assessment methods. We discuss and compare the different methodologies and tools used to assess the desired skills in the context of each case study. © 2013 Elsevier Ltd. All rights reserved.||0||0|
|Self-sorting map: An efficient algorithm for presenting multimedia data in structured layouts||Strong G.
Artificial neural networks
Computational and artificial intelligence
Computers and information processing
Systems man and cybernetics
|IEEE Transactions on Multimedia||English||This paper presents the Self-Sorting Map (SSM), a novel algorithm for organizing and presenting multimedia data. Given a set of data items and a dissimilarity measure between each pair of them, the SSM places each item into a unique cell of a structured layout, where the most related items are placed together and the unrelated ones are spread apart. The algorithm integrates ideas from dimension reduction, sorting, and data clustering algorithms. Instead of solving the continuous optimization problem that other dimension reduction approaches do, the SSM transforms it into a discrete labeling problem. As a result, it can organize a set of data into a structured layout without overlap, providing a simple and intuitive presentation. The algorithm is designed for sorting all data items in parallel, making it possible to arrange millions of items in seconds. Experiments on different types of data demonstrate the SSM's versatility in a variety of applications, ranging from positioning city names by proximities to presenting images according to visual similarities, to visualizing semantic relatedness between Wikipedia articles.||0||0|
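A drastically simplified flavour of the SSM's discrete-labeling idea can be given in a few lines. This sketch is serial and one-dimensional, whereas the real SSM arranges items on a 2-D grid and performs its swaps in parallel; the items and dissimilarity measure here are invented for illustration.

```python
def layout_energy(layout, dissim):
    """Total dissimilarity between neighbouring cells of a 1-D layout."""
    return sum(dissim(a, b) for a, b in zip(layout, layout[1:]))

def self_sort(items, dissim):
    """Greedy swap passes: exchange two cells whenever the swap lowers
    the layout energy, until no single swap helps. Each item keeps a
    unique cell, so this is a discrete labeling problem, not a
    continuous embedding."""
    layout = list(items)
    best = layout_energy(layout, dissim)
    improved = True
    while improved:
        improved = False
        for i in range(len(layout)):
            for j in range(i + 1, len(layout)):
                layout[i], layout[j] = layout[j], layout[i]
                e = layout_energy(layout, dissim)
                if e < best:
                    best, improved = e, True
                else:
                    layout[i], layout[j] = layout[j], layout[i]  # undo
    return layout

# With absolute difference as dissimilarity, sorting numbers into a
# monotone order is the optimal 1-D layout (energy 3 for 1..4).
result = self_sort([3, 1, 4, 2], lambda a, b: abs(a - b))
print(result)
```

The swap-acceptance test is what makes the placement discrete and overlap-free; the published algorithm achieves its speed by testing many such exchanges in parallel across a structured grid.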
|Semantic full-text search with broccoli||Holger Bast
|English||We combine search in triple stores with full-text search into what we call semantic full-text search. We provide a fully functional web application that allows the incremental construction of complex queries on the English Wikipedia combined with the facts from Freebase. The user is guided by context-sensitive suggestions of matching words, instances, classes, and relations after each keystroke. We also provide a powerful API, which may be used for research tasks or as a back end, e.g., for a question answering system. Our web application and public API are available under http://broccoli.cs.uni-freiburg.de.||0||0|
|Semantic question answering using Wikipedia categories clustering||Stratogiannis G.
Wikipedia category vector representation
|International Journal on Artificial Intelligence Tools||English||We describe a system that performs semantic Question Answering based on the combination of classic Information Retrieval methods with semantic ones. First, we use a search engine to gather web pages and then apply a noun phrase extractor to extract all the candidate answer entities from them. Candidate entities are ranked using a linear combination of two IR measures to pick the most relevant ones. For each one of the top ranked candidate entities we find the corresponding Wikipedia page. We then propose a novel way to exploit Semantic Information contained in the structure of Wikipedia. A vector is built for every entity from Wikipedia category names by splitting and lemmatizing the words that form them. These vectors maintain Semantic Information in the sense that we are given the ability to measure semantic closeness between the entities. Based on this, we apply an intelligent clustering method to the candidate entities and show that candidate entities in the biggest cluster are the most semantically related to the ideal answers to the query. Results on the topics of the TREC 2009 Related Entity Finding task dataset show promising performance.||0||0|
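The category-vector construction described in this abstract can be sketched as follows. The illustration is hedged: lemmatization is replaced by plain lowercase splitting, the clustering step is omitted, and the entities and category names are invented.

```python
from collections import Counter
import math

def category_vector(categories):
    """Bag-of-words vector built from Wikipedia category names
    (the paper also lemmatizes; lowercase splitting is used here)."""
    words = [w.lower() for name in categories for w in name.split()]
    return Counter(words)

def closeness(u, v):
    """Cosine similarity as a measure of semantic closeness."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Invented entities with invented category names
paris  = category_vector(["Capitals in Europe", "Cities in France"])
berlin = category_vector(["Capitals in Europe", "Cities in Germany"])
banana = category_vector(["Tropical fruit", "Edible plants"])
print(closeness(paris, berlin) > closeness(paris, banana))
```

Because the two city entities share category words ("capitals", "cities", "europe"), their vectors are close, while the fruit entity shares none; the paper then clusters candidate entities on exactly this kind of closeness and keeps the biggest cluster.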
|Semi-automatic construction of plane geometry ontology based-on WordNet and Wikipedia||Fu H.-G.
|Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China||Chinese||Ontology occupies a central position in the Semantic Web's hierarchical structure. In current research on ontology construction, manual construction struggles to ensure efficiency and scalability, while automatic construction struggles to guarantee interoperability. This paper presents a semi-automatic domain ontology construction method based on WordNet and Wikipedia. First, we construct the top-level ontology and then reuse the WordNet structure to expand the terminology and terminology levels at the depth of the ontology. Furthermore, we expand the relationships and supplement the terminology at the width of the ontology by referring to page information from Wikipedia. Finally, this method of ontology construction is applied to the elementary geometry domain. The experiments show that this method can greatly improve the efficiency of ontology construction and ensure the quality of the ontology to some degree.||0||0|
|Sentence similarity by combining explicit semantic analysis and overlapping n-grams||Vu H.H.
|Lecture Notes in Computer Science||English||We propose a similarity measure between sentences which combines a knowledge-based measure, a lighter version of ESA (Explicit Semantic Analysis), with a distributional measure, Rouge. We used this hybrid measure with two French domain-oriented corpora collected from the Web and compared its similarity scores to those of human judges. In both domains, ESA and Rouge perform better when they are mixed than they do individually. Besides, using the whole Wikipedia base in ESA did not prove necessary, since the best results were obtained with a low number of well-selected concepts.||0||0|
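The hybrid measure can be sketched as a weighted mix of the two scores. Everything concrete below is an assumption: the ESA concept vectors are invented placeholders (real ESA derives them from Wikipedia term statistics), `ngram_overlap` is a minimal ROUGE-n-style recall, and the weight `alpha=0.5` is arbitrary.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def ngram_overlap(s1, s2, n=2):
    """ROUGE-n-style recall: fraction of s1's n-grams also in s2."""
    grams = lambda s: {tuple(s.split()[i:i + n])
                       for i in range(len(s.split()) - n + 1)}
    g1, g2 = grams(s1), grams(s2)
    return len(g1 & g2) / len(g1) if g1 else 0.0

def hybrid_similarity(s1, s2, esa1, esa2, alpha=0.5):
    """Mix the knowledge-based score (precomputed ESA concept vectors)
    with the distributional n-gram score."""
    return alpha * cosine(esa1, esa2) + (1 - alpha) * ngram_overlap(s1, s2)

s1 = "the museum opens at nine in the morning"
s2 = "the museum opens at ten every morning"
# Hypothetical concept vectors; real ESA would derive these from Wikipedia.
esa1 = {"Museum": 0.9, "Opening_hours": 0.7}
esa2 = {"Museum": 0.8, "Opening_hours": 0.6}
score = hybrid_similarity(s1, s2, esa1, esa2)
print(round(score, 3))
```

The mixing weight is the natural tuning knob: the paper's finding that the combination beats either component alone corresponds to an interior value of `alpha` scoring best against human judgments.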
|Shades: Expediting Kademlia's lookup process||Einziger G.
|Lecture Notes in Computer Science||English||Kademlia is considered to be one of the most effective key based routing protocols. It is nowadays implemented in many file sharing peer-to-peer networks such as BitTorrent, KAD, and Gnutella. This paper introduces Shades, a combined routing/caching scheme that significantly shortens the average lookup process in Kademlia and improves its load handling. The paper also includes an extensive performance study demonstrating the benefits of Shades and compares it to other suggested alternatives using both synthetic workloads and traces from YouTube and Wikipedia.||0||0|
|Shrinking digital gap through automatic generation of WordNet for Indian languages||Jain A.
|AI & SOCIETY||English||Hindi ranks fourth in terms of speaker population in the world. In spite of that, it has less than 0.1% presence on the web due to a lack of competent lexical resources, a key reason behind the digital gap caused by language barriers among the Indian masses. Following in the footsteps of the renowned lexical resource English WordNet, 18 Indian languages initiated building WordNets under the Indo WordNet project. India is a multilingual country with around 122 languages and 234 mother tongues. Many Indian languages still do not have any reliable lexical resource, and the coverage of the numerous WordNets in progress is still far from the average value of 25,792. The tedious manual process and high cost are major reasons behind the unsatisfactory coverage and slow progress. In this paper, we discuss the socio-cultural and economic impact of providing Internet accessibility and present an approach for the automatic generation of WordNets to tackle the lack of competent lexical resources. Problems such as accuracy, association of language-specific glosses/examples and incorrect back-translations, which arise when deviating from the traditional approach of compilation by lexicographers, are resolved by utilising the Wikipedia editions available for Indian languages. © 2014 Springer-Verlag London.||0||0|
|Situated Interaction in a Multilingual Spoken Information Access Framework||Niklas Laxström
|SkWiki: A multimedia sketching system for collaborative creativity||Zhao Z.
|Conference on Human Factors in Computing Systems - Proceedings||English||We present skWiki, a web application framework for collaborative creativity in digital multimedia projects, including text, hand-drawn sketches, and photographs. skWiki overcomes common drawbacks of existing wiki software by providing a rich viewer/editor architecture for all media types that is integrated into the web browser itself, thus avoiding dependence on client-side editors. Instead of files, skWiki uses the concept of paths as trajectories of persistent state over time. This model has intrinsic support for collaborative editing, including cloning, branching, and merging paths edited by multiple contributors. We demonstrate skWiki's utility using a qualitative, sketching-based user study.||0||0|
|Snuggle: Designing for efficient socialization and ideological critique||Aaron Halfaker
|H.5.2. Information Interfaces and Presentation: Graphical user interfaces (GUI)||Conference on Human Factors in Computing Systems - Proceedings||English||Wikipedia, the encyclopedia "anyone can edit", has become increasingly less so. Recent academic research and popular discourse illustrate the often aggressive ways newcomers are treated by veteran Wikipedians. These are complex sociotechnical issues, bound up in infrastructures based on problematic ideologies. In response, we worked with a coalition of Wikipedians to design, develop, and deploy Snuggle, a new user interface that served two critical functions: making the work of newcomer socialization more effective, and bringing visibility to instances in which Wikipedians' current practice of gatekeeping socialization breaks down. Snuggle supports positive socialization by helping mentors quickly find newcomers whose good-faith mistakes were reverted as damage. Snuggle also supports ideological critique and reflection by bringing visibility to the consequences of viewing newcomers through a lens of suspiciousness.||0||0|
|Social software in new product development - State of research and future research directions||Rohmann S.
New product development
|20th Americas Conference on Information Systems, AMCIS 2014||English||Product development is becoming increasingly collaborative and knowledge-intensive in today's industry. To gain competitive advantage, effective use of information systems in new product development (NPD) is needed. Social software applications indicate further potential for usage in NPD, the so-called "Product Development 2.0", which is poorly understood in research so far. The purpose of this article is to point out the current state of research in this area by means of a literature review, after which research gaps and future research directions are identified. The results indicate that social software applications are suitable to support tasks in all phases of the NPD process, but the influencing factors and effects of the identified social software usage in NPD are poorly understood so far.||0||0|
|Socio-technical systems theory as a diagnostic tool for examining underutilization of wiki technology||Hester A.J.||Alignment
Socio-technical systems theory
|Learning Organization||English||Purpose: This paper aims to examine organizational information systems based on Web 2.0 technology as socio-technical systems that involve interacting relationships among actors, structure, tasks and technology. Alignment within the relationships may facilitate increased technology use; however, gaps in alignment may impede technology use and result in poor performance or system failure. The technology examined is an organizational wiki used for collaborative knowledge management. Design/methodology/approach: Results of a survey administered to employees of an organization providing cloud computing services are presented. The research model depicts the socio-technical component relationships and their influence on use of the wiki. Hierarchical latent variable modelling is used to operationalize the six main constructs. Hypotheses propose that as alignment of a relationship increases, wiki use increases. The partial least squares (PLS) method is used to examine the hypotheses. Findings: Based on the results, increased perceptions of alignment among technology and structure increase wiki use. Further analysis indicates that low usage may be linked to gaps in alignment. Many respondents with lower usage scores also indicated "low alignment" among actor-task, actor-technology, and task-structure. Research limitations/implications: The sample size is rather small; however, results may give an indication as to the appropriateness of dimensions chosen to represent the alignment relationships. Socio-technical systems theory (STS) is often utilized in qualitative studies. This paper introduces a measurement instrument designed to evaluate STS through quantitative analysis. Practical implications: User acceptance and change management continue to be important topics for both researchers and practitioners. The model proposed here provides measures that may reveal predictive indicators for increased information system use. Alternatively, practitioners may be able to utilize a diagnostic tool as presented here to assess underlying factors that may be impeding effective technology utilization. Originality/value: The paper presents a diagnostic tool that may help management to better uncover misaligned relationships leading to underutilization of technology. Practical advice and guidelines are provided allowing for a plan to rectify the situation and improve technology usage and performance outcomes.||0||0|
|Sticky wikis||Berghel H.||Crowdsource
History of computing
|Computer||English||After observing and developing online reference websites for 20 plus years, it's clear the biggest hurdle to reliability still hasn't been overcome.||0||0|
|Students experiences of using Wiki spaces to support collaborative learning in a blended classroom: A case of Kenyatta and KCA universities in Kenya||Gitonga R.
|Collaborative knowledge building
|2014 IST-Africa Conference and Exhibition, IST-Africa 2014||English||Wiki spaces are simply web pages that allow users to create, edit and share each other's work. This paper shares experiences from a group of students who were using Wiki spaces in their course work. It attempts to use collaborative knowledge building theory to evaluate existing Wiki space practices in order to inform stakeholders on the power of Wiki spaces in setting students on a knowledge building trajectory. The respondents were 150 university students from Kenyatta and KCA universities in Kenya whose lecturers had created Wiki spaces for collaborative group tasks as part of their coursework during the September to December 2013 semester. More than 50% of the students found the Wiki spaces useful in promoting various aspects of knowledge building, such as reflective learning and idea diversity. This paper underscores the importance of Wiki spaces as environments for positioning today's students on a knowledge building track, a skill set requirement for the 21st century graduate.||0||0|
|Students' engagement with a collaborative wiki tool predicts enhanced written exam performance||Stafford T.
Interactive learning environments
|Research in Learning Technology||English||We introduced voluntary wiki-based exercises to a long-running cognitive psychology course, part of the core curriculum for an undergraduate degree in psychology. Over 2 yearly cohorts, students who used the wiki more also scored higher on the final written exam. Using regression analysis, it is possible to account for students' tendency to score well on other psychology exams, thus statistically removing some obvious candidate third factors, such as general talent or enthusiasm for psychology, which might drive this correlation. Such an analysis shows that both high- and low-grading students who used the wiki got higher scores on the final exam, with engaged wiki users scoring an average of an extra 5 percentage points. We offer an interpretation of the mechanisms of action in terms of the psychological literature on learning and memory. © 2014 T. Stafford et al.||0||0|
|Supply chains under strain||Harris S.||Engineering and Technology||English||The article discusses how to tackle the impact of climate change on supply chain risk. Businesses and governments need to start planning for a world with a changed climate. In particular, industries dependent on food, water, energy or ecosystem services need to scrutinize the resilience and viability of their supply chains. The researchers' vision is for the website to eventually host data that cover hundreds of industrial sectors across individual states, provinces and cities, allowing users to track the flows of specific goods at a scale appropriate for the effects of natural disasters. For example, users could find out exactly how many batteries are shipped from Osaka to California, or investigate the impact of a flood in Bangalore on particular industries worldwide. This resource will soon serve as the 'Wikipedia' for supply chain information, and the researchers intend to use it to illustrate the potential impact of climate change to politicians and global businesses.||0||0|
|Supporting navigation in Wikipedia by information visualization: Extended evaluation measures||Wu I.-C.
Information visualization tools
|Journal of Documentation||English||Purpose: The authors introduce two semantics-based navigation applications that facilitate information-seeking activities in internal link-based web sites in Wikipedia. These applications aim to help users find concepts within a topic and related articles on a given topic quickly, and then gain topical knowledge from internal link-based encyclopedia web sites. The paper aims to discuss these issues. Design/methodology/approach: The WNavis application consists of three information visualization (IV) tools: a topic network, a hierarchical topic tree and summaries for topics. The WikiMap application consists of a topic network. The goal of the topic network and topic tree tools is to help users find the major concepts of a topic and identify relationships between these major concepts easily. In addition, in order to locate specific information and enable users to explore and read topic-related articles quickly, the topic tree and topic summaries support users in gaining topical knowledge quickly. The authors then apply the k-clique cohesion indicator to analyze the subtopics of the seed query and identify the best clustering results via the cosine measure. The authors utilize four metrics - correctness, time cost, usage behaviors, and satisfaction - to evaluate the three interfaces. These metrics measure both the outputs and outcomes of the applications. As a baseline system for evaluation the authors used a traditional Wikipedia interface. For the evaluation, the authors used an experimental user study with 30 participants. Findings: The results indicate that both WikiMap and WNavis supported users in identifying concepts and their relations better than the baseline. In topical tasks, WNavis outperformed both WikiMap and the baseline system. Although there were no time differences in finding concepts or answering topical questions, the test systems provided users with a greater gain per time unit. The users of WNavis leaned on the hierarchy tree instead of other tools, whereas WikiMap users used the topic map. Research limitations/implications: The findings have implications for the design of IR support tools in knowledge-intensive web sites that help users to explore topics and concepts. Originality/value: The authors explored to what extent the use of each IV support tool contributed to successful exploration of topics in search tasks. The authors propose extended task-based evaluation measures to understand how each application provides useful context for users to accomplish the tasks and attain the search goals. That is, the authors not only evaluate the output of the search results, e.g. the number of relevant items retrieved, but also the outcome provided by the system for assisting users to attain the search goal.||0||0|
|Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools||Lopuszynski M.
|Natural Language Processing
Tagging document collections
|Communications in Computer and Information Science||English||In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. Wikipedia is employed as the first source of labels; the second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe that the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).||0||0|
|Taking the guesswork out of assessing individual contributions to group work assignments||Lambert S.C.
|Issues in Accounting Education||English||This paper demonstrates how a wiki can be used to deliver greater justice in the form of a fairer grade to students who report that not all members of their group made a reasonable contribution to an assignment. While group assessment has many pedagogical and professional benefits, it is fraught with potentially unjust outcomes in terms of the marks assigned to individual students. Free-riders can unjustly receive marks for work that they have not contributed to and they may even drag down the group marks due to their non-performance. We describe how a wiki was used in an auditing group assignment to provide evidence of individual student contributions following reports of unequal contributions by group members. It was found that the wiki provided a relatively more objective basis than traditional document-based assignments to inform the decision as to whether or not all students in the group would receive the same grade and if not, how the grades should be modified.||0||0|
|Term impact-based web page ranking||Al-Akashi F.H.
Vector space model
|ACM International Conference Proceeding Series||English||Indexing Web pages based on content is a crucial step in a modern search engine. A variety of methods and approaches exist to support web page ranking. In this paper, we describe a new approach for obtaining measures for Web page ranking. Unlike other recent approaches, it exploits the meta-terms extracted from the titles and URLs for indexing the contents of web documents. We use term impact to correlate each meta-term with the document's content, rather than term frequency and other similar techniques. Our approach also uses the structural knowledge available in Wikipedia for better query expansion and formulation. Evaluation with automatic metrics provided by TREC reveals that our approach is effective for building the index and for retrieval. We present retrieval results from the ClueWeb collection, for a set of test queries, for two tasks: an ad hoc retrieval task and a diversity task (which aims at retrieving relevant pages that cover different aspects of the queries).||0||0|
|Text summarization using Wikipedia||Sankarasubramaniam Y.
|Information Processing and Management||English||Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods, to learning and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility in text summarization - they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence-concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph, and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization - users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggests that using incremental summarization can help in better understanding news articles. © 2014 Elsevier Ltd. All rights reserved.||0||0|
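The iterative ranking on a bipartite sentence-concept graph that the abstract above describes can be sketched as a simple power iteration. This is a generic illustration under assumed update rules and an assumed damping constant, not the paper's exact model:

```python
import numpy as np

def bipartite_rank(A, d=0.85, iters=50):
    """Rank sentences and concepts on a bipartite graph.

    A is an (n_sentences x n_concepts) matrix where A[i, j] > 0 means
    sentence i mentions concept j. Scores are propagated back and
    forth between the two node sets until they stabilise.
    """
    n, m = A.shape
    s = np.ones(n) / n          # sentence scores
    c = np.ones(m) / m          # concept scores
    for _ in range(iters):
        c = (1 - d) / m + d * (A.T @ s)   # concepts inherit from sentences
        c /= c.sum()
        s = (1 - d) / n + d * (A @ c)     # sentences inherit from concepts
        s /= s.sum()
    return s, c

# Toy example: 3 sentences, 2 concepts.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
scores, _ = bipartite_rank(A)
# Sentence 1 touches both concepts, so it ranks highest.
```

A summary would then be assembled from the top-ranked sentences; the query-focused variant in the paper would additionally bias the initial scores toward query-related concepts.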
|The anyone-can-edit syndrome: Intercreation stories of three featured articles on wikipedia||Mattus M.||Collaboration
|Nordicom Review||English||The user-generated wiki encyclopedia Wikipedia was launched in January 2001 by Jimmy Wales and Larry Sanger. Wikipedia has become the world’s largest wiki encyclopedia, and behind many of its entries are interesting stories of creation, or rather intercreation, since Wikipedia is produced by a large number of contributors. Using the slogan “the free encyclopedia that anyone can edit” (Wikipedia 2013), Wikipedia invites everyone to participate, but the participants do not necessarily represent all kinds of individuals or interests – there might be an imbalance affecting the content as well as the perspective conveyed. As a phenomenon Wikipedia is quite complex, and can be studied from many different angles, for instance through the articles’ history and the edits to them. This paper is based on a study of Featured Articles from the Swedish Wikipedia. Three articles, Fri vilja [Free will], Fjäll [Fell], and Edgar Allan Poe, are chosen from a list of Featured Articles that belongs to the subject field culture. The articles’ development has been followed from their very first versions in 2003/2004 to edits made at the end of 2012. The aim is to examine the creation, or intercreation, processes of the articles, and the collaborative production. The data come from non-article material such as revision history pages, article material, and some complementary statistics. Principally the study has a qualitative approach, but with some quantitative elements.||0||0|
|The business and politics of search engines: A comparative study of Baidu and Google's search results of Internet events in China||Jiang M.||Baidu
|New Media and Society||English||Despite growing interest in search engines in China, relatively few empirical studies have examined their sociopolitical implications. This study fills several research gaps by comparing query results (N = 6320) from China's two leading search engines, Baidu and Google, focusing on accessibility, overlap, ranking, and bias patterns. Analysis of query results of 316 popular Chinese Internet events reveals the following: (1) after Google moved its servers from Mainland China to Hong Kong, its results are equally if not more likely to be inaccessible than Baidu's, and Baidu's filtering is much subtler than the Great Firewall's wholesale blocking of Google's results; (2) there is low overlap (6.8%) and little ranking similarity between Baidu's and Google's results, implying different search engines, different results and different social realities; and (3) Baidu rarely links to its competitors Hudong Baike or Chinese Wikipedia, while their presence in Google's results is much more prominent, raising search bias concerns. These results suggest search engines can be architecturally altered to serve political regimes, arbitrary in rendering social realities and biased toward self-interest.||0||0|
|The completeness of articles and citation in the Slovene Wikipedia||Noc M.
|Program||English||Purpose: The purpose of this research was to examine the number and type of sources cited by featured articles from the Slovene Wikipedia in order to assess their quality. A sample of random articles was also procured in order to give a clearer picture of the content of the Slovene Wikipedia. Design/methodology/approach: The study examined 122 featured articles from the Slovene Wikipedia from 2009, 2010 and 2011. The following aspects of the articles were analyzed: the topic and originality of each article, and the number, language and type of sources cited. Findings: The results have shown that most of the featured articles are adapted from the English Wikipedia, the most common topics being science, sports and history. Based on these results the authors have concluded that despite some deficiencies the featured articles on the Slovene Wikipedia are of much higher quality compared to random articles. Research limitations/implications: The biggest research limitation is the ever-changing nature of Wikipedia and its articles, which hinders the process of analyzing results and relying on them to remain relevant in the future. Originality/value: This is the first such research of the Slovene Wikipedia that deals specifically with citation analysis of featured articles. Results of this research offer valuable information to both editors of featured articles and users, as they point out certain deficiencies, which can be eliminated.||0||0|
|The development of an expert system for arid rangeland management in central Namibia with emphasis on bush thickening||Joubert D.
Decision support system
|African Journal of Range and Forage Science||English||An online decision support system derived from research and expert knowledge was developed for arid rangeland management in central Namibia. The expert system emphasises the control of bush thickening and is divided into three forms of decisions: adaptive, reactive and ongoing good management. Adaptive decisions are mostly related to periods of protracted high rainfall, as this is a critical window both in terms of a hazard (transition to bushy thickened state if no fire is applied) and opportunity (transition back to an open savanna if fire is applied). Currently, the expert system uses wiki technology, as this allows a high level of interaction between user and administrator. The expert system includes embedded links to photographs and additional information. It allows easy updating of the knowledge base. An additional booklet was also developed, since access to computers and the internet is still limited. Although the evaluation of the expert system will be determined partly by its acceptance by rangeland managers, we adhered to many of the critical success factors relating to decision support systems in its development. The paper discusses some key strategies for the success of this and similar decision support systems for improved rangeland management in the future.||0||0|
|The economics of contribution in a large enterprise-scale wiki||Paul C.L.
|English||The goal of our research was to understand how knowledge workers use community-curated knowledge and collaboration tools in a large organization. In our study, we explored wiki use among knowledge workers in their day-to-day responsibilities. In this poster, we examine the motivation and rewards for knowledge workers to participate in wikis through the economic idea of costs to contribute.||0||0|
|The impact of identity on anxiety during wiki editing in higher education||Cowan B.R.
|Journal of Enterprise Information Management||English||Purpose: Although wikis are common in higher education, little is known about the wiki user experience in these contexts and how system characteristics impact such experiences. The purpose of this paper is to explore experimentally the hypothesis that changing the anonymity of identity when editing wikis will significantly impact user editing anxiety, and that this may depend on the type of edit being conducted. Design/methodology/approach: This hypothesis was explored using a controlled experiment study whereby users were given excerpts to include in their own words on a wiki site used for a psychology course. Users edited the wiki anonymously, using a pseudonym relevant to the context (a matriculation number) and using a full named identity. Users were also either asked to add content to the wiki or to delete and replace content on the wiki site. Findings: The paper found that users experienced significantly less anxiety when editing anonymously compared to when editing with a pseudonym or full name, and that the type of edit being conducted did not impact the anxiety felt. Originality/value: The research highlights that the effects of anonymity discussed are also in operation in a wiki context, a more fundamentally anonymous context compared to blogs, bulletin boards or general computer-mediated communication tools.||0||0|
|The impact of semantic document expansion on cluster-based fusion for microblog search||Liang S.
Maarten de Rijke
|Lecture Notes in Computer Science||English||Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based fusion method on the task of retrieving microblog posts. We find that in the optimal setting the contribution of the clustering information is very limited, which we hypothesize to be due to the limited length of microblog posts. To increase the contribution of the clustering information in cluster-based fusion, we integrate semantic document expansion as a preprocessing step. We enrich the content of microblog posts appearing in the lists to be fused by Wikipedia articles, based on which clusters are created. We verify the effectiveness of our combined document expansion plus fusion method by making comparisons with microblog search algorithms and other fusion methods.||0||0|
|The last click: Why users give up information network navigation||Scaria A.T.
|WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining||English||An important part of finding information online involves clicking from page to page until an information need is fully satisfied. This is a complex task that can easily be frustrating and force users to give up prematurely. An empirical analysis of what makes users abandon click-based navigation tasks is hard, since most passively collected browsing logs do not specify the exact target page that a user was trying to reach. We propose to overcome this problem by using data collected via Wikispeedia, a Wikipedia-based human-computation game, in which users are asked to navigate from a start page to an explicitly given target page (both Wikipedia articles) by only tracing hyperlinks between Wikipedia articles. Our contributions are two-fold. First, by analyzing the differences between successful and abandoned navigation paths, we aim to understand what types of behavior are indicative of users giving up their navigation task. We also investigate how users make use of back clicks during their navigation. We find that users prefer backtracking to high-degree nodes that serve as landmarks and hubs for exploring the network of pages. Second, based on our analysis, we build statistical models for predicting whether a user will finish or abandon a navigation task, and if the next action will be a back click. Being able to predict these events is important as it can potentially help us design more human-friendly browsing interfaces and retain users who would otherwise have given up navigating a website.||0||0|
|The reasons why people continue editing Wikipedia content - task value confirmation perspective||Lai C.-Y.
|Behaviour and Information Technology||English||Recently, Wikipedia has garnered increasing public attention. However, few studies have examined the intentions of individuals who edit Wikipedia content. Furthermore, previous studies ascribed a 'knowledge sharing' label to Wikipedia content editors. However, in this work, Wikipedia can be viewed as a platform that allows individuals to show their expertise. This study investigates the underlying reasons that drive individuals to edit Wikipedia content. Based on expectation-confirmation theory and expectancy-value theory for achievement motivations, we propose an integrated model that incorporates psychological and contextual perspectives. Wikipedians from the English-language Wikipedia site were invited to complete a survey. Partial least squares was applied to test our proposed model. Analytical results confirmed that subjective task value, commitment, and procedural justice were significant to the satisfaction of Wikipedians, and that satisfaction significantly influenced continuance intention to edit Wikipedia content. © 2014 © 2014 Taylor & Francis.||0||0|
|Tibetan-Chinese named entity extraction based on comparable corpus||Sun Y.
Tibetan-Chinese named entity
|Applied Mechanics and Materials||English||Tibetan-Chinese named entity extraction is the foundation of Tibetan-Chinese information processing, which provides the basis for machine translation and cross-language information retrieval research. We used the multi-language links of Wikipedia to obtain a Tibetan-Chinese comparable corpus, and combined sentence length, word matching and entity boundary words to carry out sentence alignment. Then we extracted Tibetan-Chinese named entities from the aligned comparable corpus in three ways: (1) extraction of naturally labeled information; (2) extraction of the links between Tibetan and Chinese entries; (3) the sequence-intersection method, which treats each sentence as a word sequence, recognizes Chinese named entities in the Chinese sentences, and intersects them with the aligned Tibetan sentences. Finally, the experimental results show that the extraction method based on comparable corpora is effective.||0||0|
|Title named entity recognition using wikipedia and abbreviation generation||Park Y.
Conditional random field
Title named entity
|2014 International Conference on Big Data and Smart Computing, BIGCOMP 2014||English||In this paper, we propose a title named entity recognition model using Wikipedia and abbreviation generation. The proposed model automatically extracts title named entities from Wikipedia, so constant renewal is possible without additional cost. Also, in order to establish a dictionary of title named entity abbreviations, generation rules are used to produce abbreviation candidates, and the abbreviations are selected through web search methods. We propose a statistical model that recognizes title named entities using CRFs (Conditional Random Fields). The proposed model uses lexical information, a named entity dictionary, and an abbreviation dictionary, and achieves title named entity recognition performance of 82.1% in our experiments.||0||0|
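The abbreviation-generation step described above can be approximated with simple candidate rules. The rules below (acronym of initial letters, leading word, trailing word) are illustrative assumptions, not the paper's actual rule set, and the web-search filtering step is omitted:

```python
def abbreviation_candidates(title):
    """Generate naive abbreviation candidates for a multi-word title.

    Illustrative rules: the acronym formed from initial letters, plus
    the leading and trailing words on their own. Single-word titles
    yield no candidates.
    """
    words = title.split()
    cands = set()
    if len(words) > 1:
        cands.add("".join(w[0] for w in words).upper())  # acronym
        cands.add(words[0])                              # leading word
        cands.add(words[-1])                             # trailing word
    return cands

cands = abbreviation_candidates("Pirates of the Caribbean")
print(cands)  # {"POTC", "Pirates", "Caribbean"}
```

In the paper's pipeline, candidates like these would then be validated against web search results before entering the abbreviation dictionary consumed by the CRF model.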
|Topic modeling approach to named entity linking||Huai B.-X.
|Named entity linking
Probabilistic topic models
|Ruan Jian Xue Bao/Journal of Software||Chinese||Named entity linking (NEL) is an advanced technology which links a given named entity to an unambiguous entity in the knowledge base, and thus plays an important role in a wide range of Internet services, such as online recommender systems and Web search engines. However, with the explosive increase of online information and applications, traditional solutions for NEL are facing more and more challenges to linking accuracy due to the large number of online entities. Moreover, the entities are usually associated with different semantic topics (e.g., the entity "Apple" could be either a fruit or a brand), whereas the latent topic distributions of words and entities in the same documents should be similar. To address this issue, this paper proposes a novel topic modeling approach to named entity linking. Different from existing works, the new approach provides a comprehensive framework for NEL and can uncover the semantic relationship between documents and named entities. Specifically, it first builds a knowledge base of unambiguous entities with the help of Wikipedia. Then, it proposes a novel bipartite topic model to capture the latent topic distribution between entities and documents. Therefore, given a new named entity, the new approach can link it to the unambiguous entity in the knowledge base by calculating their semantic similarity with respect to latent topics. Finally, the paper conducts extensive experiments on a real-world data set to evaluate the proposed approach for named entity linking. Experimental results clearly show that the proposed approach outperforms other state-of-the-art baselines by a significant margin.||0||0|
|Topic modeling for wikipedia link disambiguation||Skaggs B.
|ACM Transactions on Information Systems||English||Many articles in the online encyclopedia Wikipedia have hyperlinks to ambiguous article titles; these ambiguous links should be replaced with links to unambiguous articles, a process known as disambiguation. We propose a novel statistical topic model based on link text, which we refer to as the Link Text Topic Model (LTTM), that we use to suggest new link targets for ambiguous links. To evaluate our model, we describe a method for extracting ground truth for this link disambiguation task from edits made to Wikipedia in a specific time period. We use this ground truth to demonstrate the superiority of LTTM over other existing link- and content-based approaches to disambiguating links in Wikipedia. Finally, we build a web service that uses LTTM to make suggestions to human editors wanting to fix ambiguous links in Wikipedia.||0||0|
|Topic ontology-based efficient tag recommendation approach for blogs||Subramaniyaswamy V.
|International Journal of Computational Science and Engineering||English||Efficient tag recommendation systems are required to help users search, index and browse appropriate blog content. Tagging has become a popular way to annotate web content, blogs, photos, videos and music. Tag recommendation is the act of suggesting valuable and informative tags for a new item based on its content. We propose a novel approach based on topic ontology for tag recommendation. The proposed approach intelligently generates tag suggestions for blogs. In this paper, we construct a topic ontology from Wikipedia categories and WordNet semantic relationships to make the ontology more meaningful and reliable. A spreading activation algorithm is applied to assign interest scores to existing blog content and tags. High-quality tags are suggested based on the magnitude of the interest score. Evaluation shows that combining the topic ontology with the spreading activation algorithm makes tag recommendation more effective than collaborative tag recommendation. Our proposed approach also offers solutions to tag spamming, sentiment analysis and popularity. Finally, we report the results of an experiment which improves the performance of the tag recommendation approach.||0||0|
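Spreading activation, as used in the abstract above to score candidate tags, can be sketched generically: activation starts at seed concepts and a decayed share flows to neighbours each round. The graph, decay constant and iteration count here are illustrative assumptions, not the paper's parameters:

```python
from collections import defaultdict

def spread_activation(graph, seeds, decay=0.5, iters=3):
    """Spreading activation over a concept graph.

    graph maps a node to its neighbours; seeds maps a node to its
    initial interest score. In each round, every active node passes a
    decayed, evenly split share of its activation to its neighbours.
    """
    act = defaultdict(float, seeds)
    for _ in range(iters):
        nxt = defaultdict(float, act)
        for node, score in act.items():
            nbrs = graph.get(node, [])
            for nb in nbrs:
                nxt[nb] += decay * score / len(nbrs)
        act = nxt
    return dict(act)

# Toy ontology fragment: one seed topic linking to two subtopics.
graph = {"python": ["django", "numpy"], "django": [], "numpy": []}
scores = spread_activation(graph, {"python": 1.0})
```

Tags whose accumulated score exceeds a threshold would then be recommended; in the paper the graph comes from the Wikipedia/WordNet topic ontology rather than a hand-built dictionary.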
|Towards automatic building of learning pathways||Siehndel P.
|WEBIST 2014 - Proceedings of the 10th International Conference on Web Information Systems and Technologies||English||Learning material usually has a logical structure, with a beginning and an end, and lectures or sections that build upon one another. However, in informal Web-based learning this may not be the case. In this paper, we present a method for automatically calculating a tentative order in which objects should be learned, based on the estimated complexity of their contents. The proposed method enriches textual objects with links to Wikipedia articles, which are used to calculate a complexity score for each object. We evaluated our method on two different datasets: Wikipedia articles and online learning courses. For the Wikipedia data we achieved correlations between the ground truth and the predicted order of up to 0.57, while for subtopics inside the online learning courses we achieved correlations of 0.793.||0||0|
|Towards linking libraries and Wikipedia: Automatic subject indexing of library records with Wikipedia concepts||Joorabchi A.
|Journal of Information Science||English||In this article, we first argue the importance of, and timely need for, linking libraries and Wikipedia to improve the quality of their services to information consumers, as such linkage will enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources, which are currently overlooked to a large degree. We then describe the development of an automatic system for subject indexing of library metadata records with Wikipedia concepts as an important step towards library-Wikipedia integration. The proposed system is based on first identifying all Wikipedia concepts occurring in the metadata elements of library records. This is then followed by training and deploying generic machine learning algorithms to automatically select those concepts which most accurately reflect the core subjects of the library materials whose records are being indexed. We have assessed the performance of the developed system using standard information retrieval measures of precision, recall and F-score on a dataset consisting of 100 library metadata records manually indexed with a total of 469 Wikipedia concepts. The evaluation results show that the developed system is capable of achieving an average F-score as high as 0.92.||0||0|
|Towards twitter user recommendation based on user relations and taxonomical analysis||Slabbekoorn K.
|Frontiers in Artificial Intelligence and Applications||English||Twitter is one of the largest social media platforms in the world. Although Twitter can be used as a tool for getting valuable information related to a topic of interest, it is hard to find users to follow for this purpose. In this paper, we present a method for Twitter user recommendation based on user relations and taxonomical analysis. This method first finds some users to follow related to the topic of interest, given keywords representing the topic, and then, from that list, picks out users who continuously provide related tweets. In the first phase we rank users based on user relations obtained from each user's tweet behaviour, such as retweets and mentions (replies), and in the second phase we create topic taxonomies for each user from tweets posted during different time periods. Experimental results show that our method is very effective in recommending users who post tweets related to the topic of interest all the time, rather than users who post related tweets just temporarily.||0||0|
|Tracking topics on revision graphs of wikipedia edit history||Li B.
|Lecture Notes in Computer Science||English||Wikipedia is known as the largest online encyclopedia, in which articles are constantly contributed and edited by users. Past revisions of articles are also accessible to the public for confirming the edit process. However, the degree of similarity between revisions is very high, making it difficult to generate summaries for these small changes from the revision graphs of the Wikipedia edit history. In this paper, we propose an approach that gives a concise summary of a given scope of revisions by utilizing supergrams, which are consecutive unchanged term sequences.||0||0|
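Supergrams, the consecutive term sequences left unchanged between two revisions, can be approximated with a longest-matching-blocks pass over the tokenized revisions. This stdlib sketch is a simplification of the paper's method (function name and the minimum-length cutoff are illustrative):

```python
from difflib import SequenceMatcher

def supergrams(old_rev, new_rev, min_len=2):
    """Return consecutive term sequences shared by two revisions."""
    a, b = old_rev.split(), new_rev.split()
    sm = SequenceMatcher(a=a, b=b, autojunk=False)
    return [" ".join(a[m.a : m.a + m.size])
            for m in sm.get_matching_blocks()
            if m.size >= min_len]

old = "the free encyclopedia that anyone can edit"
new = "the free encyclopedia which anyone can edit"
print(supergrams(old, new))
# Shared runs around the one-word edit:
# ['the free encyclopedia', 'anyone can edit']
```

The terms falling outside the supergrams are exactly the small changes a revision summary would need to describe.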
|Trendspedia: An Internet observatory for analyzing and visualizing the evolving web||Kang W.
|Proceedings - International Conference on Data Engineering||English||The popularity of social media services has changed the way information is acquired in modern society. Meanwhile, a massive amount of information is generated every single day. To extract useful knowledge, much effort has been invested in analyzing social media contents, e.g., (emerging) topic discovery. Even with these findings, however, users may still find it hard to obtain knowledge of interest that conforms to their preferences. In this paper, we present a novel system which brings proper context to continuously incoming social media contents, such that mass information can be indexed, organized and analyzed around Wikipedia entities. Four data analytics tools are employed in the system. Three of them aim to enrich each Wikipedia entity by analyzing the relevant contents, while the other builds an information network among the most relevant Wikipedia entities. With our system, users can easily pinpoint valuable information and knowledge they are interested in, as well as navigate to other closely related entities through the information network for further exploration.||0||0|
|TripBuilder: A tool for recommending sightseeing tours||Brilhante I.
|Lecture Notes in Computer Science||English||We propose TripBuilder, a user-friendly and interactive system for planning a time-budgeted sightseeing tour of a city on the basis of the points of interest and the patterns of movement of tourists mined from user-contributed data. The knowledge needed to build the recommendation model is extracted entirely in an unsupervised way from two popular collaborative platforms: Wikipedia and Flickr. TripBuilder interacts with the user by means of a friendly Web interface that allows her to easily specify personal interests and a time budget. The proposed sightseeing tour can then be explored and modified. We present the main components of the system.||0||0|
|Trust, but verify: Predicting contribution quality for knowledge base construction and curation||Tan C.H.
Knowledge base construction
Predicting contribution quality
|WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining||English||The largest publicly available knowledge repositories, such as Wikipedia and Freebase, owe their existence and growth to volunteer contributors around the globe. While the majority of contributions are correct, errors can still creep in, due to editors' carelessness, misunderstanding of the schema, malice, or even lack of accepted ground truth. If left undetected, inaccuracies often degrade the experience of users and the performance of applications that rely on these knowledge repositories. We present a new method, CQUAL, for automatically predicting the quality of contributions submitted to a knowledge base. Significantly expanding upon previous work, our method holistically exploits a variety of signals, including the user's domains of expertise as reflected in her prior contribution history, and the historical accuracy rates of different types of facts. In a large-scale human evaluation, our method exhibits precision of 91% at 80% recall. Our model verifies whether a contribution is correct immediately after it is submitted, significantly alleviating the need for post-submission human reviewing.||0||0|
|Twelve years of wikipedia research||Judit Bar-Ilan
|WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference||English||Wikipedia was formally launched in 2001, but the first research papers mentioning it appeared only in 2002. Since then it has raised a huge amount of interest in the research community. At first mainly the content creation processes and the quality of the content were studied, but later on it was picked up as a valuable source for data mining and for testing. In this paper we present preliminary results that characterize the research done on and using Wikipedia since 2002.||0||0|
|Two is bigger (and better) than one: The wikipedia bitaxonomy project||Flati T.
|English||We present WiBi, an approach to the automatic creation of a bitaxonomy for Wikipedia, that is, an integrated taxonomy of Wikipedia pages and categories. We leverage the information available in either one of the taxonomies to reinforce the creation of the other taxonomy. Our experiments show higher quality and coverage than state-of-the-art resources like DBpedia, YAGO, MENTA, WikiNet and WikiTaxonomy.||0||0|
|Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty||Mark Graham
|Geographies of knowledge
|Annals of the Association of American Geographers||English||Geographies of codified knowledge have always been characterized by stark core-periphery patterns, with some parts of the world at the center of global voice and representation and many others invisible or unheard. Many have pointed to the potential for radical change, however, as digital divides are bridged and 2.5 billion people are now online. With a focus on Wikipedia, which is one of the world's most visible, most used, and most powerful repositories of user-generated content, we investigate whether we are now seeing fundamentally different patterns of knowledge production. Even though Wikipedia consists of a massive cloud of geographic information about millions of events and places around the globe put together by millions of hours of human labor, the encyclopedia remains characterized by uneven and clustered geographies: There is simply not a lot of content about much of the world. The article then moves to describe the factors that explain these patterns, showing that although just a few conditions can explain much of the variance in geographies of information, some parts of the world remain well below their expected values. These findings indicate that better connectivity is only a necessary but not a sufficient condition for the presence of volunteered geographic information about a place. We conclude by discussing the remaining social, economic, political, regulatory, and infrastructural barriers that continue to disadvantage many of the world's informational peripheries. The article ultimately shows that, despite many hopes that a democratization of connectivity will spur a concomitant democratization of information production, Internet connectivity is not a panacea and can only ever be one part of a broader strategy to deepen the informational layers of places.||0||0|
|Use of an internet website wiki at oncology Advanced Pharmacy Practice Experiences (APPE) and the effects on student confidence with oncology references||Thompson L.A.
|Advanced pharmacy practice experience
|Currents in Pharmacy Teaching and Learning||English||Objective: To describe the impact of utilizing an interactive website (wiki) to direct students to oncology resources during Oncology Advanced Pharmacy Practice Experiences (APPEs) on student ease of use and confidence with utilizing oncology references, and student-preferred initial search strategies. Methods: We surveyed students who completed an Oncology APPE at University of Colorado Hospital/Cancer Center (UCH/UCCC) before (Control) and after implementation of the wiki (Intervention). The questionnaire included questions regarding student confidence with oncology and general drug-information (DI) resources, ease of finding information, and student-preferred initial search strategies. Results: Of the 33 students completing an Oncology APPE at UCH/UCCC, 26 responded (response rate 78.8%). Before the APPE, fewer students felt somewhat or very confident researching oncology DI questions compared to general DI questions (Control: 16.7% vs 58.3%; Intervention: 0% vs 71.4%). After the APPE, more students felt somewhat or very confident researching oncology and general DI questions in the Control (75% and 83.3%) and Intervention (85.7% for both) groups (p > 0.05 between groups). Student-preferred initial search strategies were similar between groups (p > 0.05). Students in the Intervention group reported greater ease using oncology resources than students in the Control group when answering treatment guideline (4.64 vs 3.92) and supportive care questions (4.21 vs 3.92), although this was not statistically significant (p > 0.05). Conclusions: The wiki was received positively by the students and did not adversely impact utilization of oncology resources. These results may provide guidance to oncology and other specialty APPE preceptors regarding use of a wiki to direct students to specialty resources.||0||0|
|Use of moodle as a tool for collaborative learning: A study focused on wiki||Sonego A.H.S.
Do Amaral E.M.H.
Virtual learning environment
|Revista Iberoamericana de Tecnologias del Aprendizaje||English||This paper aims to evaluate the Wiki tool on Moodle according to student performance. Research was conducted with students using the virtual learning environment, who engaged in the construction of a collaborative text showing the importance and consequences of the use of information and communication technologies in small and medium enterprises in the region of Sant'Ana do Livramento city. The wiki supports collaborative learning and the building of concepts, producing a final result with the help of an author and co-authors that leads to dialogue and interaction among students. © 2014 IEEE. Personal use is permitted.||0||0|
|User interest profile identification using Wikipedia knowledge database||Hua Li
URL decay model
Web page Classification
Wikipedia knowledge network
|Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013||English||Interesting, targeted, relevant advertisements are considered one of the most reliable revenue sources for personalized recommendation. Topic identification is the key technique for handling unstructured web pages, and conventional bag-of-words content classification approaches struggle to process web pages at scale. In this paper, Wikipedia Category Network (WCN) nodes are used to identify a web page's topic and estimate a user's interest profile; Wikipedia is the largest content knowledge base and is updated dynamically. A basic interest data set is marked for the WCN, and a topic characterization for each WCN node is generated from the depth and breadth of this interest data set. To reduce the deviation in breadth, a family generation algorithm is proposed to estimate generation weights in the WCN. Finally, an interest decay model based on URL count is proposed to represent a user's interest profile over time. Experimental results show that web page topic identification performs well using the WCN with the family generation model, and that the profile identification model adapts dynamically for active users.||0||0|
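The interest decay model summarized in this abstract can be sketched roughly as follows. This is an illustrative reading only, not the paper's actual formulation: the decay constant, the function names, and the idea of discounting each visited page's topic weights geometrically by the number of URLs visited since are all assumptions.

```python
from collections import defaultdict

# Assumed decay factor per subsequent URL (hypothetical, not from the paper).
DECAY = 0.9

def interest_profile(visits):
    """Sketch of a URL-count-based interest decay model.

    visits: ordered list of (url, {topic: weight}) pairs, oldest first.
    Returns a normalized {topic: weight} profile where older visits
    contribute less because more URLs have been visited since them.
    """
    profile = defaultdict(float)
    n = len(visits)
    for i, (_url, topics) in enumerate(visits):
        discount = DECAY ** (n - 1 - i)  # older visits decay more
        for topic, w in topics.items():
            profile[topic] += discount * w
    total = sum(profile.values()) or 1.0
    return {t: w / total for t, w in profile.items()}
```

Under this sketch, a topic seen on the most recent page outweighs the same topic seen many URLs ago, which matches the abstract's goal of representing interest over a time period for active users.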
|User interests identification on Twitter using a hierarchical knowledge base||Kapanipathi P.
|Hierarchical Interest Graph
|Lecture Notes in Computer Science||English||Twitter, due to its massive growth as a social networking platform, has been in focus for the analysis of its user generated content for personalization and recommendation tasks. A common challenge across these tasks is identifying user interests from tweets. Semantic enrichment of Twitter posts, to determine user interests, has been an active area of research in the recent past. These approaches typically use available public knowledge-bases (such as Wikipedia) to spot entities and create entity-based user profiles. However, exploitation of such knowledge-bases to create richer user profiles is yet to be explored. In this work, we leverage hierarchical relationships present in knowledge-bases to infer user interests expressed as a Hierarchical Interest Graph. We argue that the hierarchical semantics of concepts can enhance existing systems to personalize or recommend items based on a varied level of conceptual abstractness. We demonstrate the effectiveness of our approach through a user study which shows an average of approximately eight of the top ten weighted hierarchical interests in the graph being relevant to a user's interests.||0||0|
|Using a mixed research method to evaluate the effectiveness of formative assessment in supporting student teachers' wiki authoring||Ng E.M.W.||Assessment for learning
Mixed research methods
|Computers and Education||English||This study aims to investigate whether for preservice early childhood teachers, integrating assessment for learning (AfL) is a viable pedagogy to improve the quality of their wiki-based projects. A total of 76 student teachers who were in their first year of study at a teacher training institute in Hong Kong participated in the study. The student teachers were required to apply the skills and knowledge they had learned about ICT skills and concepts of ICT in education to create digital learning materials for young children in a wiki environment and to peer assess their projects prior to formal submission using an assessment rubric created by the author. The data were triangulated from the responses collected from a discussion forum, a questionnaire, and focus group meetings. The content and number of comments made in the discussion forum indicated that the student teachers not only actively contributed ideas to their peers but also took their peers' comments seriously. Their comments were mainly related to project design, followed by content, organization, and credibility. The questionnaire findings suggested that although the students felt that feedback from their peers could facilitate their own learning, they valued their teacher's comments the most. Seven students participated in the focus group interviews to substantiate the opinions they gave in the questionnaire. The interviewees believed that even though their peers provided comments from different perspectives, their teacher's comments were the most important because she graded them. It was concluded that integrating AfL from the teacher and peers could improve the quality of wiki projects. © 2014 Elsevier Ltd. All rights reserved.||0||0|
|Using linked data to mine RDF from Wikipedia's tables||Munoz E.
|WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining||English||The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried against. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge-base to find pre-existing relations between entities in Wikipedia tables, suggesting the same relations as holding for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%.||0||0|
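The core method described in this abstract — using an existing Linked Data knowledge base to find relations holding between entities in two table columns, then suggesting the same relation for other rows — can be sketched in miniature. All names and the data representation here are assumptions for illustration; the actual system works over RDF stores and applies learned classifiers to filter candidates.

```python
from collections import Counter

def suggest_triples(rows, col_a, col_b, kb):
    """Sketch of relation suggestion for a Wikipedia-style table.

    rows: list of dicts mapping column name -> entity.
    kb: set of (subject, relation, object) triples from a knowledge base.
    Finds the relation most often already holding between col_a and col_b
    entities on some row, then proposes it for rows where it is missing.
    """
    relations = Counter(
        rel
        for row in rows
        for (s, rel, o) in kb
        if s == row[col_a] and o == row[col_b]
    )
    if not relations:
        return []
    best, _count = relations.most_common(1)[0]
    # Propose the dominant relation as candidate RDF triples for the rest.
    return [
        (row[col_a], best, row[col_b])
        for row in rows
        if (row[col_a], best, row[col_b]) not in kb
    ]
```

The abstract's reported precision figures (40% raw, 81.5% after classification) indicate why the real pipeline tracks extraction features and filters candidates with machine learning rather than emitting them directly, as this sketch does.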
|Using wikis for online group projects: Student and tutor perspectives||Kear K.
Technology acceptance model
|International Review of Research in Open and Distance Learning||English||This paper presents a study of the use of wikis to support online group projects in two courses at the UK Open University. The research aimed to investigate the effectiveness of a wiki in supporting (i) student collaboration and (ii) tutors' marking of the students' collaborative work. The paper uses the main factors previously identified by the technology acceptance model (TAM) as a starting point to examine and discuss the experiences of these two very different user groups: students and tutors. Data was gathered from students via a survey and from tutors via a range of methods. The findings suggest that, when used in tandem with an online forum, the wiki was a valuable tool for groups of students developing a shared resource. As previous studies using the TAM have shown, usefulness and ease of use were both important to students' acceptance of the wiki. However, the use of a wiki in this context was less well-received by tutors, because it led to an increase in their workload in assessing the quality of students' collaborative processes. It was possible to reduce the tutor workload by introducing a greater degree of structure in the students' tasks. We conclude that when introducing collaborative technologies to support assessed group projects, the perceptions and needs of both students and tutors should be carefully considered.||0||0|
|Using wikis to facilitate interaction and collaboration among EFL learners: A social constructivist approach to language teaching||Wang Y.-C.||Communicative competence
Computer-supported collaborative learning (CSCL)
|System||English||The interactive and collaborative nature of wikis offers opportunities for language learning beyond traditional pedagogy. This study examined the use of wikis in an EFL writing classroom. The aim was to explore the extent to which wikis can facilitate collaboration and promote foreign language acquisition through a social constructivist perspective. The instruments used in this study include two online questionnaires, interviews with randomly selected participants and students' reflections on using wikis for collaborative writing. Findings indicate that wikis increase the students' motivation to learn English, enhance their writing confidence and promote their initiatives for social constructivist learning. Most of the students enjoyed performing group tasks in the wiki-mediated environment because they found it to be engaging, challenging and interesting. The results also suggest that collaboration on a wiki in an EFL setting can contribute to both language development and social interaction.||0||0|
|Utilizing semantic Wiki technology for intelligence analysis at the tactical edge||Little E.||Big Data
|Proceedings of SPIE - The International Society for Optical Engineering||English||Challenges exist for intelligence analysts to efficiently and accurately process large amounts of data collected from a myriad of available data sources. These challenges are even more evident for analysts who must operate within small military units at the tactical edge. In such environments, decisions must be made quickly without guaranteed access to the kinds of large-scale data sources available to analysts working at intelligence agencies. Improved technologies must be provided to analysts at the tactical edge to make informed, reliable decisions, since this is often a critical collection point for important intelligence data. To aid tactical edge users, new types of intelligent, automated technology interfaces are required to allow them to rapidly explore information associated with the intersection of hard and soft data fusion, such as multi-INT signals, semantic models, social network data, and natural language processing of text. The ability to fuse these types of data is paramount to providing decision superiority. For these types of applications, we have developed BLADE. BLADE allows users to dynamically add, delete and link data via a semantic wiki, allowing for improved interaction between different users. Analysts can see information updates in near-real-time due to a common underlying set of semantic models operating within a triple store that allows for updates on related data points from independent users tracking different items (persons, events, locations, organizations, etc.). The wiki can capture pictures, videos and related information. New information added directly to pages is automatically updated in the triple store and its provenance and pedigree is tracked over time, making that data more trustworthy and easily integrated with other users' pages.||0||0|
|Validating and extending semantic knowledge bases using video games with a purpose||Vannella D.
|English||Large-scale knowledge bases are important assets in NLP. Frequently, such resources are constructed through automatic mergers of complementary resources, such as WordNet and Wikipedia. However, manually validating these resources is prohibitively expensive, even when using methods such as crowdsourcing. We propose a cost-effective method of validating and extending knowledge bases using video games with a purpose. Two video games were created to validate concept-concept and concept-image relations. In experiments comparing with crowdsourcing, we show that video game-based validation consistently leads to higher-quality annotations, even when players are not compensated.||0||0|
|VidWiki: Enabling the crowd to improve the legibility of online educational videos||Cross A.
Massive open online course
|English||Videos are becoming an increasingly popular medium for communicating information, especially for online education. Recent efforts by organizations like Coursera, edX, Udacity and Khan Academy have produced thousands of educational videos with hundreds of millions of views in their attempt to make high quality teaching available to the masses. As a medium, videos are time-consuming to produce and cannot be easily modified after release. As a result, errors or problems with legibility are common. While text-based information platforms like Wikipedia have benefitted enormously from crowdsourced contributions for the creation and improvement of content, the various limitations of video hinder the collaborative editing and improvement of educational videos. To address this issue, we present VidWiki, an online platform that enables students to iteratively improve the presentation quality and content of educational videos. Through the platform, users can improve the legibility of handwriting, correct errors, or translate text in videos by overlaying typeset content such as text, shapes, equations, or images. We conducted a small user study in which 13 novice users annotated and revised Khan Academy videos. Our results suggest that with only a small investment of time on the part of viewers, it may be possible to make meaningful improvements in online educational videos.||0||0|
|Virtual tools and collaborative working environment in embedded system design||Parkhomenko A.V.
|Analysis of requirements
Project management system
Software and hardware
|Proceedings of 2014 11th International Conference on Remote Engineering and Virtual Instrumentation, REV 2014||English||This paper explores existing approaches to the design of embedded systems. It demonstrates that when creating a microcontroller-based control system for moving objects, it is reasonable to use a hybrid approach combining custom-designed circuit boards with prepared specialized platforms. This approach satisfies the requirements of minimal size and power consumption while reducing the time and labour content of the system design. The results of the development of the architecture, hardware and software of an embedded system for efficient remote control of moving platforms are presented. The paper also describes the collaborative working environment in which the project is created.||0||0|
|Virtual tutorials, Wikipedia books, and multimedia-based teaching for blended learning support in a course on algorithms and data structures||Knackmuss J.
|Proceedings of SPIE - The International Society for Optical Engineering||English||The aim of this paper is to describe the benefit and support of virtual tutorials, Wikipedia books and multimedia-based teaching in a course on Algorithms and Data Structures. We describe our work and experiences gained from using virtual tutorials held in Netucate iLinc sessions and the use of various multimedia and animation elements to support deeper understanding of the ordinary lectures held in the standard classroom on Algorithms and Data Structures for undergraduate computer science students. We describe the benefits, form, style and contents of those virtual tutorials. Furthermore, we mention the advantage of Wikipedia books to support the blended learning process using modern mobile devices. Finally, we give some first statistical measures of improved students' scores after introducing this new form of teaching support.||0||0|
|Visualizing large-scale human collaboration in Wikipedia||Biuk-Aghai R.P.
Visualization of collaborative processes & applications
|Future Generation Computer Systems||English||Volunteer-driven large-scale human-to-human collaboration has become common in the Web 2.0 era. Wikipedia is one of the foremost examples of such large-scale collaboration, involving millions of authors writing millions of articles on a wide range of subjects. The collaboration on some popular articles numbers hundreds or even thousands of co-authors. We have analyzed the co-authoring across entire Wikipedias in different languages and have found it to follow a geometric distribution in all the language editions we studied. In order to better understand the distribution of co-author counts across different topics, we have aggregated content by category and visualized it in a form resembling a geographic map. The visualizations produced show that there are significant differences of co-author counts across different topics in all the Wikipedia language editions we visualized. In this article we describe our analysis and visualization method and present the results of applying our method to the English, German, Chinese, Swedish and Danish Wikipedias. We have evaluated our visualization against textual data and found it to be superior in usability, accuracy, speed and user preference. © 2013 Elsevier B.V. All rights reserved.||0||0|
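The geometric distribution of per-article co-author counts reported in this abstract can be checked with a simple maximum-likelihood fit. This is a generic illustration, not the authors' method; the sample data and function names are invented.

```python
def fit_geometric(counts):
    """MLE for a geometric distribution on support k = 1, 2, 3, ...

    For P(K = k) = (1 - p)**(k - 1) * p, the MLE is p_hat = 1 / mean(K).
    counts: per-article co-author counts (each >= 1).
    """
    mean = sum(counts) / len(counts)
    return 1.0 / mean

def geometric_pmf(k, p):
    """Probability that an article has exactly k co-authors under the fit."""
    return (1.0 - p) ** (k - 1) * p
```

Comparing `geometric_pmf` against the empirical frequencies (e.g. with a chi-squared or Kolmogorov-Smirnov test) is the usual way to assess whether the geometric model holds across language editions, as the abstract claims.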
|Ways of worldmaking in Wikipedia: Reality, legitimacy and collaborative knowledge making||Fullerton L.
Point of view
Social construction of reality
|Media, Culture and Society||English||The on-going social construction of reality, according to Berger and Luckmann's classic treatise, entails both an explanation of the social order which ascribes "cognitive validity to its objectivated meanings" and a justification of that order which provides "a normative dignity to its practical imperatives." The implication is that our knowledge of social reality integrates cognitive facts and normative values to continuously legitimize that reality. We explore this integration of fact and value in an unexpected setting: the "talk pages" of the online encyclopedia Wikipedia in which discussions of article creation are recorded. Our analysis of these discussions draws on Nelson Goodman's Ways of Worldmaking, another classic on the social construction of reality, which catalogues strategies for producing a worldview. We utilize Goodman's theories in four cases of Wikipedia article creation - two histories, "Iraq War" and "Afghanistan War," and two biographies, "George W. Bush" and "Barack Obama" - all of which reveal how knowledge products are created.||0||0|
|Web 2.0 and wiki farms in the business realm: A proposal of new platform for small-sized companies||Zubr V.
|Vision 2020: Sustainable Growth, Economic Development, and Global Competitiveness - Proceedings of the 23rd International Business Information Management Association Conference, IBIMA 2014||English||With the latest generation of Internet development, Web 2.0, the perception of "the Web of webs" has changed. Users are no longer only passive "consumers" of web content created for them, but are themselves creators of web page content. The paper discusses the theoretical foundations for the creation of encyclopaedias. In particular, the wiki farm concept is defined and its advantages as well as disadvantages are mentioned. Employing a questionnaire survey, the opinions of representatives of small-sized enterprises and their experience with interactive web pages are revealed and examined. The final part of the paper includes the proposal of a wiki farm and related services based on the principles of Web 2.0 which might be employed and effectively utilised by a wide range of companies. Each offered service contains detailed information, related services, images or visualisations and case studies linked to the particular service. The web content itself is subsequently generated by users based on their knowledge and experience.||0||0|
|Community-based Knowledge Engineering
|Knowledge-Based Configuration: From Research to Business Cases||English||Configuration is a thriving application area for AI technologies. As a consequence, there is a need for advancing knowledge acquisition practices in order to make configuration technologies more accessible. In this chapter we introduce WeeVis, which is a freely available Wiki-based environment for defining and solving basic configuration tasks. Like Wikipedia, knowledge bases are regarded as Wikis that can be created, edited, and versioned by a community of users. WeeVis configurators can be integrated into standard Wikipedia pages and are thus more easily accessible compared to proprietary knowledge representations. WeeVis technologies are easily accessible and therefore well suited for application in educational contexts. © 2014 Elsevier Inc. All rights reserved.||0||0|
|What influences online deliberation? A wikipedia study||Xiao L.
|Journal of the Association for Information Science and Technology||English||In this paper we describe a study aimed at evaluating and improving the quality of online deliberation. We consider the rationales used by participants in deletion discussions on Wikipedia in terms of the literature on democratic and online deliberation and collaborative information quality. Our findings suggest that most participants in these discussions were concerned with the notability and credibility of the topics presented for deletion, and that most presented rationales rooted in established site policies. We found that factors such as article topic and unanimity (or lack thereof) tended to affect the outcome of the debate. Our results also suggested that the blackout of the site in response to the proposed Stop Online Piracy Act (SOPA) law affected the decisions of deletion debates that occurred close to the event. We conclude by suggesting implications of this study for broader considerations of online information quality and democratic deliberation.||0||0|
|What makes a good team of Wikipedia editors? A preliminary statistical analysis||Bukowski L.
Statistical data mining
|Lecture Notes in Computer Science||English||The paper concerns studying the quality of teams of Wikipedia authors with a statistical approach. We report the preparation of a dataset containing numerous behavioural and structural attributes and its subsequent analysis and use to predict team quality. We performed exploratory analysis using partial regression to remove the influence of attributes not related to the team itself. The analysis confirmed that the key factor significantly influencing an article's quality is discussion between team members. The second part of the paper successfully uses machine learning models to predict good articles based on features of the teams that created them.||0||0|
|Wiki Technology Enhanced Group Project to Promote Active Learning in a Neuroscience Course for First-Year Medical Students: An Exploratory Study||Mi M.
Collaborative project-based learning
|Medical Reference Services Quarterly||English||A wiki group project was integrated into a neuroscience course for first-year medical students. The project was developed as a self-directed, collaborative learning task to help medical students review course content and make clinically important connections. The goals of the project were to enhance students' understanding of key concepts in neuroscience, promote active learning, and reinforce their information literacy skills. The objective of the exploratory study was to provide a formative evaluation of the wiki group project and to examine how wiki technology was utilized to enhance active and collaborative learning of first-year medical students in the course and to reinforce information literacy skills.||0||0|
|Wiki as a collaborative writing tool in teacher education: Evaluation and suggestions for effective use||Hadjerrouit S.||Collaboration
|Computers in Human Behavior||English||Wiki technology provides new opportunities to foster collaborative writing in teacher education. To empirically evaluate the level of collaborative writing in a wiki-based environment, this article used three methods and their combination. The first method was the history function that records all students' actions, making it possible to trace all changes made in the wikis. The actions were analyzed in terms of number and percentage of contribution using a taxonomy categorized by 10 editorial types. The second method examined comments posted on the wiki discussion page to evaluate the level of collaboration. The third method provided feedback on the level of collaboration by means of peer assessment. The results show important differences in the types of contributions across the categories investigated. The results also reveal that the level of collaborative writing was lower than expected. Possible factors that may influence wiki-based collaborative writing are discussed. Finally, suggestions for effective use of wikis as collaborative writing tools in teacher education conclude the article. © 2013 Elsevier Ltd. All rights reserved.||0||0|
|Wiki as a knowledge management tool at the Multicultural school of Athens||Kalagiakos P.
Reusability quality assurance group
|IEEE Global Engineering Education Conference, EDUCON||English||The Multicultural school of Athens is a rich source of data and knowledge. Wiki is a part of a collection of software tools aiming to increase community collaboration and provide reusable content within our curriculum. Dependence Pedagogy has been proved a valuable approach and our wiki solution presented here contributes to the establishment of the reusability notion as a prerequisite of a successful Dependence Pedagogy environment.||0||0|
|Wiki based collaborative learning in interuniversity scenarios||Katzlinger E.
|Electronic Journal of e-Learning||English||In business education, advanced collaboration skills and media literacy are important for surviving in a globalized business where virtual communication between enterprises is part of day-to-day business. To transform these global working situations into higher education, a learning scenario between two universities in Germany and Austria was created where students worked together in virtual interregional learning groups. This article reports on a study of an interuniversity collaborative learning scenario within the subject of e-business. Participating students collaborated virtually and documented a shared case study in a Wiki. When working together, learners used different synchronous and asynchronous tools for close virtual collaboration around a Wiki toolset, such as forum, chat, video conferencing and other social media. Students applied given case studies (e.g. from Harvard Business Review) or they worked out a business case from their own experience, which covered a range of upcoming e-business topics. In an attending evaluation study with around 460 participants from two universities, 259 questionnaires were evaluated. It reveals several substantive effects, such as: • Tremendous influence of interregional group work on media competencies • Hidden social aspects and conflict potential • Scenario design and different media usage • Teaching effort vs. learning outcome of such a scenario • Learning impact for different student groups depending on gender, employment, graduation or online-moderation. The findings of this study reveal several interesting aspects concerning media usage and show how students benefited from Wiki work in this virtual learning scenario.||0||0|
|Wiki tools in teaching English for Specific (Academic) Purposes - Improving students' participation||Felea C.
English for Specific (Academic) Purposes
|Lecture Notes in Computer Science||English||This study is based on an on-going investigation on the impact of Web 2.0 technologies, namely a wiki-based learning environment, part of a blended approach to teaching English for Specific (Academic) Purposes for EFL undergraduate students in a Romanian university. The research aims to determine whether there are statistically significant differences between the degrees of wiki participation recorded in the first semester of two consecutive academic years, starting from the assumption that modifications in the learning environment, namely the change of location for face-to-face meetings from class to computer lab setting and the introduction of more complex individual page templates may lead to increased wiki participation. Due to the project's multiple dimensions, out of which participation and response to the new online environment are particularly important, the results provide information necessary for further decisions regarding specific instructional design needs and wiki components, and changes affecting the teaching/learning process.||0||0|
|Wiki-mediated collaborative writing in teacher education: Assessing three years of experiences and influencing factors||Hadjerrouit S.||Action category
|CSEDU 2014 - Proceedings of the 6th International Conference on Computer Supported Education||English||Wikis have been reported as tools that promote collaborative writing in educational settings. Examples of wikis in teacher education are group projects, glossary creation, teacher evaluation, and document review. However, in spite of studies that report on successful stories, the claim that wikis support collaborative writing has not yet been firmly confirmed in real educational settings. Most studies are limited to participants' subjective perceptions, and do not take into account influencing factors, or the relationships between wikis and the learning environment. In this paper, students' collaborative writing activities over a period of three years are investigated using a taxonomy of action categories and the wiki data log that tracks all students' actions. The paper analyses the level of contribution of each member of the student groups, the types of actions that the groups carried out on the wikis, and the timing of contributions. The article also discusses personal and contextual factors that may influence collaborative writing activities in teacher education, as well as recommendations for students.||0||0|
|WikiNEXT: A wiki for exploiting the web of data||Arapov P.
|Proceedings of the ACM Symposium on Applied Computing||English||This paper presents WikiNEXT, a semantic application wiki. WikiNEXT lies on the border between application wikis and modern web-based IDEs (Integrated Development Environments) like jsbin.com, jsfiddle.net, cloud9ide.com, etc. It was initially created for writing documents that integrate data from external sources on the web of data, such as DBPedia.org or FreeBase.com, and for writing interactive tutorials (e.g. an HTML5 tutorial or a semantic web programming tutorial) that mix text and interactive examples on the same page. The system combines powerful aspects of (i) wikis, such as ease of use, collaboration, and openness, (ii) semantic web/wikis, such as making information processable by machines, and (iii) web-based IDEs, such as instant development and code testing in a web browser. WikiNEXT supports writing documents/pages as well as web applications that manipulate semantic data, either locally or coming from the web of data. These applications can be created, edited, or cloned in the browser and can be used for integrating data visualizations into wiki pages, for annotating content with metadata, or for any other kind of processing. WikiNEXT is particularly suited for teaching web technologies or for writing documents that integrate data from the web of data. Copyright 2014 ACM.||0||0|
|WikiReviz: An edit history visualization for wiki systems||Wu J.
|Lecture Notes in Computer Science||English||Wikipedia maintains a linear record of edit history, with article content and meta-information for each article, which conceals precious information on how each article has evolved. This demo describes the motivation and features of WikiReviz, a visualization system for analyzing edit history in Wikipedia and other wiki systems. From the officially exported edit history of a single Wikipedia article, WikiReviz reconstructs the derivation relationships among revisions precisely and efficiently by revision graph extraction and indicates meaningful article evolution progress by edit summarization.||0||0|
|WikiTextbooks: Designing Your Course Around a Collaborative Writing Project||Katz B.P.
|PRIMUS||English||We have used wiki technology to support large-scale, collaborative writing projects in which the students build reference texts (called WikiTextbooks). The goal of this paper is to prepare readers to adapt this idea for their own courses. We give examples of the implementation of WikiTextbooks in a variety of courses, including lecture and discovery-based courses. We discuss the kinds of challenges that WikiTextbooks address and focus on critical design decisions. Finally, we conclude with a suggested template wiki project that is approachable for new users and appropriate for many course structures.||0||0|
|WikiWho: Precise and Efficient Attribution of Authorship of Revisioned Content||Fabian Flöck
Community-driven content creation
|World Wide Web Conference 2014||English||Revisioned text content is present in numerous collaboration platforms on the Web, most notably wikis. Tracking the authorship of text tokens in such systems has many potential applications, such as identifying main authors for licensing reasons or tracing collaborative writing patterns over time. In this context, two main challenges arise. First, such an authorship tracking system must be precise in its attributions to be reliable for further processing. Second, it has to run efficiently even on very large datasets, such as Wikipedia. As a solution, we propose a graph-based model to represent revisioned content and an algorithm over this model that tackles both issues effectively. We describe the optimal implementation and design choices when tuning it to a wiki environment. We further present a gold standard of 240 tokens from English Wikipedia articles annotated with their origin. This gold standard was created manually and confirmed by multiple independent users of a crowdsourcing platform. It is the first gold standard of this kind and quality, and our solution achieves an average of 95% precision on this data set. We also perform a first-ever precision evaluation of the state-of-the-art algorithm for the task, exceeding it by over 10% on average. Our approach outperforms the execution time of the state of the art by one order of magnitude, as we demonstrate on a sample of over 240 English Wikipedia articles. We argue that the increased size of an optional materialization of our results, about 10% compared to the baseline, is a favorable trade-off given the large advantage in runtime performance.||0||0|
|Wikimantic: Toward effective disambiguation and expansion of queries||Boston C.
|Data and Knowledge Engineering||English||This paper presents an implemented and evaluated methodology for disambiguating terms in search queries and for augmenting queries with expansion terms. By exploiting Wikipedia articles and their reference relations, our method is able to disambiguate terms in particularly short queries with few context words and to effectively expand queries for retrieval of short documents such as tweets. Our strategy can determine when a sequence of words should be treated as a single entity rather than as a sequence of individual entities. This work is part of a larger project to retrieve information graphics in response to user queries. © 2013 Elsevier B.V.||0||0|
|Wikipedia Usage Estimates Prevalence of Influenza-Like Illness in the United States in Near Real-Time||McIver D.J.
|PLoS Computational Biology||English||Circulating levels of both seasonal and pandemic influenza require constant surveillance to ensure the health and safety of the population. While up-to-date information is critical, traditional surveillance systems can have data availability lags of up to two weeks. We introduce a novel method of estimating, in near-real time, the level of influenza-like illness (ILI) in the United States (US) by monitoring the rate of particular Wikipedia article views on a daily basis. We calculated the number of times certain influenza- or health-related Wikipedia articles were accessed each day between December 2007 and August 2013 and compared these data to official ILI activity levels provided by the Centers for Disease Control and Prevention (CDC). We developed a Poisson model that accurately estimates the level of ILI activity in the American population, up to two weeks ahead of the CDC, with an absolute average difference between the two estimates of just 0.27% over 294 weeks of data. Wikipedia-derived ILI models performed well through both abnormally high media coverage events (such as during the 2009 H1N1 pandemic) as well as unusually severe influenza seasons (such as the 2012-2013 influenza season). Wikipedia usage accurately estimated the week of peak ILI activity 17% more often than Google Flu Trends data and was often more accurate in its measure of ILI intensity. With further study, this method could potentially be implemented for continuous monitoring of ILI activity in the US and to provide support for traditional influenza surveillance tools.||0||0|
|Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership||Chitu Okoli
Finn Årup Nielsen
|Systematic literature review
|Journal of the Association for Information Science and Technology||English||Hundreds of scholarly studies have investigated various aspects of Wikipedia. Although a number of literature reviews have provided overviews of this vast body of research, none has specifically focused on the readers of Wikipedia and issues concerning its readership. In this systematic literature review, we review 99 studies to synthesize current knowledge regarding the readership of Wikipedia and provide an analysis of research methods employed. The scholarly research has found that Wikipedia is popular not only for lighter topics such as entertainment but also for more serious topics such as health and legal information. Scholars, librarians, and students are common users, and Wikipedia provides a unique opportunity for educating students in digital literacy. We conclude with a summary of key findings, implications for researchers, and implications for the Wikipedia community.||0||1|
|Wikipedia-based Kernels for dialogue topic tracking||Soo-Hwan Kim
|Dialogue Topic Tracking
Spoken Dialogue Systems
|ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings||English||Dialogue topic tracking aims to segment ongoing dialogues into topically coherent sub-dialogues and to predict the topic category of each next segment. This paper proposes a kernel method for dialogue topic tracking that utilizes various types of information obtained from Wikipedia. The experimental results show that our proposed approach can significantly improve the performance of the task in mixed-initiative human-human dialogues.||0||0|
|Wikipedia-based query performance prediction||Gilad Katz
|English||The query-performance prediction task is to estimate retrieval effectiveness with no relevance judgments. Pre-retrieval prediction methods operate prior to retrieval time. Hence, these predictors are often based on analyzing the query and the corpus upon which retrieval is performed. We propose a corpus-independent approach to pre-retrieval prediction which relies on information extracted from Wikipedia. Specifically, we present Wikipedia-based features that can attest to the effectiveness of retrieval performed in response to a query, regardless of the corpus upon which search is performed. Empirical evaluation demonstrates the merits of our approach. As a case in point, integrating the Wikipedia-based features with state-of-the-art pre-retrieval predictors that analyze the corpus yields prediction quality that is consistently better than that of using the latter alone. Copyright 2014 ACM.||0||0|
|… further results|