List of conference papers

From WikiPapers

This is a list of conference papers available in WikiPapers. Currently, there are 4034 conference papers.

Export: BibTeX, CSV, RDF, JSON

To create a new "conference paper", go to Form:Publication.


Conference papers

Title Author(s) Keyword(s) Published in Language Date Abstract R C
Similar Gaps, Different Origins? Women Readers and Editors at Greek Wikipedia Ioannis Protonotarios
Vasiliki Sarimpei
Jahna Otterbacher
Gender gap
Wikipedia
Motivation
Participation
Quantitative analysis
Tenth International AAAI Conference on Web and Social Media English 17 May 2016 As a global, multilingual project, Wikipedia could serve as a repository for the world’s knowledge on an astounding range of topics. However, questions of participation and diversity among editors continue to be burning issues. We present the first targeted study of participants at Greek Wikipedia, with the goal of better understanding their motivations. Smaller Wikipedias play a key role in fostering the project’s global character, but typically receive little attention from researchers. We developed two survey instruments, administered in Greek, based on the 2011 Wikipedia Readership and Editors Surveys. Consistent with previous studies, we found a gender gap, with women making up only 38% and 15% of readers and editors, respectively, and with men editors being much more active. Our data suggest two salient explanations: 1) women readers more often lack confidence with respect to their knowledge and technical skills as compared to men, and 2) women’s behaviors may be driven by personal motivations such as enjoyment and learning, rather than by “leaving their mark” on the community, a concern more common among men. Interestingly, while similar proportions of men and women readers use multiple language editions, more women contribute to English Wikipedia in addition to the Greek language community. Future research should consider how this impacts their participation at Greek Wikipedia. 11 0
Una comparazione delle reti di ringraziamenti di Wikipedia di alcuni paesi europei Valerio Perticone
Marco Elio Tabacchi
Linguaggio, Cognizione e Società Italian December 2015 Since May 2013, the collaborative encyclopedia Wikipedia has given every contributor the possibility of expressing appreciation to other authors for the creation or modification of a specific article. Through the thanks feature, a user can send the author a standard message by pressing the dedicated ‘thank’ button. The thanks system was subsequently extended to the editions in the main European languages. The set of thanks can be regarded as a social network, represented by a multigraph in which users are the nodes and thanks are the edges. Ignoring multiple edges and edge direction yields a network in which the existence of a thank establishes a relationship, as in the social network models described by Boyd and Ellison.

Studying the topology of this network can reveal information about the relationships among contributors, without having to know in detail the edits made by users or to study prior interactions between them, such as edits made by users to the same articles, discussions held on community pages, or common interests declared by users in their profiles. Despite the absence of an explicit social component in the writing of an encyclopedia, it is possible to hypothesize, from the way the network forms and given the evident analogies between it and the most widespread social networks, that the thanks network has a small-world and scale-free topology. The literature contains numerous natural and artificial examples of networks with this topology, which endows a network with robustness and resilience. In small-world networks, typical of symmetric social networks, nodes have a high clustering coefficient compared to a random network of the same size: whoever belongs to a circle tends to be connected to many of its other members. The average path from one node to another is also short relative to the size of the network (six degrees of separation).

Scale-free networks exhibit a large number of nodes with few links and a small number of nodes (the so-called hubs) with very many links, following the power-law distribution P(x) ∝ x^-α, a property that can be verified using an algorithm based on the Kolmogorov-Smirnov test. In this article we test whether the robustness and resilience and the presence of hubs typical of the networks described above are present in the thanks networks of the most widely used language versions of Wikipedia, and we comment on a peculiarity found in the German Wikipedia.
0 0
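The scale-free check described in the abstract above, fitting P(x) ∝ x^-α and measuring the fit with a Kolmogorov-Smirnov distance, can be illustrated in a few lines of Python. This is a minimal sketch, not the authors' code: the thanks edge list is invented, xmin is fixed at 1, and the exponent is estimated with the Clauset-Shalizi-Newman maximum-likelihood formula.

```python
# Minimal sketch: small-world indicators plus a power-law/KS check on the
# degree distribution of a "thanks" network. Edges are illustrative.
import numpy as np
import networkx as nx

# Hypothetical thanks edges: (sender, recipient)
edges = [("anna", "bruno"), ("anna", "carla"), ("bruno", "carla"),
         ("dario", "anna"), ("elena", "anna"), ("fabio", "anna")]

G = nx.Graph()  # ignore direction and multi-edges, as in the paper
G.add_edges_from(edges)

# Small-world indicators: clustering coefficient and average path length
print("clustering:", nx.average_clustering(G))
print("avg path length:", nx.average_shortest_path_length(G))

# Power-law fit P(x) ~ x^-alpha via the continuous MLE (Clauset et al.)
degrees = np.array([d for _, d in G.degree()], dtype=float)
xmin = 1.0  # assumption: fit the whole degree range
alpha = 1.0 + len(degrees) / np.sum(np.log(degrees / xmin))

# KS distance between the empirical CDF and the fitted power-law CDF
xs = np.sort(degrees)
emp_cdf = np.arange(1, len(xs) + 1) / len(xs)
fit_cdf = 1.0 - (xs / xmin) ** (1.0 - alpha)
print("alpha:", alpha, "KS distance:", np.abs(emp_cdf - fit_cdf).max())
```

A full analysis would also bootstrap the KS statistic to obtain a p-value, but the distance alone already shows how the test discriminates power-law from non-power-law degree distributions.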
Un'analisi preliminare della rete dei ringraziamenti su Wikipedia Valerio Perticone
Marco Elio Tabacchi
Il futuro prossimo della Scienza Cognitiva Italian July 2015 The free online encyclopedia Wikipedia gives every contributor the possibility of expressing appreciation to other authors for the creation or modification of a specific article through the thanks feature, implemented since April 2013 by means of the echo notification system: next to each edit, a ‘thank’ link allows sending the author a standard thank-you message. The set of thanks can be seen as a directed social network, represented by a multigraph in which users are the nodes and thanks are directed edges. Ignoring multiple edges and edge direction yields a network in which the existence of a thank establishes a relationship between two contributors, as in the social network models described by Boyd and Ellison.

Studying the topology of this network can reveal information about the relationships among contributors, without having to know in detail the edits made by users or to study prior interactions between them, such as edits made by users to the same articles, discussions held on community pages, or common interests declared by users in their profiles.

The aim of this pilot study is to verify, with the help of the available data, the robustness and resilience of the thanks network and the presence of hubs within it, and possibly to formulate hypotheses about any deviation.
0 0
A Platform for Visually Exploring the Development of Wikipedia Articles Erik Borra
David Laniado
Esther Weltevrede
Michele Mauri
Giovanni Magni
Tommaso Venturini
Paolo Ciuccarelli
Richard Rogers
Andreas Kaltenbrunner
ICWSM '15 - 9th International AAAI Conference on Web and Social Media English May 2015 When looking for information on Wikipedia, Internet users generally just read the latest version of an article. However, in its back-end there is much more: associated with each article are the edit history and talk pages, which together entail its full evolution. These spaces can typically reach thousands of contributions, and it is not trivial to make sense of them by manual inspection. This issue also affects Wikipedians, especially the less experienced ones, and constitutes a barrier for new editor engagement and retention. To address these limitations, Contropedia offers its users unprecedented access to the development of an article, using wiki links as focal points. 0 0
Societal Controversies in Wikipedia Articles Erik Borra
Esther Weltevrede
Paolo Ciuccarelli
Andreas Kaltenbrunner
David Laniado
Giovanni Magni
Michele Mauri
Richard Rogers
Wikipedia
Controversy Mapping
Social Science
Data Visualization
CHI '15 - Proceedings of the 33rd annual ACM conference on Human factors in computing systems English April 2015 Collaborative content creation inevitably reaches situations where different points of view lead to conflict. We focus on Wikipedia, the free encyclopedia anyone may edit, where disputes about content in controversial articles often reflect larger societal debates. While Wikipedia has a public edit history and discussion section for every article, the substance of these sections is difficult to fathom for Wikipedia users interested in the development of an article and in locating which topics were most controversial. In this paper we present Contropedia, a tool that augments Wikipedia articles and gives insight into the development of controversial topics. Contropedia uses an efficient language-agnostic measure based on the edit history that focuses on wiki links to easily identify which topics within a Wikipedia article have been most controversial and when. 0 0
Organización del conocimiento en entornos wiki: una experiencia de organización de información sobre lecturas académicas Jesús Tramullas
Ana I. Sánchez
Piedad Garrido-Picazo
Wiki
Higher education
Lectures
User generated content
Organización del conocimiento: sistemas de información abiertos. Actas del XII Congreso ISKO España y II Congreso ISKO España y Portugal Spanish 2015 This paper reviews the informational behavior of a community of university students during the development of a learning activity with a wiki. Through a case study, it analyzes the data available on the wiki and identifies patterns of content creation and organization. The wiki study is also framed within the information management framework proposed by Rowley. The findings support the conclusion that students apply the principle of economy of effort in their informational behavior, guided by the assessment requirements of the activity, and that Rowley's proposal is not suitable for analyzing and evaluating technologically mediated educational processes. 0 0
Situated Interaction in a Multilingual Spoken Information Access Framework Niklas Laxström
Kristiina Jokinen
Graham Wilcock
WikiTalk IWSDS 2014 English 18 January 2014 0 0
A cloud-based real-time mobile collaboration wiki system Wang W.H.
Hao Y.M.
Cao Y.H.
Li L.
Cloud computing
Mobile collaboration
Wiki
Applied Mechanics and Materials English 2014 The wiki system is an important application of wiki technology for knowledge sharing on the Internet today. Existing wiki systems have been developed with distributed collaboration capabilities, but most of them cannot support mobile, real-time collaboration. In this paper, a novel cloud-based real-time mobile collaboration wiki system is presented. First, the real-time requirements of user groups in a cloud-based mobile collaboration wiki system are discussed, and a group pattern for mobile collaboration wiki systems (GPMCW) is constructed for the mobile cloud environment. After that, a multi-layered Web architecture oriented to the mobile cloud environment is proposed. Then, an instance of the cloud-based real-time mobile collaboration wiki system (RMCWS) is given. To demonstrate the feasibility of the system, a prototype named mobile group collaboration supporting platform (MGCSP) has been constructed based on it. Practice shows that RMCWS is robust and efficient in supporting real-time mobile group collaboration, and provides good support for idea sharing and knowledge communication. 0 0
A composite kernel approach for dialog topic tracking with structured domain knowledge from Wikipedia Soo-Hwan Kim
Banchs R.E.
Hua Li
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 Dialog topic tracking aims at analyzing and maintaining topic transitions in ongoing dialogs. This paper proposes a composite kernel approach for dialog topic tracking to utilize various types of domain knowledge obtained from Wikipedia. Two kernels are defined based on history sequences and context trees constructed based on the extracted features. The experimental results show that our composite kernel approach can significantly improve the performances of topic tracking in mixed-initiative human-human dialogs. 0 0
A correlation-based semantic model for text search Sun J.
Bin Wang
Yang X.
Semantic correlation
Text search
Wikipedia
Lecture Notes in Computer Science English 2014 With the exponential growth of texts on the Internet, text search is considered a crucial problem in many fields. Most traditional text search approaches are based on a "bag of words" text representation built on frequency statistics. However, these approaches ignore the semantic correlation of words in the text, which may lead to inaccurate ranking of the search results. In this paper, we propose a new Wikipedia-based similar-text search approach in which the words of the indexed texts and of the query text are semantically correlated through Wikipedia. We propose a new text representation model and a new text similarity metric. Finally, experiments on a real dataset demonstrate the high precision, recall and efficiency of our approach. 0 0
A cross-cultural comparison on contributors' motivations to online knowledge sharing: Chinese vs. Germans Zhu B.
Gao Q.
Nohdurft E.
Cross-cultural differences
Knowledge sharing
Wikipedia
Lecture Notes in Computer Science English 2014 Wikipedia is the most popular online knowledge sharing platform in western countries. However, it is not widely accepted in eastern countries. This indicates that culture plays a key role in determining users' acceptance of online knowledge sharing platforms. The purpose of this study is to investigate the cultural differences between Chinese and Germans in motivations for sharing knowledge, and further to examine the impacts of these motives on actual behavior across the two cultures. A questionnaire was developed to explore the motivation factors and actual behavior of contributors. 100 valid responses were received from Chinese participants and 34 from German participants. The results showed that the motivations were significantly different between Chinese and Germans. The Chinese showed more consideration for others and cared more about receiving rewards and strengthening relationships, whereas Germans had more concerns about losing competitiveness. The impact of the motives on actual behavior also differed between Chinese and Germans. 0 0
A latent variable model for discourse-Aware concept and entity disambiguation Angela Fahrni
Michael Strube
14th Conference of the European Chapter of the Association for Computational Linguistics 2014, EACL 2014 English 2014 This paper takes a discourse-oriented perspective for disambiguating common and proper noun mentions with respect to Wikipedia. Our novel approach models the relationship between disambiguation and aspects of cohesion using Markov Logic Networks with latent variables. Considering cohesive aspects consistently improves the disambiguation results on various commonly used data sets. 0 0
A method for refining a taxonomy by using annotated suffix trees and wikipedia resources Chernyak E.
Mirkin B.
String-to-text relevance
Suffix trees
Taxonomy refinement
Utilizing Wikipedia
Procedia Computer Science English 2014 A two-step approach to taxonomy construction is presented. In the first step, the frame of the taxonomy is built manually from representative educational materials. In the second step, the frame is refined using the Wikipedia category tree and articles. Since the structure of Wikipedia is rather noisy, a procedure to clean the Wikipedia category tree is suggested. A string-to-text relevance score, based on annotated suffix trees, is used several times: 1) to clean the Wikipedia data of noise; 2) to assign Wikipedia categories to taxonomy topics; and 3) to choose whether a category should be assigned to the taxonomy topic or stay at an intermediate level. The resulting taxonomy consists of three parts: the manually set upper levels, the adopted Wikipedia category tree, and the Wikipedia articles as leaves. Also, a set of so-called descriptors is assigned to every leaf; these are phrases explaining aspects of the leaf topic. The method is illustrated by its application to two domains: a) probability theory and mathematical statistics, and b) numerical analysis (both in Russian). 0 0
A methodology based on commonsense knowledge and ontologies for the automatic classification of legal cases Capuano N.
De Maio C.
Salerno S.
Toti D.
ACM International Conference Proceeding Series English 2014 We describe a methodology for the automatic classification of legal cases expressed in natural language, which relies on existing legal ontologies and a commonsense knowledge base. This methodology is founded on a process consisting of three phases: an enrichment of a given legal ontology by associating its terms with topics retrieved from the Wikipedia knowledge base; an extraction of relevant concepts from a given textual legal case; and a matching between the enriched ontological terms and the extracted concepts. Such a process has been successfully implemented in a corresponding tool that is part of a larger framework for self-litigation and legal support for the Italian law. 0 0
A novel system for the semi automatic annotation of event images McParlane P.J.
Jose J.M.
Photo tag recommendation
Twitter
Wikipedia
SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2014 With the rise in popularity of smart phones, taking and sharing photographs has never been more openly accessible. Further, photo sharing websites, such as Flickr, have made the distribution of photographs easy, resulting in an increase of visual content uploaded online. Due to the laborious nature of annotating images, however, a large percentage of these images are unannotated, making their organisation and retrieval difficult. Therefore, there has been a recent research focus on the automatic and semi-automatic annotation of these images. Despite the progress made in this field, however, annotating images automatically based on their visual appearance often results in unsatisfactory suggestions, and as a result these models have not been adopted by photo sharing websites. Many methods have therefore looked to exploit new sources of evidence for annotation purposes, such as image context. In this demonstration, we instead explore the scenario of annotating images taken at large-scale events, where evidence can be extracted from a wealth of online textual resources. Specifically, we present a novel tag recommendation system for images taken at a popular music festival which allows the user to select relevant tags from related Tweets and Wikipedia content, thus reducing the workload involved in the annotation process. 0 0
A perspective-aware approach to search: Visualizing perspectives in news search results Qureshi M.A.
O'Riordan C.
Pasi G.
News search
Perspective
Wikipedia
SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2014 The result set from a search engine for any user's query may exhibit an inherent perspective due to issues with the search engine or issues with the underlying collection. This demonstration paper presents a system that allows users to specify at query time a perspective together with their query. The system then presents results from well-known search engines with a visualization of the results which allows the users to quickly surmise the presence of the perspective in the returned set. 0 0
A piece of my mind: A sentiment analysis approach for online dispute detection Lei Wang
Cardie C.
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 We investigate the novel task of online dispute detection and propose a sentiment analysis solution to the problem: we aim to identify the sequence of sentence-level sentiments expressed during a discussion and to use them as features in a classifier that predicts the DISPUTE/NON-DISPUTE label for the discussion as a whole. We evaluate dispute detection approaches on a newly created corpus of Wikipedia Talk page disputes and find that classifiers that rely on our sentiment tagging features outperform those that do not. The best model achieves a very promising F1 score of 0.78 and an accuracy of 0.80. 0 0
A scalable gibbs sampler for probabilistic entity linking Houlsby N.
Massimiliano Ciaramita
Lecture Notes in Computer Science English 2014 Entity linking involves labeling phrases in text with their referent entities, such as Wikipedia or Freebase entries. This task is challenging due to the large number of possible entities, in the millions, and heavy-tailed mention ambiguity. We formulate the problem in terms of probabilistic inference within a topic model, where each topic is associated with a Wikipedia article. To deal with the large number of topics we propose a novel efficient Gibbs sampling scheme which can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity-linking on the Aida-CoNLL dataset. 0 0
A seed based method for dictionary translation Krajewski R.
Rybinski H.
Kozlowski M.
Dictionary translation
Machine translation
Multilingual corpus
Semantic similarity
Lecture Notes in Computer Science English 2014 The paper addresses the topic of automatic machine translation. The proposed method enables translating a dictionary by mining source- and target-language repositories, without any directly given relationships connecting the two languages. It consists of two stages: (1) translation by lexical similarity, where words are compared graphically, and (2) translation by semantic similarity, where contexts are compared. The Polish and English versions of Wikipedia were used as the multilingual corpora. The method and its stages are thoroughly analyzed. The results allow implementing this method in human-in-the-middle systems. 0 0
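A minimal sketch of the paper's two-stage idea, first comparing word forms graphically, then comparing contexts. The headword, candidates and contexts are invented, and the source context is assumed to be already mapped into English via a seed dictionary, a detail the abstract does not specify.

```python
# Minimal sketch: (1) lexical similarity between source and candidate
# target words, (2) semantic similarity between their contexts.
from difflib import SequenceMatcher
from collections import Counter
import math

def lexical_sim(a, b):
    # "graphical" comparison of the two word forms
    return SequenceMatcher(None, a, b).ratio()

def context_sim(ctx_a, ctx_b):
    # cosine similarity between bag-of-words contexts
    va, vb = Counter(ctx_a), Counter(ctx_b)
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical Polish headword; its context is assumed pre-translated
# into English through a seed dictionary (illustrative assumption).
source = ("uniwersytet", ["student", "faculty", "professor"])
candidates = [("university", ["student", "faculty", "professor"]),
              ("universe", ["galaxy", "star", "cosmos"])]

# Stage 1 scores graphically similar candidates; stage 2 ranks by context
for word, ctx in candidates:
    print(word, lexical_sim(source[0], word), context_sim(source[1], ctx))
```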
A student perception related to the implementation of virtual courses Chilian A.
Bancuta O.-R.
Bancuta C.
Access to information
Cooperation
Internet
Moodle
Virtual course
Wiki
Lecture Notes in Computer Science English 2014 This paper aims to characterize students' points of view regarding virtual courses in education; in particular, the study is based on the experience gained by students in the Designing TEL course, organized in the frame of the CoCreat project. It was noticed that a very important role in the development of virtual courses was played by the Wiki and Moodle platforms. Even though there are still some problems in implementing virtual courses using those platforms, the Designing TEL course can be considered a successful one. 0 0
A study of the wikipedia knowledge recommendation service for satisfaction of ePortfolio Users Huang C.-H.
Huang Y.-Q.
Yang J.-H.
Wang W.-Y.
EPortfolio
Satisfaction
Wikipedia Knowledge
Lecture Notes in Electrical Engineering English 2014 This study extended a conventional ePortfolio with a proposed Wikipedia knowledge recommendation service (WKRS). Participants included 100 students taking courses at National Central University, divided into an experimental group and a control group. The experimental group and control group students created their learning portfolios using the ePortfolio with WKRS and the conventional ePortfolio without WKRS, respectively. The data for this study was collected over 3 months. The experimental results show that the experimental group students' satisfaction, system use, system quality and information/knowledge quality improved significantly compared to the control group. 0 0
A user centred approach to represent expert knowledge: A case study at STMicroelectronics Brichni M.
Gzara L.
Dupuy-Chessa S.
Jeannet C.
Capitalization
Expert knowledge
Representation
Wiki
Proceedings - International Conference on Research Challenges in Information Science English 2014 The rapid growth of companies, the departure of employees, the complexity of new technology and the rapid proliferation of information are reasons why companies seek to capitalize their expert knowledge. STMicroelectronics has opted for a Wiki to effectively capitalize and share some of its knowledge. However, to accomplish its objective, the Wiki content must correspond to users' needs. Therefore, we propose a user centred approach for defining knowledge characteristics and integrating them into the Wiki. Our knowledge representation is based on three facets: "What, Why and How". In this paper, the approach is applied to the Reporting activity at STMicroelectronics, which is considered a knowledge intensive activity. 0 0
Acquisition des traductions de requêtes à partir de wikipédia pour la recherche d'information translingue Chakour H.
Sadat F.
Vision 2020: Sustainable Growth, Economic Development, and Global Competitiveness - Proceedings of the 23rd International Business Information Management Association Conference, IBIMA 2014 French 2014 The multilingual encyclopedia Wikipedia has become a very useful resource for the construction and enrichment of linguistic resources, such as dictionaries and ontologies. In this study, we are interested in the exploitation of Wikipedia for query translation in Cross-Language Information Retrieval. An application was built for the Arabic-English language pair. All possible translation candidates are extracted from the titles of Wikipedia articles based on the inter-links between Arabic and English, which is considered direct translation. Furthermore, other links, such as Arabic to French and French to English, are exploited for transitive translation. A light stemming and segmentation of the query into multiple tokens can be applied if no translation can be found for the entire query. Assessments of the monolingual and cross-lingual systems were conducted using three weighting schemes of the Lucene search engine (default, Tf-Idf and BM25). In addition, the performance of the proposed translation method was compared with those of GoogleTranslate and MyMemory. 0 0
An automatic sameAs link discovery from Wikipedia Kagawa K.
Susumu Tamagawa
Takahira Yamaguchi
Disambiguation
Ontology
SameAs link
Spelling variants
Synonym
Wikipedia
Lecture Notes in Computer Science English 2014 Spelling variants of words and word sense ambiguity incur significant costs in processes such as Data Integration, Information Searching, data pre-processing for Data Mining, and so on. To meet these demands, it is useful to construct relations between a word or phrase and a representative name of the entity. To reduce the costs, this paper discusses how to automatically discover "sameAs" and "meaningOf" links from Japanese Wikipedia. To do so, we gathered relevant features such as IDF, string similarity, the number of hypernyms, and so on. We identified a link-based score over salient features based on SVM results with 960,000 anchor link pairs. These case studies show that our link discovery method achieves more than 70% precision/recall. 0 0
An information retrieval expansion model based on Wikipedia Gan L.X.
Tu W.
Information retrieval
Query expansion
Wikipedia
Advanced Materials Research English 2014 Query expansion is one of the key technologies for improving precision and recall in information retrieval. In order to overcome the limitations of a single corpus, in this paper the semantic characteristics of the Wikipedia corpus are combined with a standard corpus to extract a richer set of relationships between terms for the construction of a steady Markov semantic network. Information from the entity pages and disambiguation pages in Wikipedia is comprehensively utilized to classify query terms and improve query classification accuracy. High-quality related candidates can then be used for query expansion according to semantic pruning. The proposal in our work helps improve retrieval performance and save computational search cost. 0 0
Analysing the duration of trending topics in twitter using wikipedia Thanh Tran
Georgescu M.
Zhu X.
Kanhabua N.
Temporal analysis
Time series
Twitter
Wikipedia
WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference English 2014 The analysis of trending topics in Twitter is a goldmine for a variety of studies and applications. However, the contents of topics vary greatly, from daily routines to major public events, enduring from a few hours to weeks or months. It is thus helpful to distinguish trending topics related to real-world events from those originating within virtual communities. In this paper, we analyse trending topics in Twitter using Wikipedia as a reference for studying the provenance of trending topics. We show that, among different factors, the duration of a trending topic characterizes exogenous Twitter trending topics better than endogenous ones. 0 0
Apply wiki for improving intellectual capital and effectiveness of project management at Cideco company Misra S.
Pham Q.T.
Tran T.N.
Cideco Company
Intellectual capital
Knowledge management
Project management
Wiki
Lecture Notes in Computer Science English 2014 Today, knowledge is considered the only source for creating the competitive advantages of modern organizations. However, managing intellectual capital is challenging, especially for SMEs in developing countries like Vietnam. In order to help SMEs build knowledge management systems (KMS) and stimulate their intellectual capital, a suitable technical platform for collaboration is needed. Wiki is a cheap technology for improving both intellectual capital and the effectiveness of project management. However, there is a lack of evidence about the real benefits of applying wikis in Vietnamese SMEs. Cideco Company, a Vietnamese SME in the construction design and consulting industry, is seeking a solution to manage its intellectual capital and improve the effectiveness of project management. In this research, a wiki is applied and tested to check whether it can be a suitable technology for Cideco to stimulate its intellectual capital and improve the effectiveness of project management activities. In addition, a demo wiki was implemented for 2 pilot projects to evaluate its real benefits. Analysis results showed that the wiki can help increase both intellectual capital and the effectiveness of project management at Cideco. 0 0
Architecture description leveraging model driven engineering and semantic wikis Baroni A.
Muccini H.
Malavolta I.
Woods E.
Architecture Description
Model-Driven Engineering
Wiki
Proceedings - Working IEEE/IFIP Conference on Software Architecture 2014, WICSA 2014 English 2014 A previous study, run by some of the authors in collaboration with practitioners, has emphasized the need to improve architectural languages in order to (i) make them simple and intuitive enough to communicate effectively with project stakeholders, and (ii) enable formality and rigour to allow analysis and other automated tasks. Although a multitude of languages have been created by researchers and practitioners, they rarely address both of these needs. In order to reconcile these divergent needs, this paper presents an approach that (i) combines the rigorous foundations of model-driven engineering with the usability of semantic wikis, and (ii) enables continuous synchronization between them; this allows software architects to simultaneously use wiki pages for communication and models for model-based analysis and manipulation. In this paper we explain how we applied the approach to an industry-inspired case study using the Semantic MediaWiki wiki engine and a model-driven architecture description implemented within the Eclipse Modeling Framework. We also discuss how our approach can be generalized to other wiki-based and model-driven technologies. 0 0
Assessing the quality of Thai Wikipedia articles using concept and statistical features Saengthongpattana K.
Soonthornphisaj N.
Concept feature
Decision tree
Naïve Bayes
Quality of Thai Wikipedia articles
Statistical feature
Advances in Intelligent Systems and Computing English 2014 The quality evaluation of Thai Wikipedia articles relies on user judgment. The number of articles increases every day, so an automatic evaluation method is needed for users. Components of Wikipedia articles such as headers, pictures, references, and links are useful indicators of article quality. However, readers need complete content that covers all the concepts in an article. The concept features are investigated in this work. The aim of this research is to classify Thai Wikipedia articles into two classes, namely a high-quality and a low-quality class. Three article domains (Biography, Animal, and Place) are tested with decision tree and Naïve Bayes classifiers. We found that Naïve Bayes achieves a high TP rate compared to the decision tree in every domain. Moreover, we found that the concept feature plays an important role in the quality classification of Thai Wikipedia articles. 0 0
Augmenting concept definition in gloss vector semantic relatedness measure using wikipedia articles Pesaranghader A.
Rezaei A.
Bioinformatics
Biomedical Text Mining
MEDLINE
Natural Language Processing
Semantic relatedness
UMLS
Web mining
Wikipedia
Lecture Notes in Electrical Engineering English 2014 Semantic relatedness measures are widely used in text mining and information retrieval applications. Considering these automated measures, in this research paper we attempt to improve the Gloss Vector relatedness measure for more accurate estimation of the relatedness between two given concepts. Generally, this measure constructs concept definitions (glosses) from a thesaurus and finds the angle between the concepts' gloss vectors to calculate relatedness. Nonetheless, this definition construction task is challenging, as thesauruses do not provide full coverage of expressive definitions for particularly specialized concepts. By employing Wikipedia articles and other external resources, we aim at augmenting these concepts' definitions. Applying both definition types to the biomedical domain, using MEDLINE as the corpus, UMLS as the default thesaurus, and a reference standard of 68 concept pairs manually rated for relatedness, we show that exploiting available resources on the Web has a positive impact on the final measurement of semantic relatedness. 0 0
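The angle-between-gloss-vectors computation at the heart of the Gloss Vector measure can be sketched as follows. This is a minimal sketch: the toy glosses stand in for thesaurus definitions augmented with Wikipedia text, and raw term counts replace the co-occurrence statistics a full implementation would use.

```python
# Minimal sketch: relatedness as the cosine of the angle between
# term-count vectors built from two concept definitions (glosses).
import numpy as np

def gloss_vector(gloss, vocab):
    v = np.zeros(len(vocab))
    for w in gloss.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

# Illustrative glosses (stand-ins for thesaurus + Wikipedia definitions)
gloss_a = "inflammation of the joints causing pain and swelling"
gloss_b = "chronic pain and swelling affecting joint tissue"

vocab = {w: i for i, w in
         enumerate(sorted(set((gloss_a + " " + gloss_b).lower().split())))}
va = gloss_vector(gloss_a, vocab)
vb = gloss_vector(gloss_b, vocab)

# Cosine similarity: 1.0 = identical direction, 0.0 = unrelated
cos = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
print("relatedness:", round(float(cos), 3))
```

Augmenting a definition, as the paper proposes, simply means concatenating more text (e.g., the corresponding Wikipedia article) before the vector is built, which densifies the vector for sparsely defined concepts.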
Automatic theory generation from analyst text files using coherence networks Shaffer S.C. Coherence
Natural language
SYNCOIN
Text analysis
Proceedings of SPIE - The International Society for Optical Engineering English 2014 This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are then fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value sub-networks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results. 0 0
Automatically detecting corresponding edit-turn-pairs in Wikipedia Daxenberger J.
Iryna Gurevych
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 In this study, we analyze links between edits in Wikipedia articles and turns from their discussion pages. Our motivation is to better understand implicit details about the writing process and knowledge flow in collaboratively created resources. Based on properties of the involved edit and turn, we have defined constraints for corresponding edit-turn-pairs. We manually annotated a corpus of 636 corresponding and non-corresponding edit-turn-pairs. Furthermore, we show how our data can be used to automatically identify corresponding edit-turn-pairs. With the help of supervised machine learning, we achieve an accuracy of 87% for this task. 0 0
Bootstrapping Wikipedia to answer ambiguous person name queries Gruetze T.
Gjergji Kasneci
Zuo Z.
Naumann F.
Proceedings - International Conference on Data Engineering English 2014 Some of the main ranking features of today's search engines reflect result popularity and are based on ranking models, such as PageRank, implicit feedback aggregation, and more. While such features yield satisfactory results for a wide range of queries, they aggravate the problem of searching for ambiguous entities: searching for a person yields satisfactory results only if the person in question is represented by a high-ranked Web page and all required information is contained in this page. Otherwise, the user has to either reformulate/refine the query or manually inspect low-ranked results to find the person in question. A possible approach to this problem is to cluster the results, so that each cluster represents one of the persons occurring in the answer set. However, clustering search results has proven to be a difficult endeavor by itself, and the clusters are typically of moderate quality. A wealth of useful information about persons occurs in Web 2.0 platforms, such as Wikipedia, LinkedIn, Facebook, etc. Being human-generated, the information on these platforms is clean, focused, and already disambiguated. We show that when searching with ambiguous person names, the information from Wikipedia can be bootstrapped to group the results according to the individuals occurring in them. We have evaluated our methods on a hand-labeled dataset of around 5,000 Web pages retrieved from Google queries on 50 ambiguous person names. 0 0
Bridging temporal context gaps using time-aware re-contextualization Ceroni A.
Tran N.K.
Kanhabua N.
Niederee C.
Complementarity
Temporal context
Time-aware re-contextualization
Wikipedia
SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2014 Understanding a text written some time ago can be compared to translating a text from another language. Complete interpretation requires a mapping, in this case a kind of time-travel translation, between present context knowledge and the context knowledge at the time of text creation. In this paper, we study time-aware re-contextualization, the challenging problem of retrieving concise and complementing information in order to bridge this temporal context gap. We propose an approach based on learning-to-rank techniques using sentence-level context information extracted from Wikipedia. The employed ranking combines relevance, complementarity and time-awareness. The effectiveness of the approach is evaluated by contextualizing articles from a news archive collection using more than 7,000 manually judged relevance pairs. To this end, we show that our approach is able to retrieve a significant amount of relevant context information for a given news article. 0 0
Building distant supervised relation extractors Nunes T.
Schwabe D.
DBpedia
Distant Supervision
Information extraction
Relation Extraction
Wikipedia
Proceedings - 2014 IEEE International Conference on Semantic Computing, ICSC 2014 English 2014 A well-known drawback in building machine learning semantic relation detectors for natural language is the lack of a large number of qualified training instances for the target relations in multiple languages. Even when good results are achieved, the datasets used by the state-of-the-art approaches are rarely published. In order to address these problems, this work presents an automatic approach to build multilingual semantic relation detectors through distant supervision, combining two of the largest resources of structured and unstructured content available on the Web, DBpedia and Wikipedia. We map the DBpedia ontology back to the Wikipedia text to extract more than 100,000 training instances for more than 90 DBpedia relations for the English and Portuguese languages without human intervention. First, we mine the Wikipedia articles to find candidate instances for relations described in the DBpedia ontology. Second, we preprocess and normalize the data, filtering out irrelevant instances. Finally, we use the normalized data to construct regularized logistic regression detectors that achieve more than 80% F-measure for both English and Portuguese. In this paper, we also compare the impact of different types of features on the accuracy of the trained detector, demonstrating significant performance improvements when combining lexical, syntactic and semantic features. Both the datasets and the code used in this research are available online. 0 0
Building sentiment lexicons for all major languages Yirong Chen
Skiena S.
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 Sentiment analysis in a multilingual world remains a challenging problem, because developing language-specific sentiment lexicons is an extremely resource-intensive process. Such lexicons remain a scarce resource for most languages. In this paper, we address this lexicon gap by building high-quality sentiment lexicons for 136 major languages. We integrate a variety of linguistic resources to produce an immense knowledge graph. By appropriately propagating from seed words, we construct sentiment lexicons for each component language of our graph. Our lexicons have a polarity agreement of 95.7% with published lexicons, while achieving an overall coverage of 45.2%. We demonstrate the performance of our lexicons in an extrinsic analysis of 2,000 distinct historical figures' Wikipedia articles in 30 languages. Despite cultural differences and the intended neutrality of Wikipedia articles, our lexicons show an average sentiment correlation of 0.28 across all language pairs. 0 0
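The propagation step described above, spreading polarity from seed words over a multilingual knowledge graph, can be sketched as simple iterative label spreading. The toy graph, seed set, and 0.5 damping factor are illustrative assumptions, not the paper's actual construction.

```python
# Minimal sketch: propagate sentiment polarity from seed words over a
# lexical graph (e.g., translation/synonym links). Data is illustrative.
import numpy as np

words = ["good", "bueno", "excellent", "bad", "malo", "terrible"]
idx = {w: i for i, w in enumerate(words)}
edges = [("good", "bueno"), ("good", "excellent"),
         ("bad", "malo"), ("bad", "terrible")]

seeds = {"good": 1.0, "bad": -1.0}  # clamped polarity seeds

# Row-normalized adjacency matrix for the propagation step
A = np.zeros((len(words), len(words)))
for a, b in edges:
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0
A = A / np.maximum(A.sum(axis=1, keepdims=True), 1.0)

scores = np.zeros(len(words))
for w, s in seeds.items():
    scores[idx[w]] = s

# Iterate: each word absorbs its neighbors' polarity, seeds stay clamped
for _ in range(20):
    scores = 0.5 * scores + 0.5 * (A @ scores)
    for w, s in seeds.items():
        scores[idx[w]] = s

print(dict(zip(words, scores.round(2))))
```

After convergence, non-seed words like "bueno" and "terrible" carry the polarity of the seeds they link to, which is the mechanism by which a small seed lexicon expands to cover a whole graph.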
Chinese and Korean cross-lingual issue news detection based on translation knowledge of Wikipedia Zhao S.
Tsolmon B.
Lee K.-S.
Lee Y.-S.
Cross-Lingual link discovery
Issue news detection
Wikipedia knowledge
Lecture Notes in Electrical Engineering English 2014 Extracting cross-lingual issue news and analyzing the news content is an important and challenging task. The core of cross-lingual research is the process of translation. In this paper, we focus on extracting cross-lingual issue news from Twitter data in Chinese and Korean. We propose a translation knowledge method for Wikipedia concepts as well as the Chinese and Korean cross-lingual inter-Wikipedia link relations. The relevance relations are extracted from the categories and page titles of Wikipedia. The evaluation achieved a performance of 83% average precision for the top 10 extracted issue news. The results indicate that our method is effective for cross-lingual issue news detection. 0 0
Classification and indexing of complex digital objects with CIDOC CRM Enge J.
Lurk T.
Archiving 2014 - Final Program and Proceedings English 2014 CIDOC-CRM provides an ontology-based description for the documentation of cultural heritage. Originally meant to support the documentation practice of cultural heritage institutions and to enable inter-institutional exchange, it defines a formal structure for the description of implicit and explicit relations between entities. In order to demonstrate the benefits of the model in a semantic web environment like "Semantic MediaWiki", the paper shows two practical examples. Both originate in the digital domain and are complex by nature: as an example of a completely synthetically generated HD video, "Sintel" (2010) by Colin Levy is examined; addressing distributed internet-based art and culture, Olia Lialina's "Summer" (2013) is described. The examples demonstrate to what extent the semantic structure of the digital extension of CIDOC CRM, namely CRMdig, clarifies the nature of the objects (understanding) and thus supports the planning and documentation process of dedicated collections. For this purpose, our own system, called CRM-Wiki, was implemented. 0 0
Collaborative tools in the primary classroom: Teachers' thoughts on wikis Agesilaou A.
Vassiliou C.
Irakleous S.
Zenios M.
Collaboration
Collaborative learning
Educators
Primary education
Wiki
Lecture Notes in Computer Science English 2014 The purpose of this work-in-progress study is to examine the attitudes of primary school teachers in Cyprus on the use of wikis as a means to promote collaborative learning in the classroom. A survey investigation was undertaken using 20 questionnaires and 3 semi-structured interviews. The survey results indicate a positive attitude among teachers in Cyprus toward integrating wikis in primary education to promote cooperation; as such, collaborative learning activities among pupils are being encouraged. 0 0
Collective memory in Poland: A reflection in street names Radoslaw Nielek
Wawer A.
Adam Wierzbicki
Collective memory
Street names
Wikipedia
Lecture Notes in Computer Science English 2014 Our article starts with the observation that street names fall into two general types: generic and historically inspired. We analyse street name distributions (of the second type) as a window onto nation-level collective memory in Poland. The process of selecting street names is determined socially, as the selections reflect the symbols considered important to the nation-level society, but it has strong historical motivations and determinants. In the article, we look for these relationships in the available data sources. We use Wikipedia articles to match street names with their textual descriptions and assign them to points in time. We then apply selected text mining and statistical techniques to reach quantitative conclusions. We also present a case study: the geographical distribution of two particular street names in Poland, demonstrating the binding between history and the political orientation of regions. 0 0
Comparing the pulses of categorical hot events in Twitter and Weibo Shuai X.
Xiaojiang Liu
Xia T.
Wu Y.
Guo C.
Click log mining
Community comparison
Information diffusion
Information retrieval
Social media
Twitter
Weibo
Wikipedia
HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media English 2014 The fragility and interconnectivity of the planet argue compellingly for a greater understanding of how different communities make sense of their world. One such critical demand involves comparing the Chinese and the rest of the world (e.g., Americans), where communities' ideological and cultural backgrounds can be significantly different. While traditional studies aim to learn the similarities and differences between these communities via high-cost user studies, in this paper we propose a much more efficient method to compare different communities by utilizing social media. Specifically, Weibo and Twitter, the two largest microblogging systems, are employed to represent the target communities, i.e., China and the Western world (mainly the United States), respectively. Meanwhile, through analysis of the Wikipedia page-click log, we identify a set of categorical 'hot events' for one month in 2012 and search for those hot events in the Weibo and Twitter corpora, along with timestamps, via information retrieval methods. We further quantitatively and qualitatively compare users' responses to those events in Twitter and Weibo in terms of three aspects: popularity, temporal dynamics, and information diffusion. The comparative results show that although the popularity rankings of those events are very similar, the patterns of temporal dynamics and information diffusion can be quite different. 0 0
Computer-supported collaborative accounts of major depression: Digital rhetoric on Quora and Wikipedia Rughinis C.
Huma B.
Matei S.
Rughinis R.
Computer supported collaborative knowledge making
Digital rhetoric
Major depression
Quora
Wikipedia
Iberian Conference on Information Systems and Technologies, CISTI English 2014 We analyze digital rhetoric in two computer-supported collaborative settings of writing and learning, focusing on major depression: Wikipedia and Quora. We examine the procedural rhetoric of access to and interaction with information, and the textual rhetoric of individual and aggregated entries. Through their different organization of authorship, publication and reading, the two settings create divergent accounts of depression. Key points of difference include: focus on symptoms and causes vs. experiences and advice, use of lists vs. metaphors and narratives, a/temporal structure, and personal and relational knowledge. 0 0
Conceptual clustering Boubacar A.
Niu Z.
Clustering
Concept analysis
Data mining
Information retrieval
Lecture Notes in Electrical Engineering English 2014 Traditional clustering methods are unable to describe the clusters they generate. Conceptual clustering is an important and active research area that aims to efficiently cluster and explain the data. Previous conceptual clustering approaches provide descriptions that do not use human-comprehensible knowledge. This paper presents an algorithm which uses Wikipedia concepts to drive the clustering process. The generated clusters overlap each other and serve as a basis for an information retrieval system. The method has been implemented in order to improve the performance of the system; it reduces the computation cost. 0 0
Continuous temporal Top-K query over versioned documents Lan C.
YanChun Zhang
Chunxiao Xing
Chenliang Li
Lecture Notes in Computer Science English 2014 The management of versioned documents has attracted researchers' attention in recent years. Based on the observation that decision-makers are often interested in finding the set of objects that show continuous behavior over time, we study the problem of continuous temporal top-k queries. Given a query, continuous temporal top-k search finds the documents that frequently rank in the top-k during a time period, taking the weights of different time intervals into account. Existing work on querying versioned documents has focused on adding time constraints, but has not considered the continuous ranking of objects and the weights of time intervals. We propose a new interval window-based method to address this problem. Our method obtains the continuous temporal top-k results while using interval windows to support time and weight constraints simultaneously. We use data from Wikipedia to evaluate our method. 0 0
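The query semantics described above, scoring each document by the weighted fraction of time intervals in which it ranks in the top-k, can be sketched directly. The per-interval rankings and weights below are invented, and the sketch ignores the paper's interval-window indexing.

```python
# Minimal sketch: continuous temporal top-k scoring with weighted intervals.
from collections import defaultdict

k = 2
# Per-interval rankings (best first) and a weight per interval
intervals = [
    (["d1", "d2", "d3"], 0.2),   # oldest interval, low weight
    (["d2", "d1", "d3"], 0.3),
    (["d2", "d3", "d1"], 0.5),   # most recent interval, high weight
]

score = defaultdict(float)
total_weight = sum(w for _, w in intervals)
for ranking, w in intervals:
    for doc in ranking[:k]:          # documents in this interval's top-k
        score[doc] += w / total_weight

# Continuous temporal top-k: documents ordered by weighted top-k frequency
print(sorted(score.items(), key=lambda kv: -kv[1]))
```

Here d2 outranks d1 despite both appearing in every interval's top-2 at some point, because d2's appearances fall in the more heavily weighted recent intervals.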
Creating a phrase similarity graph from wikipedia Stanchev L. Semantic search
Similarity graph
Proceedings - 2014 IEEE International Conference on Semantic Computing, ICSC 2014 English 2014 The paper addresses the problem of modeling the relationship between phrases in English using a similarity graph. The mathematical model stores data about the strength of the relationship between phrases, expressed as a decimal number. Both structured data from Wikipedia, such as the fact that the Wikipedia page with title 'Dog' belongs to the Wikipedia category 'Domesticated animals', and textual descriptions, such as the fact that the Wikipedia page with title 'Dog' contains the word 'wolf' thirty-one times, are used in creating the graph. The quality of the graph data is validated by comparing phrase-pair similarities computed by our graph-based software against the results of studies performed with human subjects. To the best of our knowledge, our software produces better correlation with the results of both the Miller and Charles study and the WordSimilarity-353 study than any other published research. 0 0
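A toy version of such a phrase similarity graph, using the abstract's own 'Dog' examples: nodes are phrases, and weighted edges record relationship strength mined from structure and text. The weights and the best-path scoring rule are illustrative assumptions, not the paper's actual scoring.

```python
# Minimal sketch: a weighted phrase similarity graph with a best-path
# similarity query. Edge weights are illustrative.
import networkx as nx

G = nx.Graph()
# Structured evidence: page "Dog" belongs to category "Domesticated animals"
G.add_edge("Dog", "Domesticated animals", weight=0.9)
# Textual evidence: the "Dog" page mentions "wolf" frequently
G.add_edge("Dog", "wolf", weight=0.6)
G.add_edge("wolf", "Domesticated animals", weight=0.2)

def similarity(g, a, b):
    # Strength of the best path, multiplying edge weights along the way
    best = 0.0
    for path in nx.all_simple_paths(g, a, b, cutoff=3):
        s = 1.0
        for u, v in zip(path, path[1:]):
            s *= g[u][v]["weight"]
        best = max(best, s)
    return best

# The indirect path via "Dog" (0.6 * 0.9 = 0.54) beats the direct edge (0.2)
print(similarity(G, "wolf", "Domesticated animals"))
```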
Cross-language and cross-encyclopedia article linking using mixed-language topic model and hypernym translation Wang Y.-C.
Wu C.-K.
Tsai R.T.-H.
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 Creating cross-language article links among different online encyclopedias is now an important task in the unification of multilingual knowledge bases. In this paper, we propose a cross-language article linking method using a mixed-language topic model and hypernym translation features based on an SVM model to link English Wikipedia and Chinese Baidu Baike, the most widely used Wiki-like encyclopedia in China. To evaluate our approach, we compile a data set from the top 500 Baidu Baike articles and their corresponding English Wikipedia articles. The evaluation results show that our approach achieves 80.95% in MRR and 87.46% in recall. Our method does not heavily depend on linguistic characteristics and can be easily extended to generate cross-language article links among different online encyclopedias in other languages. 0 0
Crowd-based appraisal and description of archival records at the State Archives Baden-Württemberg Naumann K.
Ziwes F.-J.
Archiving 2014 - Final Program and Proceedings English 2014 Appraisal and description are core processes at historical archives. This article gives an account of innovative methodologies in this field that use crowd-sourced information to (1) identify which files are of interest to the public, (2) enable agency staff to extract and transfer exactly those files selected for permanent retention, and (3) ease the description and cataloguing of the transferred objects. It defines the extent of outsourcing used at the State Archives (Landesarchiv Baden-Württemberg, LABW), describes case studies, and touches on issues of change management. Data sources are government databases and geodatabases, commercial data on court decisions, the name tags of the German Wikipedia, and bio-bibliographical metadata of the State Libraries and the German National Library. 0 0
Designing information savvy societies: An introduction to assessability Andrea Forte
Andalibi N.
Park T.
Willever-Farr H.
Assessability
Credibility
Information literacy
Wikipedia
Conference on Human Factors in Computing Systems - Proceedings English 2014 This paper provides first steps toward an empirically grounded design vocabulary for assessable design as an HCI response to the global need for better information literacy skills. We present a framework for synthesizing literatures called the Interdisciplinary Literacy Framework and use it to highlight gaps in our understanding of information literacy that HCI as a field is particularly well suited to fill. We report on two studies that lay a foundation for developing guidelines for assessable information system design. The first is a study of Wikipedians', librarians', and laypersons' information assessment practices from which we derive two important features of assessable designs: Information provenance and stewardship. The second is an experimental study in which we operationalize these concepts in designs and test them using Amazon Mechanical Turk (MTurk). 0 0
Developing creativity competency of engineers Waychal P.K. Active learning
Creativity
Index of learning styles (ils)
Wikipedia
ASEE Annual Conference and Exposition, Conference Proceedings English 2014 The complete agreement of all stakeholders on the importance of developing the creativity competency of engineering graduates motivated us to undertake this study. We chose a senior-level course in Software Testing and Quality Assurance, which offered an excellent platform for the experiment, as both testing and quality assurance activities can be executed using either routine, mechanical methods or highly creative ones. The earlier attempts reported in the literature to develop the creativity competency do not appear to be systematic, i.e., they do not follow the measurement -> action plan -> measurement cycle. The measurements, wherever done, are based on the Torrance Test of Creative Thinking (TTCT) and the Myers Briggs Type Indicator (MBTI). We found these tests costly and decided to search for an appropriate alternative, which led us to the Felder Solomon Index of Learning Styles (ILS). The Sensing/Intuition dimension of the ILS, like the MBTI, originated in Carl Jung's Theory of Psychological Types. Since a number of MBTI studies have used this dimension for assessing creativity, we posited that the same ILS dimension could be used to measure the competency. We carried out a pre-ILS assessment, designed and delivered the course with a variety of activities that could potentially enhance creativity, and carried out a course-end post-ILS assessment. Although major changes would not normally be expected after a one-semester course, a hypothesis in the study was that a shift from sensing toward intuition on learning style profiles would be observed, and indeed it was. A paired t-test indicated that the pre-post change in the average sensing/intuition preference score was statistically significant (p = 0.004). While more research and direct assessment of the competency is needed to draw definitive conclusions about both the use of the instrument for measuring creativity and the efficacy of the course structure and contents in developing the competency, the results suggest that the approach is worth exploring. 0 0
Development of a semantic and syntactic model of natural language by means of non-negative matrix and tensor factorization Anisimov A.
Marchenko O.
Taranukha V.
Vozniuk T.
Information extraction
Knowledge representation
Ontology
Wikipedia
Wordnet
Lecture Notes in Computer Science English 2014 A method for developing a structural model of natural language syntax and semantics is proposed. Syntactic and semantic relations between parts of a sentence are presented in the form of a recursive structure called a control space. Numerical characteristics of these data are stored in multidimensional arrays. After factorization, the arrays serve as the basis for the development of procedures for analyses of natural language semantics and syntax. 0 0
Editing beyond articles: Diversity & dynamics of teamwork in open collaborations Morgan J.T.
Gilbert M.
David W. McDonald
Mark Zachry
Group dynamics
Group work
Open collaboration
Wikipedia
English 2014 We report a study of Wikipedia in which we use a mixed-methods approach to understand how participation in specialized workgroups called WikiProjects has changed over the life of the encyclopedia. While previous work has analyzed the work of WikiProjects in supporting the development of articles within particular subject domains, the collaborative role of WikiProjects that do not fit this conventional mold has not been empirically examined. We combine content analysis, interviews, and analysis of edit logs to identify and characterize these alternative WikiProjects and the work they do. Our findings suggest that WikiProject participation reflects community concerns and shifts in the community's conception of valued work over the past six years. We discuss implications for other open collaborations that need flexible, adaptable coordination mechanisms to support a range of content creation, curation, and community maintenance tasks. 0 0
Elite size and resilience impact on global system structuration in social media Matei S.A.
Tan W.
Mingjie Zhu
Che-Hung Liu
Bertino E.
Foote J.
Division of labor
Elite
Entropy
Equality
Role
Wikipedia
2014 International Conference on Collaboration Technologies and Systems, CTS 2014 English 2014 The paper examines the role played by the most productive members of social media systems in leading the project and influencing the degree of project structuration. The paper focuses on findings of a large computational social science project that examines Wikipedia. 0 0
Encoding document semantic into binary codes space Yu Z.
Xuan Zhao
Lei Wang
Lecture Notes in Computer Science English 2014 We develop a deep neural network model to encode document semantics into compact binary codes with the elegant property that semantically similar documents have similar embedding codes. The deep learning model is constructed with three stacked auto-encoders. The input of the lowest auto-encoder is the word-count vector representation of a document, while the learned hidden features of the deepest auto-encoder are thresholded into binary codes that represent the document's semantics. Retrieving similar documents is very efficient: one simply returns the documents whose codes have small Hamming distances to that of the query document. We illustrate the effectiveness of our model on two public real datasets - 20NewsGroup and Wikipedia - and the experiments demonstrate that the compact binary codes sufficiently embed the semantics of documents and improve retrieval accuracy. 0 0
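The retrieval step this abstract describes (thresholding learned features into bits, then ranking by Hamming distance) is simple enough to sketch in Python; the auto-encoder itself is omitted, and random features stand in for learned ones:

    import numpy as np

    def to_binary_codes(features, threshold=0.5):
        # Threshold real-valued hidden features (e.g., from the deepest
        # auto-encoder layer) into binary codes.
        return (features > threshold).astype(np.uint8)

    def hamming_search(query_code, code_db, k=5):
        # Rank documents by Hamming distance between binary codes and
        # return the indices of the k nearest.
        dists = np.count_nonzero(code_db != query_code, axis=1)
        return np.argsort(dists)[:k]

    rng = np.random.default_rng(0)
    codes = to_binary_codes(rng.random((1000, 32)))  # 1000 docs, 32-bit codes
    print(hamming_search(codes[0], codes))           # doc 0 ranks itself first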
Entity recognition in information extraction Hanafiah N.
Quix C.
Lecture Notes in Computer Science English 2014 Detecting and resolving entities is an important step in information retrieval applications. Humans are able to recognize entities by context, but information extraction systems (IES) need to apply sophisticated algorithms to recognize an entity. The development and implementation of an entity recognition algorithm is described in this paper. The implemented system is integrated with an IES that derives triples from unstructured text. By doing so, the triples become more valuable in query answering because they refer to identified entities. By extracting information from the Wikipedia encyclopedia, a dictionary of entities and their contexts is built. The entity recognition computes a score for context similarity, based on cosine similarity with a tf-idf weighting scheme, combined with string similarity. The implemented system shows good accuracy on Wikipedia articles, is domain independent, and recognizes entities of arbitrary types. 0 0
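A minimal sketch of the scoring idea (cosine similarity over tf-idf vectors between a mention's context and dictionary entries); the two-entry dictionary and the contexts are invented, and the paper's additional string-similarity component is omitted:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Invented dictionary of entity contexts, standing in for one built
    # from Wikipedia articles.
    entity_contexts = {
        "Java (programming language)": "class compiler jvm bytecode virtual machine",
        "Java (island)": "indonesia jakarta island volcano population",
    }
    mention_context = "the source is compiled to bytecode and run on the jvm"

    vec = TfidfVectorizer()
    m = vec.fit_transform(list(entity_contexts.values()) + [mention_context])
    scores = cosine_similarity(m[-1], m[:-1]).ravel()
    print(max(zip(entity_contexts, scores), key=lambda p: p[1]))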
Evaluating the helpfulness of linked entities to readers Yamada I.
Ito T.
Usami S.
Takagi S.
Hideaki Takeda
Takefuji Y.
Entity linking
Knowledge base
Wikipedia
HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media English 2014 When we encounter an interesting entity (e.g., a person's name or a geographic location) while reading text, we typically search and retrieve relevant information about it. Entity linking (EL) is the task of linking entities in a text to the corresponding entries in a knowledge base, such as Wikipedia. Recently, EL has received considerable attention. EL can be used to enhance a user's text reading experience by streamlining the process of retrieving information on entities. Several EL methods have been proposed, though they tend to extract all of the entities in a document, including ones unnecessary for users. Excessive linking of entities can be distracting and degrade the user experience. In this paper, we propose a new method for evaluating the helpfulness of linking entities to users. We address this task using supervised machine learning with a broad set of features. Experimental results show that our method significantly outperforms baseline methods by approximately 5.7%-12% F1. In addition, we propose an application, Linkify, which enables developers to integrate EL easily into their web sites. 0 0
Experimental comparison of semantic word clouds Barth L.
Kobourov S.G.
Pupyrev S.
Lecture Notes in Computer Science English 2014 We study the problem of computing semantics-preserving word clouds in which semantically related words are close to each other. We implement three earlier algorithms for creating word clouds and three new ones. We define several metrics for quantitative evaluation of the resulting layouts. Then the algorithms are compared according to these metrics, using two data sets of documents from Wikipedia and research papers. We show that two of our new algorithms outperform all the others by placing many more pairs of related words so that their bounding boxes are adjacent. Moreover, this improvement is not achieved at the expense of significantly worsened measurements for the other metrics. 0 0
Exploiting Twitter and Wikipedia for the annotation of event images McParlane P.J.
Jose J.M.
Tag recommendation
Twitter
Wikipedia
SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2014 With the rise in popularity of smartphones, there has been a recent increase in the number of images taken at large social (e.g. festivals) and world (e.g. natural disasters) events which are uploaded to image sharing websites such as Flickr. As with all online images, they are often poorly annotated, resulting in a difficult retrieval scenario. To overcome this problem, many photo tag recommendation methods have been introduced; however, these methods all rely on historical Flickr data, which is often problematic for a number of reasons, including the time lag problem (i.e. in our collection, users upload images on average 50 days after taking them, meaning "training data" is often out of date). In this paper, we develop an image annotation model which exploits textual content from related Twitter and Wikipedia data and aims to overcome the discussed problems. The results of our experiments highlight the merits of exploiting social media data for annotating event images, where we are able to achieve recommendation accuracy comparable with a state-of-the-art model. 0 0
Exploiting Wikipedia for Evaluating Semantic Relatedness Mechanisms Ferrara F.
Tasso C.
Communications in Computer and Information Science English 2014 The semantic relatedness between two concepts is a measure that quantifies the extent to which two concepts are semantically related. In the area of digital libraries, several mechanisms based on semantic relatedness methods have been proposed. Visualization interfaces, information extraction mechanisms, and classification approaches are just some examples of mechanisms where semantic relatedness methods can play a significant role and have been successfully integrated. Due to the growing interest of researchers in areas like Digital Libraries, Semantic Web, Information Retrieval, and NLP, various approaches have been proposed for automatically computing semantic relatedness. However, despite the growing number of proposed approaches, there are still significant difficulties in evaluating the results returned by different methods. The limitations of current evaluation mechanisms prevent an effective evaluation, and several works in the literature emphasize that the exploited approaches are rather inconsistent. In order to overcome this limitation, we propose a new evaluation methodology where people provide feedback about the semantic relatedness between concepts explicitly defined in digital encyclopedias. In this paper, we specifically exploit Wikipedia for generating a reliable dataset. 0 0
Exploiting the wisdom of the crowds for characterizing and connecting heterogeneous resources Kawase R.
Siehndel P.
Pereira Nunes B.
Herder E.
Wolfgang Nejdl
Classification
Comparison
Domain independent
Fingerprints
Twikime
Wikipedia
HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media English 2014 Heterogeneous content is an inherent problem for cross-system search, recommendation and personalization. In this paper we investigate differences in topic coverage and the impact of topics in different kinds of Web services. We use entity extraction and categorization to create fingerprints that allow for meaningful comparison. As a basis taxonomy, we use the 23 main categories of the Wikipedia Category Graph, which has been assembled over the years by the wisdom of the crowds. Following a proof of concept of our approach, we analyze differences in topic coverage and topic impact. The results show many differences between Web services like Twitter, Flickr and Delicious, which reflect users' behavior and the usage of each system. The paper concludes with a user study that demonstrates the benefits of fingerprints over traditional textual methods for recommendations of heterogeneous resources. 0 0
Exploratory search with semantic transformations using collaborative knowledge bases Yegin Genc Collaborative knowledge bases
Concept networks
Exploratory search
Wikipedia
WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining English 2014 Sometimes we search for simple facts. Other times we search for relationships between concepts. While existing information retrieval systems work well for simple searches, they are less satisfying for complex inquiries because of the ill-structured nature of many searches and the cognitive load involved in the search process. Search can be improved by leveraging the network of concepts that are maintained by collaborative knowledge bases such as Wikipedia. By treating exploratory search inquiries as networks of concepts - and then mapping documents to these concepts - exploratory search performance can be improved. This method is applied to an exploratory search task: given a journal abstract, abstracts are ranked based on their relevance to the seed abstract. The results show relevance scores comparable to state-of-the-art techniques while at the same time providing better diversity. 0 0
Exploring collective DSL integration in a large situated IS: Towards comprehensive language integration in information systems Aram M.
Gustaf Neumann
DSL
Language integration
Semantic enterprise wiki
ACM International Conference Proceeding Series English 2014 In large situated information system instances, a great variety of stakeholders interact with each other via technology, constantly shaping and refining the information system. In the course of such a system's history, a range of domain-specific languages may have been incorporated. These languages are often not sufficiently integrated on an ontological level, leading to syntactical and conceptual redundancies and impeding a shared understanding of the system's functionalities. In this paper, we present our ambitions towards a language integration approach that aims at mitigating this problem. We exemplify it in the context of an existing educational information system instance. 0 0
Extracting semantic concept relations from Wikipedia Arnold P.
Rahm E.
Background knowledge
Information extraction
Natural Language Processing
Semantic relations
Thesauri
Wikipedia
ACM International Conference Proceeding Series English 2014 Background knowledge as provided by repositories such as WordNet is of critical importance for linking or mapping ontologies and related tasks. Since current repositories are quite limited in their scope and currency, we investigate how to automatically build improved repositories by extracting semantic relations (e.g., is-a and part-of relations) from Wikipedia articles. Our approach uses a comprehensive set of semantic patterns, finite state machines, and NLP techniques to process Wikipedia definitions and to identify semantic relations between concepts. Our approach is able to extract multiple relations from a single Wikipedia article. An evaluation for different domains shows the high quality and effectiveness of the proposed approach. 0 0
Fostering collaborative learning with wikis: Extending MediaWiki with educational features Popescu E.
Maria C.
Udristoiu A.L.
Co-writing
Collaborative learning
Educational wiki
Learner tracking
MediaWiki extensions
Lecture Notes in Computer Science English 2014 Wikis are increasingly popular Web 2.0 tools in educational settings, being used successfully for collaborative learning. However, since they were not originally conceived as educational tools, they lack some of the functionalities useful in the instructional process (such as learner monitoring, evaluation support, student group management, etc.). Therefore, in this paper we propose a solution that adds these educational support features as an extension to the popular MediaWiki platform. CoLearn, as it is called, is aimed at increasing the level of collaboration between students, also investigating collaborative versus cooperative learner actions. Its functionalities and pedagogical rationale are presented, together with some technical details. A set of practical guidelines for promoting collaborative learning with wikis is also included. 0 0
Graph-based domain-specific semantic relatedness from Wikipedia Sajadi A. Biomedical Domain
Semantic relatedness
Data mining
Lecture Notes in Computer Science English 2014 Human-made ontologies and lexicons are promising resources for many text mining tasks in domain-specific applications, but they do not exist for most domains. We study the suitability of Wikipedia as an alternative to such ontologies for the semantic relatedness problem. We focus on the biomedical domain because (1) high-quality, manually curated ontologies are available and (2) successful graph-based methods have been proposed for semantic relatedness in this domain. Because Wikipedia is not hierarchical and links do not convey defined semantic relationships, the methods used on lexical resources (such as WordNet) cannot be applied here straightforwardly. Our contributions are (1) demonstrating that Wikipedia-based methods outperform state-of-the-art ontology-based methods on most of the existing ontologies in the biomedical domain, (2) adapting and evaluating the effectiveness of a group of bibliometric methods of various degrees of sophistication on Wikipedia for the first time, and (3) proposing a new graph-based method that outperforms existing methods by considering some specific features of Wikipedia's structure. 0 0
Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts Ren X.
Yafang Wang
Yu X.
Yan J.
Zheng Chen
Jangwhan Han
Heterogeneous graph clustering
Search intent
Wikipedia
WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining English 2014 The problem of learning user search intents has attracted intensive attention from both industry and academia. However, state-of-the-art intent learning algorithms suffer from different drawbacks when only using a single type of data source. For example, query text has difficulty distinguishing ambiguous queries; search logs are biased toward the order of search results and users' noisy click behaviors. In this work, we for the first time leverage three types of objects, namely queries, web pages, and Wikipedia concepts, collaboratively for learning generic search intents, and construct a heterogeneous graph to represent multiple types of relationships between them. A novel unsupervised method called heterogeneous graph-based soft-clustering is developed to derive an intent indicator for each object based on the constructed heterogeneous graph. With the proposed co-clustering method, one can enhance the quality of intent understanding by taking advantage of different types of data, which complement each other, and make the implicit intents easier to interpret with explicit knowledge from Wikipedia concepts. Experiments on two real-world datasets demonstrate the power of the proposed method, which achieves a 9.25% improvement in terms of NDCG on the search ranking task and a 4.67% improvement in terms of Rand index on the object co-clustering task compared to the best state-of-the-art method. 0 0
How collective intelligence emerges: Knowledge creation process in Wikipedia from microscopic viewpoint Kangpyo Lee Collective intelligence
User-paragraph network
Visaphor
Wikipedia
Proceedings of the Workshop on Advanced Visual Interfaces AVI English 2014 Wikipedia, one of the richest human knowledge repositories on the Internet, has been developed by collective intelligence. To gain insight into Wikipedia, one may ask how initial ideas emerge and develop into a concrete article through the online collaborative process. Led by this question, the author performed a microscopic observation of the knowledge creation process on the recent article "Fukushima Daiichi nuclear disaster." The author collected not only the revision history of the article but also investigated interactions between collaborators by constructing a user-paragraph network to reveal the intellectual intervention of multiple authors. The knowledge creation process on the Wikipedia article was categorized into 4 major steps and 6 phases, from the beginning to the intellectual balance point where only revisions were made. To represent this phenomenon, the author developed a visaphor (digital visual metaphor) to digitally represent the article's evolving concepts and characteristics. The author then created a dynamic digital information visualization using particle effects and network graph structures. The visaphor reveals the interaction between users and their collaborative efforts as they created and revised paragraphs and debated aspects of the article. 0 0
Identifying the topic of queries based on domain specify ontology ChienTa D.C.
Thi T.P.
Domain ontology
Identifying topic
Information extraction
WIT Transactions on Information and Communication Technologies English 2014 In order to identify the topic of queries, a large number of past studies have relied on lexico-syntactic and handcrafted knowledge sources in Machine Learning and Natural Language Processing (NLP). In contrast, in this paper we introduce an application system that detects the topic of queries based on a domain-specific ontology. For this system, we focus on building the domain-specific ontology, which is composed of instances automatically extracted from available resources such as Wikipedia, WordNet, and the ACM Digital Library. The experimental evaluation with many cases of queries related to the information technology area shows that this system considerably outperforms a matching-and-identifying approach. 0 0
Improving modern art articles on Wikipedia, a partnership between Wikimédia France and Centre Georges Pompidou Sylvain Machefert Museum
Crowdsourcing
Préconférence IFLA 2014 - Bibliothèques d'art French 2014 The Centre Georges Pompidou is an institution in Paris hosting the "Musée National d'Art Moderne", the largest museum of modern art in Europe. Wikimédia France is a French organization working on promoting Wikipedia and other Wikimedia projects, for example by organizing training sessions or conducting partnerships. The project described in this proposal has been led by the GLAM (Galleries, Libraries, Archives and Museums) working group of Wikimédia France and Pompidou museum curators. 3 0
Inferring attitude in online social networks based on quadratic correlation Chao Wang
Bulatov A.A.
Machine learning
Quadratic optimization
Signed Networks
Lecture Notes in Computer Science English 2014 The structure of an online social network in most cases cannot be described just by the links between its members. We study online social networks in which members may have a certain attitude, positive or negative, toward each other, so that the network consists of a mixture of both positive and negative relationships. Our goal is to predict the sign of a given relationship based on the evidence provided in the current snapshot of the network. More precisely, using machine learning techniques we develop a model that, after being trained on a particular network, predicts the sign of an unknown or hidden link. The model uses relationships and influences from peers as evidence for the prediction; however, the set of peers used is not predefined but rather learned during the training process. We use quadratic correlation between peer members to train the predictor. The model is tested on popular online datasets such as Epinions, Slashdot, and Wikipedia. In many cases it shows almost perfect prediction accuracy. Moreover, our model can also be efficiently updated as the underlying social network evolves. 0 0
Intelligent searching using delay semantic network Dvorscak S.
Machova K.
SAMI 2014 - IEEE 12th International Symposium on Applied Machine Intelligence and Informatics, Proceedings English 2014 This article introduces a different way to implement semantic search, using a semantic search agent over information obtained directly from the web. The paper describes a time-delay form of semantic network, which we use to provide semantic search. The time-delay aspect of the semantic network has a positive impact in several ways: it provides a way to represent time-dependent knowledge in a semantic network, and also a way to optimize the inference process. This is realized for Wikipedia articles in the form of a search engine, whose core is implemented as a massively multithreaded inference mechanism over a massive semantic network. 0 0
Investigation of information behavior in Wikipedia articles Rosch B. Eye-tracking
Information behavior
Pictorial information
Textual information
Wikipedia article
Proceedings of the 5th Information Interaction in Context Symposium, IIiX 2014 English 2014 This work aims to explore information behavior in selected Wikipedia articles. To gain insights into users' interaction with pictorial and textual contents, eye-tracking experiments are conducted. The spread of information within the articles and the relation between text and images are analyzed. 0 0
Kondenzer: Exploration and visualization of archived social media Alonso O.
Khandelwal K.
Proceedings - International Conference on Data Engineering English 2014 Modern social networks such as Twitter provide a platform for people to express their opinions on a variety of topics ranging from personal to global. While the factual part of this information and the opinions of various experts are archived by sources such as Wikipedia and reputable news articles, the opinion of the general public is drowned out in a sea of noise and 'un-interesting' information. In this demo we present Kondenzer - an offline system for condensing, archiving and visualizing social data. Specifically, we create digests of social data using a combination of filtering, duplicate removal and efficient clustering. This gives a condensed set of high-quality data which is used to generate facets and create a collection that can be visualized using the PivotViewer control. 0 0
Large-scale author verification: Temporal and topical influences Van Dam M.
Claudia Hauff
Authorship verification
Plagiarism detection
SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2014 The task of author verification is concerned with the question of whether or not someone is the author of a given piece of text. Algorithms that extract writing style features from texts are used to determine how close in style different documents are. Currently, evaluations of author verification algorithms are restricted to small-scale corpora with usually fewer than one hundred test cases. In this work, we present a methodology to derive a large-scale author verification corpus based on Wikipedia Talk pages. We create a corpus based on English Wikipedia which is significantly larger than existing corpora. We investigate two dimensions on this corpus which so far have not received sufficient attention: the influence of topic and the influence of time on author verification accuracy. 0 0
Learning a lexical simplifier using Wikipedia Horn C.
Manduca C.
David Kauchak
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 In this paper we introduce a new lexical simplification approach. We extract over 30K candidate lexical simplifications by identifying aligned words in a sentence-aligned corpus of English Wikipedia with Simple English Wikipedia. To apply these rules, we learn a feature-based ranker using SVMrank, trained on a set of labeled simplifications collected using Amazon's Mechanical Turk. Using human simplifications for evaluation, we achieve a precision of 76% with changes in 86% of the examples. 0 0
Learning to compute semantic relatedness using knowledge from wikipedia Zheng C.
Zhe Wang
Bie R.
Zhou M.
Semantic relatedness
Supervised Learning
Wikipedia
Lecture Notes in Computer Science English 2014 Recently, Wikipedia has become a very important resource for computing semantic relatedness (SR) between entities. Several approaches have already been proposed to compute SR based on Wikipedia. Most of the existing approaches use certain kinds of information in Wikipedia (e.g. links, categories, and texts) and compute SR with empirically designed measures. We have observed that these approaches produce very different results for the same entity pair in some cases. Therefore, how to select appropriate features and measures to best approximate the human judgment of SR becomes a challenging problem. In this paper, we propose a supervised learning approach for computing SR between entities based on Wikipedia. Given two entities, our approach first maps the entities to articles in Wikipedia; then different kinds of features of the mapped articles are extracted from Wikipedia and combined with different relatedness measures to produce nine raw SR values for the entity pair. A supervised learning algorithm is proposed to learn the optimal weights of the different raw SR values. The final SR is computed as the weighted average of the raw SRs. Experiments on benchmark datasets show that our approach outperforms baseline methods. 0 0
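The combination step the abstract describes — learning weights so that a weighted average of raw SR values approximates human judgment — might look like the following sketch; the three raw measures and all numbers are toy placeholders (the paper uses nine raw values):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Each row holds the raw SR values of one entity pair under different
    # feature/measure combinations (toy numbers; the paper uses nine).
    raw_srs = np.array([[0.9, 0.7, 0.8],
                        [0.2, 0.4, 0.1],
                        [0.6, 0.5, 0.7],
                        [0.8, 0.9, 0.6]])
    human = np.array([0.85, 0.20, 0.60, 0.80])  # gold-standard judgments

    model = LinearRegression().fit(raw_srs, human)
    print(model.coef_)              # learned weights for the raw measures
    print(model.predict(raw_srs))   # combined SR scores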
Leveraging open source tools for Web mining Pennete K.C. Data mining
Open source
R
Rapid miner
Web mining
Wikipedia
Lecture Notes in Electrical Engineering English 2014 Web mining is one of the most pursued research areas and often among the most challenging ones. Using web mining, corporations and individuals alike seek to unravel the hidden knowledge underneath the gargantuan volumes of diverse web data. This paper presents how a researcher can leverage the colossal knowledge available in open-access sites such as Wikipedia as a source of information, rather than subscribing to closed networks of knowledge, and use open source tools rather than prohibitively priced commercial mining tools to do web mining. The paper illustrates step-by-step usage of R and RapidMiner in web mining to enable a novice to understand the concepts as well as apply them in the real world. 0 0
Lightweight domain ontology learning from texts: Graph theory-based approach using Wikipedia Ahmed K.B.
Toumouh A.
Widdows D.
Concepts' hierarchy
Graph normalisation
Lightweight domain ontologies
Ontology learning from texts
Wikipedia
International Journal of Metadata, Semantics and Ontologies English 2014 Ontology engineering is the backbone of the semantic web. However, the construction of formal ontologies is a tough exercise which requires time and heavy costs. Ontology learning is thus a solution for this requirement. Since texts are massively available everywhere, embodying experts' knowledge and know-how, it is of great value to capture the knowledge existing within such texts. Our approach thus answers the challenge of creating concept hierarchies from textual data, taking advantage of the Wikipedia encyclopaedia to achieve good-quality results. This paper presents a novel approach which essentially uses plain-text Wikipedia instead of its category system and works with a simplified algorithm to infer a domain taxonomy from a graph. 0 0
MIGSOM: A SOM algorithm for large scale hyperlinked documents inspired by neuronal migration Kotaro Nakayama
Yutaka Matsuo
Clustering
Link analysis
SOM
Visualisation
Wikipedia
Lecture Notes in Computer Science English 2014 The SOM (Self-Organizing Map), one of the most popular unsupervised machine learning algorithms, maps high-dimensional vectors into low-dimensional data (usually a 2-dimensional map). The SOM is widely known as a "scalable" algorithm because of its capability to handle large numbers of records. However, it is effective only when the vectors are small and dense. Although a number of studies on making the SOM scalable have been conducted, technical issues of scalability and performance for sparse high-dimensional data such as hyperlinked documents still remain. In this paper, we introduce MIGSOM, an SOM algorithm inspired by a recent discovery on neuronal migration. The two major advantages of MIGSOM are its scalability for sparse high-dimensional data and its clustering visualization functionality. In this paper, we describe the algorithm and implementation in detail, and show the practicality of the algorithm in several experiments. We applied MIGSOM not only to experimental data sets but also to a large-scale real data set: Wikipedia's hyperlink data. 0 0
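For readers unfamiliar with the underlying algorithm, a standard SOM update step looks roughly like the sketch below; this is the classic formulation, not MIGSOM's migration-inspired variant, and the map size, data, and rates are arbitrary:

    import numpy as np

    def som_step(weights, x, lr, sigma):
        # One SOM update: find the best-matching unit (BMU), then pull
        # nearby units toward the input, weighted by a Gaussian neighborhood.
        h, w, d = weights.shape
        bmu = np.unravel_index(
            np.argmin(((weights - x) ** 2).sum(axis=2)), (h, w))
        ys, xs = np.indices((h, w))
        dist2 = (ys - bmu[0]) ** 2 + (xs - bmu[1]) ** 2
        neigh = np.exp(-dist2 / (2 * sigma ** 2))[:, :, None]
        return weights + lr * neigh * (x - weights)

    rng = np.random.default_rng(0)
    weights = rng.random((10, 10, 3))      # 10x10 map of 3-d prototype vectors
    for x in rng.random((200, 3)):         # toy training data
        weights = som_step(weights, x, lr=0.1, sigma=2.0)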
Massive query expansion by exploiting graph knowledge bases for image retrieval Guisado-Gamez J.
Dominguez-Sal D.
Larriba-Pey J.-L.
Community detection
Graph mining techniques
Information retrieval
Knowledge bases
Query expansion
Wikipedia
ICMR 2014 - Proceedings of the ACM International Conference on Multimedia Retrieval 2014 English 2014 Annotation-based techniques for image retrieval suffer from sparse and short image textual descriptions. Moreover, users are often not able to describe their needs with the most appropriate keywords. This situation is a breeding ground for a vocabulary mismatch problem, resulting in poor results in terms of retrieval precision. In this paper, we propose a query expansion technique for queries expressed as keywords and short natural language descriptions. We present a new massive query expansion strategy that enriches queries using a graph knowledge base, by identifying the query concepts and adding relevant synonyms and semantically related terms. We propose a topological graph enrichment technique that analyzes the network of relations among the concepts and suggests semantically related terms by path and community detection analysis of the knowledge graph. We perform our expansions using two versions of Wikipedia as the knowledge base, achieving improvements in the system's precision of more than 27%. 0 0
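A toy sketch of the expansion idea — enriching query terms with related concepts from a graph knowledge base. The two-entry graph is invented and stands in for Wikipedia; the paper's path- and community-based term selection is reduced here to a simple neighbor lookup:

    def expand_query(terms, concept_graph, limit=3):
        # Add up to `limit` related terms per query concept, a stand-in
        # for the paper's path- and community-based selection.
        expanded = list(terms)
        for t in terms:
            expanded.extend(concept_graph.get(t, [])[:limit])
        return expanded

    kb = {  # invented mini knowledge graph
        "jaguar": ["panthera onca", "big cat", "felidae"],
        "car": ["automobile", "vehicle"],
    }
    print(expand_query(["jaguar", "habitat"], kb))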
Maturity assessment of Wikipedia medical articles Conti R.
Marzini E.
Spognardi A.
Matteucci I.
Mori P.
Petrocchi M.
Automatic quality evaluation
Multi-criteria decision making
Wikipedia Medicine Portal
Proceedings - IEEE Symposium on Computer-Based Medical Systems English 2014 Recent studies report that Internet users are increasingly looking for health information through the Wikipedia Medicine Portal, a collaboratively edited multitude of articles with contents often comparable to professionally edited material. Automatic quality assessment of Wikipedia medical articles has not received much attention from academia, and it presents distinctive open challenges. In this paper, we propose to tag the medical articles on the Wikipedia Medicine Portal, clearly stating their maturity degree, intended as a summarizing measure of several article properties. For this purpose, we adopt the Analytic Hierarchy Process, a well-known methodology for decision making, and we evaluate the maturity degree of more than 24,000 Wikipedia medical articles. The obtained results show how a qualitative analysis of medical content does not always overlap with a quantitative analysis (an example of which is shown in the paper), since important properties of an article can hardly be synthesized by quantitative features. This seems particularly true when the analysis considers the concept of maturity, defined and verified in this work. 0 0
Mining hidden concepts: Using short text clustering and wikipedia knowledge Yang C.-L.
Benjamasutin N.
Chen-Burger Y.-H.
Proceedings - 2014 IEEE 28th International Conference on Advanced Information Networking and Applications Workshops, IEEE WAINA 2014 English 2014 In recent years, there has been rapidly increasing use of social networking platforms in the form of short-text communication. However, due to the short length of the texts used, the precise meaning and context of these texts are often ambiguous. To address this problem, we have devised a new community mining approach that is an adaptation and extension of text clustering, using Wikipedia as background knowledge. Based on this method, we are able to achieve a high level of precision in identifying the context of communication. Using the same methods, we are also able to efficiently identify hidden concepts in Twitter texts. Using Wikipedia as background knowledge considerably improved the performance of short-text clustering. 0 0
Mining the personal interests of microbloggers via exploiting wikipedia knowledge Fan M.
Zhou Q.
Zheng T.F.
Interest
Microblog
Social tagging
Wikipedia
Lecture Notes in Computer Science English 2014 This paper focuses on an emerging research topic: mining microbloggers' personalized interest tags from the microblogs they have posted. It is based on the intuition that microblogs indicate the daily interests and concerns of microbloggers. Previous studies regarded the microblogs posted by one microblogger as a whole document and adopted traditional keyword extraction approaches to select high-weight nouns, without considering the characteristics of microblogs. Given the limited textual information of microblogs and the implicit interest expression of microbloggers, we suggest a new research framework for mining microbloggers' interests that exploits Wikipedia, a huge online encyclopedia, to take up these challenges. Based on a semantic graph constructed from Wikipedia, the proposed semantic spreading model (SSM) can discover and leverage semantically related interest tags which do not occur in one's microblogs. Based on the SSM, an interest mining system has been implemented and deployed on the biggest microblogging platform in China (Sina Weibo). We have also specified a suite of new evaluation metrics to make up for the shortage of evaluation functions in this research topic. Experiments conducted on a real-time dataset demonstrate that our approach outperforms the state-of-the-art methods in identifying microbloggers' interests. 0 0
Monitoring teachers' complex thinking while engaging in philosophical inquiry with web 2.0 Agni Stylianou-Georgiou
Petrou A.
Andri Ioannou
Caring thinking
Complex thinking
Creative thinking
Critical thinking
Forum
Philosophical inquiry
Philosophy for children
Technology integration
Wiki
WikiSplit
Lecture Notes in Computer Science English 2014 The purpose of this study was to examine how we can exploit new technologies to scaffold and monitor the development of teachers' complex thinking while engaging in philosophical inquiry. We set up an online learning environment using wiki and forum technologies and we organized the activity in four major steps to scaffold complex thinking for the teacher participants. In this article, we present the evolution of complex thinking of one group of teachers by studying their interactions in depth. 0 0
Motivating Wiki-based collaborative learning by increasing awareness of task conflict: A design science approach Wu K.
Vassileva J.
Xiaohua Sun
Fang J.
Collaborative learning
Design
Task conflict
Wiki
Lecture Notes in Computer Science English 2014 Wiki systems have been deployed in many collaborative learning projects. However, lack of motivation is a serious problem in the collaboration process. Wiki systems were originally designed to hide authorship information. Such a design may hinder users from being aware of task conflict, resulting in undesired outcomes (e.g. reduced motivation, suppressed knowledge exchange activities). We propose to incorporate two different tools in wiki systems to motivate learners by increasing awareness of task conflict. A field test was executed in two collaborative writing projects. The results from a wide-scale survey and a focus group study confirmed the utility of the new tools and suggested that these tools can help learners develop both extrinsic and intrinsic motivations to contribute. This study has several theoretical and practical implications: it enriches the knowledge of task conflict, proposes a new way to motivate collaborative learning, and provides a low-cost solution for managing task conflict. 0 0
Multilinguals and wikipedia editing Hale S.A. Cross-language
Information diffusion
Information discovery
Social media
Social network analysis
Wikipedia
WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference English 2014 This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller editions with fewer users have a higher percentage of multilingual users than larger editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present. 0 0
Mutual disambiguation for entity linking Charton E.
Meurs M.-J.
Jean-Louis L.
Marie-Pierre Gagnon
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 The disambiguation algorithm presented in this paper is implemented in SemLinker, an entity linking system. First, named entities are linked to candidate Wikipedia pages by a generic annotation engine. Then, the algorithm re-ranks candidate links according to mutual relations between all the named entities found in the document. The evaluation is based on experiments conducted on the test corpus of the TAC-KBP 2012 entity linking task. 0 0
Myths to burst about hybrid learning Li K.C. Hybrid learning
Learner readiness
Learning effectiveness
Teacher readiness
Wiki
Lecture Notes in Computer Science English 2014 Given the snowballing attention to and growing popularity of hybrid learning, some take for granted that the learning mode means more effective education delivery, while skeptics expect researchers to inform them whether hybrid learning leads to better learning effectiveness. Though different, both beliefs are like myths about the hybrid mode. By reporting findings concerning the use of wikis in a major project on hybrid courses piloted at a university in Hong Kong, this paper highlights the complexity of assessing the effectiveness of a hybrid learning mode and the problems of a reductionistic view of its effectiveness. Means for e-learning were blended with conventional distance learning components in four undergraduate courses. Findings show that a broad variety of factors, including subject matter, instructors' pedagogical knowledge of the teaching means, students' readiness for the new learning mode, and the implementation methods, play a key role in determining learning effectiveness, rather than just the delivery mode per se. 0 0
Named entity evolution analysis on wikipedia Holzmann H.
Risse T.
Named entity evolution
Semantics
Wikipedia
WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference English 2014 Accessing Web archives raises a number of issues caused by their temporal characteristics. Additional knowledge is needed to find and understand older texts. Entities mentioned in texts are especially subject to change. Most severe in terms of information retrieval are name changes. In order to find entities that have changed their name over time, search engines need to be aware of this evolution. We tackle this problem by analyzing Wikipedia in terms of entity evolutions mentioned in articles. We present statistical data on excerpts covering name changes, which will be used to discover similar text passages and extract evolution knowledge in future work. 0 0
Okinawa in Japanese and English Wikipedia Hale S.A. Cross-language
Information diffusion
Information discovery
Social media
Wikipedia
Conference on Human Factors in Computing Systems - Proceedings English 2014 This research analyzes edits by foreign-language users in Wikipedia articles about Okinawa, Japan, in the Japanese and English editions of the encyclopedia. Okinawa, home to both English- and Japanese-speaking users, provides a good case for looking at content differences and cross-language editing in a small geographic area on Wikipedia. Consistent with prior work, this research finds large differences in the representations of Okinawa in the content of the two editions. The number of users crossing the language boundary to edit both editions is also extremely small. When users do edit in a non-primary language, they most frequently edit articles that have cross-language (interwiki) links, articles that are edited more by other users, and articles that have more images. Finally, the possible value of edits from foreign-language users and design possibilities to motivate wider contributions from foreign-language users are discussed. 0 0
Ontology construction using multiple concept lattices Wang W.C.
Lu J.
Concept lattice
Ontology construction
Wikipedia
Wordnet
Advanced Materials Research English 2014 The paper proposes an ontology construction approach that combines Fuzzy Formal Concept Analysis, Wikipedia, and WordNet in a process that constructs multiple concept lattices for sub-domains divided from the target domain. The multiple concept lattices approach can mine concepts and determine relations between concepts automatically, and construct the domain ontology accordingly. This approach is suitable for large or complex domains that contain obvious sub-domains. 0 0
QuoDocs: Improving developer engagement in software documentation through gamification Sukale R.
Pfaff M.S.
Collaborative authoring
Knowledge building
Software documentation
Wiki
Conference on Human Factors in Computing Systems - Proceedings English 2014 Open source projects are created and maintained by developers who are distributed across the globe. As projects become larger, a developer's knowledge of a project's conceptual model becomes specialized. When new members join a project, it is difficult for them to understand the reasoning behind the structure and organization of the project, since they do not have access to earlier discussions. We interviewed and surveyed developers from a popular open source project hosting website to find out how they maintain documentation and communicate project details to new members. We found that documentation is largely out of sync with code and that developers do not find maintaining it to be an engaging activity. In this paper, we propose a new system - QuoDocs - and take a human-centered approach to introduce competitiveness and personalization to engage software developers in documenting their projects. 0 0
REQcollect: Requirements collection, project matching and technology transition Goldrich L.
Hamer S.
McNeil M.
Longstaff T.
Gatlin R.
Bello-Ogunu E.
Proceedings of the Annual Hawaii International Conference on System Sciences English 2014 This paper describes the evolution of REQcollect (REQuirements Collection). REQcollect was developed through several iterations of agile development and the transition of other projects. Multiple federal agencies have sponsored the work as well as transitioned the technologies into use. The parents of REQcollect are REQdb (REQuirements Database) and DART3 (Department of Homeland Security Assistant for R&D Tracking and Technology Transfer) [1]. DART3 was developed from three other projects: TPAM (Transition Planning and Assessment Model) [2], GNOSIS (Global Network Operations Survey and Information Sharing) [3,4], and Aqueduct [5], a semantic MediaWiki extension. REQcollect combines the best components of these previous systems: a requirements elicitation and collection tool and a Google-like matching algorithm to identify potential transitions of R&D projects that match requirements. 0 0
Reader preferences and behavior on Wikipedia Janette Lehmann
Claudia Muller-Birn
David Laniado
Lalmas M.
Andreas Kaltenbrunner
Article quality
Editor
Engagement
Human factors
Measurement
Reader
Reading behavior
Reading interest
Wikipedia
HT 2014 - Proceedings of the 25th ACM Conference on Hypertext and Social Media English 2014 Wikipedia is a collaboratively-edited online encyclopaedia that relies on thousands of editors to both contribute articles and maintain their quality. Over the last years, research has extensively investigated this group of users, while another group of Wikipedia users, the readers, and their preferences and behavior have not been much studied. This paper makes this group and its activities visible and valuable to Wikipedia's editor community. We carried out a study on two datasets covering a 13-month period to obtain insights into users' preferences and reading behavior in Wikipedia. We show that the most read articles do not necessarily correspond to those frequently edited, suggesting some degree of non-alignment between user reading preferences and author editing preferences. We also identified that popular and often edited articles are read according to four main patterns, and that how an article is read may change over time. We illustrate how this information can provide valuable insights to Wikipedia's editor community. 0 0
Research on XML data mining model based on multi-level technology Zhu J.-X. Data mining model
Multi-level technique
World Wide Web
XML
Advanced Materials Research English 2014 The era of Web 2.0 has arrived, and more and more Web 2.0 applications, such as social networks and Wikipedia, have emerged. As an industry standard of Web 2.0, the XML technique has also attracted more and more researchers. However, mining valuable information from massive XML documents is still in its infancy. In this paper, we study the basic problem of XML data mining: the XML data mining model. We design a multi-level XML data mining model, propose a multi-level data mining method, and list some research issues in the implementation of XML data mining systems. 0 0
Revision graph extraction in Wikipedia based on supergram decomposition and sliding update Wu J.
Mizuho Iwaihara
Collaboration
Revision history
Wikipedia
IEICE Transactions on Information and Systems English 2014 As one of the popular social media platforms that many people have turned to in recent years, the collaborative encyclopedia Wikipedia provides information in a more "Neutral Point of View" way than others. Towards this core principle, plenty of effort has been put into collaborative contribution and editing. The trajectories of how such collaboration unfolds through revisions are valuable for group dynamics and social media research, which suggests that we should extract the underlying derivation relationships among revisions from the chronologically sorted revision history in a precise way. In this paper, we propose a revision graph extraction method based on supergram decomposition in a document collection of near-duplicates. The plain text of a revision is measured by its frequency distribution of supergrams, the variable-length token sequences that stay the same across revisions. We show that this method performs the task more effectively than existing methods. 0 0
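As a simplified illustration of comparing revisions by token-sequence frequency distributions: the sketch below uses fixed-length n-grams, whereas the paper's supergrams are variable-length, and the overlap measure is a crude stand-in for the paper's method:

    from collections import Counter

    def ngram_profile(tokens, n=3):
        # Frequency distribution of token n-grams in one revision.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def profile_overlap(p, q):
        # Shared n-gram mass, a rough proxy for revision similarity.
        return sum((p & q).values()) / max(sum(p.values()), 1)

    rev_a = "the quick brown fox jumps over the lazy dog".split()
    rev_b = "the quick brown fox leaps over the lazy dog".split()
    print(profile_overlap(ngram_profile(rev_a), ngram_profile(rev_b)))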
SCooL: A system for academic institution name normalization Jacob F.
Javed F.
Zhao M.
McNair M.
Lucene
Name Entity Recognition
School Normalization
Wikipedia
2014 International Conference on Collaboration Technologies and Systems, CTS 2014 English 2014 Named entity normalization involves normalizing recognized entities to a concrete, unambiguous real-world entity. Within the purview of the online job posting domain, academic institution name normalization provides a beneficial opportunity for CareerBuilder (CB). Accurate and detailed normalization of academic institutions is important for performing sophisticated labor market dynamics analysis. In this paper we present and discuss the design and implementation of sCooL, an academic institution name normalization system designed to supplant the existing manually maintained mapping system at CB. We also discuss the specific challenges that led to the design of sCooL. sCooL leverages Wikipedia to create academic institution name mappings from a school database which is created from job applicant resumes posted on our website. The mappings created are utilized to build a database which is then used for normalization. sCooL provides the flexibility to integrate mappings collected from different curated and non-curated sources. The system is able to identify malformed data and to distinguish K-12 schools from universities and colleges. We conduct an extensive comparative evaluation of the semi-automated sCooL system against the existing manual mapping implementation and show that sCooL provides better coverage with improved accuracy. 0 0
Semantic full-text search with broccoli Holger Bast
Baurle F.
Buchhold B.
Haussmann E.
SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2014 We combine search in triple stores with full-text search into what we call semantic full-text search. We provide a fully functional web application that allows the incremental construction of complex queries on the English Wikipedia combined with the facts from Freebase. The user is guided by context-sensitive suggestions of matching words, instances, classes, and relations after each keystroke. We also provide a powerful API, which may be used for research tasks or as a back end, e.g., for a question answering system. Our web application and public API are available under http://broccoli.cs.uni-freiburg.de. 0 0
Sentence similarity by combining explicit semantic analysis and overlapping n-grams Vu H.H.
Villaneau J.
Said F.
Marteau P.-F.
Lecture Notes in Computer Science English 2014 We propose a similarity measure between sentences which combines a knowledge-based measure, a lighter version of ESA (Explicit Semantic Analysis), and a distributional measure, ROUGE. We used this hybrid measure with two French domain-oriented corpora collected from the Web and compared its similarity scores to those of human judges. In both domains, ESA and ROUGE perform better when they are mixed than they do individually. Moreover, using the whole Wikipedia base in ESA did not prove necessary, since the best results were obtained with a small number of well-selected concepts. 0 0
Shades: Expediting Kademlia's lookup process Einziger G.
Friedman R.
Kantor Y.
Lecture Notes in Computer Science English 2014 Kademlia is considered to be one of the most effective key-based routing protocols. It is nowadays implemented in many file-sharing peer-to-peer networks such as BitTorrent, KAD, and Gnutella. This paper introduces Shades, a combined routing/caching scheme that significantly shortens the average lookup process in Kademlia and improves its load handling. The paper also includes an extensive performance study demonstrating the benefits of Shades and comparing it to other suggested alternatives using both synthetic workloads and traces from YouTube and Wikipedia. 0 0
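For context, Kademlia's defining ingredient is its XOR distance metric over node and key IDs; a minimal sketch of a lookup's nearest-node selection follows, with routing tables, k-buckets, and Shades' caching layer all omitted, and toy 4-bit IDs in place of real 160-bit ones:

    def xor_distance(a: int, b: int) -> int:
        # Kademlia measures closeness of IDs by XOR, interpreted as an integer.
        return a ^ b

    def closest_nodes(key: int, node_ids: list[int], k: int = 3) -> list[int]:
        # A lookup step keeps the k nodes closest to the target key.
        return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]

    nodes = [0b1010, 0b0111, 0b1100, 0b0001]
    print(closest_nodes(0b1000, nodes))  # [10, 12, 1]: shared-prefix nodes first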
SkWiki: A multimedia sketching system for collaborative creativity Zhao Z.
Badam S.K.
Chandrasegaran S.
Park D.G.
Elmqvist N.
Kisselburgh L.
Ramani K.
Collaborative editing
Creativity
Sketching
Wiki
Conference on Human Factors in Computing Systems - Proceedings English 2014 We present skWiki, a web application framework for collaborative creativity in digital multimedia projects, including text, hand-drawn sketches, and photographs. skWiki overcomes common drawbacks of existing wiki software by providing a rich viewer/editor architecture for all media types that is integrated into the web browser itself, thus avoiding dependence on client-side editors. Instead of files, skWiki uses the concept of paths as trajectories of persistent state over time. This model has intrinsic support for collaborative editing, including cloning, branching, and merging paths edited by multiple contributors. We demonstrate skWiki's utility using a qualitative, sketching-based user study. 0 0
Snuggle: Designing for efficient socialization and ideological critique Aaron Halfaker
Geiger R.S.
Loren Terveen
H.5.2. Information Interfaces and Presentation: Graphical user interfaces (GUI) Conference on Human Factors in Computing Systems - Proceedings English 2014 Wikipedia, the encyclopedia "anyone can edit", has become increasingly less so. Recent academic research and popular discourse illustrate the often aggressive ways newcomers are treated by veteran Wikipedians. These are complex sociotechnical issues, bound up in infrastructures based on problematic ideologies. In response, we worked with a coalition of Wikipedians to design, develop, and deploy Snuggle, a new user interface that serves two critical functions: making the work of newcomer socialization more effective, and bringing visibility to instances in which Wikipedians' current practice of gatekeeping socialization breaks down. Snuggle supports positive socialization by helping mentors quickly find newcomers whose good-faith mistakes were reverted as damage. Snuggle also supports ideological critique and reflection by bringing visibility to the consequences of viewing newcomers through a lens of suspiciousness. 0 0
Social software in new product development - State of research and future research directions Rohmann S.
Heuschneider S.
Schumann M.
Literature review
New product development
Research agenda
Social network
Social software
Weblog
Wiki
20th Americas Conference on Information Systems, AMCIS 2014 English 2014 Product development is becoming increasingly collaborative and knowledge-intensive in today's industry. To gain competitive advantage, effective use of information systems in new product development (NPD) is needed. Social software applications show further potential for use in NPD, the so-called "Product Development 2.0", which is poorly understood in research so far. The purpose of this article is to establish the current state of research in this area by means of a literature review, after which research gaps and future research directions are identified. The results indicate that social software applications are suitable for supporting tasks in all phases of the NPD process, but the influencing factors and effects of the identified social software usage in NPD are still poorly understood. 0 0
Students experiences of using Wiki spaces to support collaborative learning in a blended classroom: A case of Kenyatta and KCA universities in Kenya Gitonga R.
Muuro M.
Nzuki D.
Collaborative knowledge building
Collaborative learning
Students experiences
Wiki spaces
2014 IST-Africa Conference and Exhibition, IST-Africa 2014 English 2014 Wiki spaces are simply web pages that allow users to create, edit and share each other's work. This paper shares experiences from a group of students who used Wiki spaces in their coursework. It applies collaborative knowledge building theory to evaluate existing Wiki space practices in order to inform stakeholders on the power of Wiki spaces in setting students on a knowledge building trajectory. The respondents were 150 university students from Kenyatta and KCA universities in Kenya whose lecturers had created Wiki spaces for collaborative group tasks as part of their coursework during the September to December 2013 semester. More than 50% of the students found the Wiki spaces useful in promoting various aspects of knowledge building, such as reflective learning and propagating idea diversity. This paper underscores the importance of Wiki spaces as environments for positioning today's students on a knowledge building track, a skill set required of the 21st century graduate. 0 0
Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools Lopuszynski M.
Bolikowski L.
Natural Language Processing
Tagging document collections
Wikipedia
Communications in Computer and Information Science English 2014 In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. As a first source of labels, Wikipedia is employed; the second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.). 0 0
Term impact-based web page ranking Al-Akashi F.H.
Inkpen D.
Indexing
Query expansion
Searching
Term impact
Vector space model
Web retrieval
Wikipedia anchors
ACM International Conference Proceeding Series English 2014 Indexing Web pages based on content is a crucial step in a modern search engine. A variety of methods and approaches exist to support web page ranking. In this paper, we describe a new approach for obtaining measures for Web page ranking. Unlike other recent approaches, it exploits meta-terms extracted from titles and URLs for indexing the contents of web documents. We use term impact to correlate each meta-term with the document's content, rather than term frequency and other similar techniques. Our approach also uses the structural knowledge available in Wikipedia for better expansion and formulation of the queries. Evaluation with automatic metrics provided by TREC reveals that our approach is effective for building the index and for retrieval. We present retrieval results from the ClueWeb collection, for a set of test queries, for two tasks: an ad hoc retrieval task and a diversity task (which aims at retrieving relevant pages that cover different aspects of the queries). 0 0
The economics of contribution in a large enterprise-scale wiki Paul C.L.
Cook K.
Burtner R.
Collaboration
Contribution
Enterprise
Wiki
English 2014 The goal of our research was to understand how knowledge workers use community-curated knowledge and collaboration tools in a large organization. In our study, we explored wiki use among knowledge workers in their day-to-day responsibilities. In this poster, we examine the motivation and rewards for knowledge workers to participate in wikis through the economic idea of costs to contribute. 0 0
The impact of semantic document expansion on cluster-based fusion for microblog search Liang S.
Ren Z.
Maarten de Rijke
Lecture Notes in Computer Science English 2014 Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based fusion method on the task of retrieving microblog posts. We find that in the optimal setting the contribution of the clustering information is very limited, which we hypothesize to be due to the limited length of microblog posts. To increase the contribution of the clustering information in cluster-based fusion, we integrate semantic document expansion as a preprocessing step. We enrich the content of microblog posts appearing in the lists to be fused with Wikipedia articles, based on which clusters are created. We verify the effectiveness of our combined document expansion plus fusion method by comparing it with microblog search algorithms and other fusion methods. 0 0
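A rough sketch of the expansion step described above: append Wikipedia article text for any entity mentioned in a short post before clustering. The lookup table and tokenization are hypothetical simplifications; a real system would use an entity linker against the full Wikipedia corpus.

```python
# Hypothetical mention -> Wikipedia snippet table.
wiki_text = {"nasa": "NASA is the space agency of the United States ..."}

def expand_post(post: str) -> str:
    """Enrich a short post with Wikipedia text for recognized mentions."""
    mentions = [w.strip(".,#@!?").lower() for w in post.split()]
    extra = " ".join(wiki_text[m] for m in mentions if m in wiki_text)
    return (post + " " + extra).strip()

# expand_post("launch day at NASA!") gains enough vocabulary to cluster
# with other space-related posts despite its original brevity.
```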
The last click: Why users give up information network navigation Scaria A.T.
Philip R.M.
Robert West
Leskovec J.
Abandonment
Browsing
Information networks
Navigation
Wikipedia
Wikispeedia
WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining English 2014 An important part of finding information online involves clicking from page to page until an information need is fully satisfied. This is a complex task that can easily be frustrating and force users to give up prematurely. An empirical analysis of what makes users abandon click-based navigation tasks is hard, since most passively collected browsing logs do not specify the exact target page that a user was trying to reach. We propose to overcome this problem by using data collected via Wikispeedia, a Wikipedia-based human-computation game, in which users are asked to navigate from a start page to an explicitly given target page (both Wikipedia articles) by only tracing hyperlinks between Wikipedia articles. Our contributions are two-fold. First, by analyzing the differences between successful and abandoned navigation paths, we aim to understand what types of behavior are indicative of users giving up their navigation task. We also investigate how users make use of back clicks during their navigation. We find that users prefer backtracking to high-degree nodes that serve as landmarks and hubs for exploring the network of pages. Second, based on our analysis, we build statistical models for predicting whether a user will finish or abandon a navigation task, and if the next action will be a back click. Being able to predict these events is important as it can potentially help us design more human-friendly browsing interfaces and retain users who would otherwise have given up navigating a website. 0 0
Tibetan-Chinese named entity extraction based on comparable corpus Sun Y.
Zhao Q.
Comparable corpus
Sequence intersection
Tibetan-Chinese named entity
Wikipedia
Applied Mechanics and Materials English 2014 Tibetan-Chinese named entity extraction is the foundation of Tibetan-Chinese information processing, providing the basis for machine translation and cross-language information retrieval research. We used the multi-language links of Wikipedia to obtain a Tibetan-Chinese comparable corpus, and combined sentence length, word matching and entity boundary words to carry out sentence alignment. We then extracted Tibetan-Chinese named entities from the aligned comparable corpus in three ways: (1) extraction of natural labeling information; (2) extraction of the links between Tibetan and Chinese entries; (3) the method of sequence intersection, which treats each sentence as a word sequence, recognizes Chinese named entities in the Chinese sentences, and intersects the aligned Tibetan sentences. Finally, the experimental results show that the extraction method based on a comparable corpus is effective. 0 0
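A toy rendering of the "sequence intersection" idea from point (3): entity candidates that survive intersection across a group of aligned sentences are kept. Plain Python sets stand in for the aligned Tibetan-Chinese sequences used in the paper.

```python
def intersect_entities(entity_sets):
    """Keep only entities attested in every aligned sentence of a group."""
    result = set(entity_sets[0])
    for s in entity_sets[1:]:
        result &= set(s)
    return result

# intersect_entities([{"Lhasa", "Tibet"}, {"Lhasa", "river"}]) -> {"Lhasa"}
```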
Title named entity recognition using wikipedia and abbreviation generation Park Y.
Kang S.
Seo J.
Abbreviation generation
Conditional random field
Title named entity
Wikipedia
2014 International Conference on Big Data and Smart Computing, BIGCOMP 2014 English 2014 In this paper, we propose a title named entity recognition model using Wikipedia and abbreviation generation. The proposed model automatically extracts title named entities from Wikipedia, so constant renewal is possible without additional cost. Also, in order to establish a dictionary of title named entity abbreviations, generation rules are used to generate abbreviation candidates, and abbreviations are selected through web search methods. We propose a statistical model that recognizes title named entities using CRFs (Conditional Random Fields). The proposed model uses lexical information, a named entity dictionary, and an abbreviation dictionary, and achieves title named entity recognition performance of 82.1% in our experiments. 0 0
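The paper's actual generation rules are not given in the abstract; the sketch below only shows the general shape of rule-based abbreviation candidates (initials and first word), which a web-search step would then filter. All names are illustrative.

```python
def abbreviation_candidates(title: str) -> set:
    """Naive candidates: word initials (as-is and upper-cased) plus the
    first word; real rule sets are considerably richer."""
    words = title.split()
    initials = "".join(w[0] for w in words)
    return {initials, initials.upper(), words[0]}

# abbreviation_candidates("Game of Thrones") -> {"GoT", "GOT", "Game"}
```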
Towards automatic building of learning pathways Siehndel P.
Kawase R.
Nunes B.P.
Herder E.
Digital libraries
Learning pathways
Learning support
WEBIST 2014 - Proceedings of the 10th International Conference on Web Information Systems and Technologies English 2014 Learning material usually has a logical structure, with a beginning and an end, and lectures or sections that build upon one another. However, in informal Web-based learning this may not be the case. In this paper, we present a method for automatically calculating a tentative order in which objects should be learned, based on the estimated complexity of their contents. The proposed method enriches textual objects with links to Wikipedia articles, which are used to calculate a complexity score for each object. We evaluated our method with two different datasets: Wikipedia articles and online learning courses. For the Wikipedia data we achieved correlations between the ground truth and the predicted order of up to 0.57, while for subtopics inside the online learning courses we achieved correlations of 0.793. 0 0
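A minimal sketch of the ordering step, assuming each learning object has already been enriched with linked Wikipedia concepts and each concept carries a precomputed difficulty score; the averaging below is a hypothetical aggregate, not the paper's exact formula.

```python
def complexity(concept_scores):
    """Aggregate difficulty of the Wikipedia concepts linked from an object."""
    return sum(concept_scores) / max(len(concept_scores), 1)

def learning_order(objects: dict) -> list:
    """Order learning objects from least to most complex content."""
    return sorted(objects, key=lambda name: complexity(objects[name]))

# learning_order({"intro": [0.1, 0.2], "advanced": [0.7, 0.9]})
# -> ["intro", "advanced"]
```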
Tracking topics on revision graphs of wikipedia edit history Li B.
Wu J.
Mizuho Iwaihara
Edit history
Supergram
Topic summarization
Wikipedia
Lecture Notes in Computer Science English 2014 Wikipedia is known as the largest online encyclopedia, in which articles are constantly contributed and edited by users. Past revisions of articles after edits are also accessible to the public for confirming the edit process. However, the degree of similarity between revisions is very high, making it difficult to generate summaries for these small changes from the revision graphs of Wikipedia edit history. In this paper, we propose an approach to giving a concise summary to a given scope of revisions, utilizing supergrams, which are consecutive unchanged term sequences. 0 0
Trendspedia: An Internet observatory for analyzing and visualizing the evolving web Kang W.
Tung A.K.H.
Chen W.
Li X.
Song Q.
Zhang C.
Fei Zhao
Xiaofeng Zhou
Proceedings - International Conference on Data Engineering English 2014 The popularity of social media services has changed the way information is acquired in modern society. Meanwhile, a massive amount of information is generated every single day. To extract useful knowledge, much effort has been invested in analyzing social media contents, e.g., (emerging) topic discovery. Even with these findings, however, users may still find it hard to obtain knowledge of great interest that conforms to their preferences. In this paper, we present a novel system which brings proper context to continuously incoming social media contents, such that mass information can be indexed, organized and analyzed around Wikipedia entities. Four data analytics tools are employed in the system. Three of them aim to enrich each Wikipedia entity by analyzing the relevant contents, while the fourth builds an information network among the most relevant Wikipedia entities. With our system, users can easily pinpoint the valuable information and knowledge they are interested in, as well as navigate to other closely related entities through the information network for further exploration. 0 0
TripBuilder: A tool for recommending sightseeing tours Brilhante I.
MacEdo J.A.
Nardini F.M.
Perego R.
Renso C.
Lecture Notes in Computer Science English 2014 We propose TripBuilder, a user-friendly and interactive system for planning a time-budgeted sightseeing tour of a city on the basis of the points of interest and the patterns of movement of tourists mined from user-contributed data. The knowledge needed to build the recommendation model is extracted entirely in an unsupervised way from two popular collaborative platforms: Wikipedia and Flickr. TripBuilder interacts with the user by means of a friendly Web interface that allows her to easily specify personal interests and a time budget. The proposed sightseeing tour can then be explored and modified. We present the main components of the system. 0 0
Trust, but verify: Predicting contribution quality for knowledge base construction and curation Tan C.H.
Agichtein E.
Ipeirotis P.
Evgeniy Gabrilovich
Crowdsourcing
Knowledge base construction
Predicting contribution quality
WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining English 2014 The largest publicly available knowledge repositories, such as Wikipedia and Freebase, owe their existence and growth to volunteer contributors around the globe. While the majority of contributions are correct, errors can still creep in, due to editors' carelessness, misunderstanding of the schema, malice, or even lack of accepted ground truth. If left undetected, inaccuracies often degrade the experience of users and the performance of applications that rely on these knowledge repositories. We present a new method, CQUAL, for automatically predicting the quality of contributions submitted to a knowledge base. Significantly expanding upon previous work, our method holistically exploits a variety of signals, including the user's domains of expertise as reflected in her prior contribution history, and the historical accuracy rates of different types of facts. In a large-scale human evaluation, our method exhibits precision of 91% at 80% recall. Our model verifies whether a contribution is correct immediately after it is submitted, significantly alleviating the need for post-submission human reviewing. 0 0
Twelve years of wikipedia research Judit Bar-Ilan
Noa Aharony
Analysis
Longitudinal trends
Wikipedia
WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference English 2014 Wikipedia was formally launched in 2001, but the first research papers mentioning it appeared only in 2002. Since then it has raised a huge amount of interest in the research community. At first mainly the content creation processes and the quality of the content were studied, but later on it was picked up as a valuable source for data mining and for testing. In this paper we present preliminary results that characterize the research done on and using Wikipedia since 2002. 0 0
Two is bigger (and better) than one: The wikipedia bitaxonomy project Flati T.
Vannella D.
Pasini T.
Roberto Navigli
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 We present WiBi, an approach to the automatic creation of a bitaxonomy for Wikipedia, that is, an integrated taxonomy of Wikipedia pages and categories. We leverage the information available in either one of the taxonomies to reinforce the creation of the other taxonomy. Our experiments show higher quality and coverage than state-of-the-art resources like DBpedia, YAGO, MENTA, WikiNet and WikiTaxonomy. 0 0
User interest profile identification using Wikipedia knowledge database Hua Li
Lai L.
Xu X.
Shen Y.
Xia C.
Family similarity
URL decay model
User profile
Web page Classification
Wikipedia knowledge network
Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013 English 2014 Interesting, targeted, relevant advertising is considered one of the main payoffs of personalized recommendation. Topic identification is the most important technique for handling unstructured web pages. Conventional content classification approaches based on bags of words have difficulty processing massive numbers of web pages. In this paper, Wikipedia Category Network (WCN) nodes are used to identify a web page's topic and estimate the user's interest profile. Wikipedia is the largest content knowledge database and is updated dynamically. A basic interest data set is marked for the WCN. The topic characterization for each WCN node is generated from the depth and breadth of the interest data set. To reduce the deviation of the breadth, a family generation algorithm is proposed to estimate the generation weight in the WCN. Finally, an interest decay model based on URL count is proposed to represent the user's interest profile over a time period. Experimental results illustrate that web page topic identification performs significantly better using the WCN with the family model, and that the profile identification model adapts dynamically for active users. 0 0
User interests identification on Twitter using a hierarchical knowledge base Kapanipathi P.
Jain P.
Venkataramani C.
Sheth A.
Hierarchical Interest Graph
Personalization
Semantics
Social Web
Twitter
User Profiles
Wikipedia
Lecture Notes in Computer Science English 2014 Twitter, due to its massive growth as a social networking platform, has been in focus for the analysis of its user generated content for personalization and recommendation tasks. A common challenge across these tasks is identifying user interests from tweets. Semantic enrichment of Twitter posts, to determine user interests, has been an active area of research in the recent past. These approaches typically use available public knowledge-bases (such as Wikipedia) to spot entities and create entity-based user profiles. However, exploitation of such knowledge-bases to create richer user profiles is yet to be explored. In this work, we leverage hierarchical relationships present in knowledge-bases to infer user interests expressed as a Hierarchical Interest Graph. We argue that the hierarchical semantics of concepts can enhance existing systems to personalize or recommend items based on a varied level of conceptual abstractness. We demonstrate the effectiveness of our approach through a user study which shows an average of approximately eight of the top ten weighted hierarchical interests in the graph being relevant to a user's interests. 0 0
Using linked data to mine RDF from Wikipedia's tables Munoz E.
Hogan A.
Mileo A.
Data mining
Linked data
Web tables
Wikipedia
WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining English 2014 The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried against. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge-base to find pre-existing relations between entities in Wikipedia tables, suggesting the same relations as holding for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%. 0 0
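A condensed sketch of the core method described above: relations that an existing Linked Data knowledge base already asserts for some entity pairs in two table columns are proposed for the remaining rows, to be filtered by a classifier later. The in-memory known_facts dict is a hypothetical stand-in for a real KB lookup.

```python
# Hypothetical seed knowledge base: (subject, object) -> relation.
known_facts = {("Dublin", "Ireland"): "dbo:country"}

def suggest_triples(column_pairs):
    """Propose each KB-attested relation between two columns for all rows."""
    relations = {known_facts[p] for p in column_pairs if p in known_facts}
    return [(s, rel, o) for rel in sorted(relations) for (s, o) in column_pairs]

# suggest_triples([("Dublin", "Ireland"), ("Oslo", "Norway")])
# -> [("Dublin", "dbo:country", "Ireland"), ("Oslo", "dbo:country", "Norway")]
```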
Utilizing semantic Wiki technology for intelligence analysis at the tactical edge Little E.
Big Data
Ontology
Semantic wiki
Soft Fusion
Tactical Edge
Triple Store
Proceedings of SPIE - The International Society for Optical Engineering English 2014 Challenges exist for intelligence analysts in efficiently and accurately processing large amounts of data collected from a myriad of available data sources. These challenges are even more evident for analysts who must operate within small military units at the tactical edge. In such environments, decisions must be made quickly without guaranteed access to the kinds of large-scale data sources available to analysts working at intelligence agencies. Improved technologies must be provided to analysts at the tactical edge to support informed, reliable decisions, since this is often a critical collection point for important intelligence data. To aid tactical edge users, new types of intelligent, automated technology interfaces are required to allow them to rapidly explore information associated with the intersection of hard and soft data fusion, such as multi-INT signals, semantic models, social network data, and natural language processing of text. The ability to fuse these types of data is paramount to providing decision superiority. For these types of applications, we have developed BLADE. BLADE allows users to dynamically add, delete and link data via a semantic wiki, allowing for improved interaction between different users. Analysts can see information updates in near-real-time thanks to a common underlying set of semantic models operating within a triple store that allows for updates on related data points from independent users tracking different items (persons, events, locations, organizations, etc.). The wiki can capture pictures, videos and related information. New information added directly to pages is automatically updated in the triple store, and its provenance and pedigree are tracked over time, making the data more trustworthy and more easily integrated with other users' pages. 0 0
Validating and extending semantic knowledge bases using video games with a purpose Vannella D.
Jurgens D.
Scarfini D.
Toscani D.
Roberto Navigli
52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference English 2014 Large-scale knowledge bases are important assets in NLP. Frequently, such resources are constructed through automatic mergers of complementary resources, such as WordNet and Wikipedia. However, manually validating these resources is prohibitively expensive, even when using methods such as crowdsourcing. We propose a cost-effective method of validating and extending knowledge bases using video games with a purpose. Two video games were created to validate concept-concept and concept-image relations. In experiments comparing with crowdsourcing, we show that video game-based validation consistently leads to higher-quality annotations, even when players are not compensated. 0 0
VidWiki: Enabling the crowd to improve the legibility of online educational videos Cross A.
Bayyapunedi M.
Ravindran D.
Cutrell E.
William Thies
Crowdsourcing
Massive open online course
Online education
Video annotation
Wiki
English 2014 Videos are becoming an increasingly popular medium for communicating information, especially for online education. Recent efforts by organizations like Coursera, edX, Udacity and Khan Academy have produced thousands of educational videos with hundreds of millions of views in their attempt to make high quality teaching available to the masses. As a medium, videos are time-consuming to produce and cannot be easily modified after release. As a result, errors or problems with legibility are common. While text-based information platforms like Wikipedia have benefitted enormously from crowdsourced contributions for the creation and improvement of content, the various limitations of video hinder the collaborative editing and improvement of educational videos. To address this issue, we present VidWiki, an online platform that enables students to iteratively improve the presentation quality and content of educational videos. Through the platform, users can improve the legibility of handwriting, correct errors, or translate text in videos by overlaying typeset content such as text, shapes, equations, or images. We conducted a small user study in which 13 novice users annotated and revised Khan Academy videos. Our results suggest that with only a small investment of time on the part of viewers, it may be possible to make meaningful improvements in online educational videos. 0 0
Virtual tools and collaborative working environment in embedded system design Parkhomenko A.V.
Gladkova O.N.
Analysis of requirements
Embedded system
Project management system
Software and hardware
Virtual prototype
Wiki-system
Proceedings of 2014 11th International Conference on Remote Engineering and Virtual Instrumentation, REV 2014 English 2014 This paper explores the existing approaches to the design of embedded systems. It demonstrates that when creating a control system for moving objects based on microcontrollers, it is reasonable to use a hybrid approach combining custom-designed circuit boards with ready-made specialized platforms. This satisfies the requirements of minimizing size and power consumption while reducing the time and labour content of the system design. The results of the development of the architecture, hardware and software of an embedded system for efficient remote control of moving platforms are presented. The paper also describes the collaborative working environment in which the project was created. 0 0
Virtual tutorials, Wikipedia books, and multimedia-based teaching for blended learning support in a course on algorithms and data structures Knackmuss J.
Creutzburg R.
Blended learning
M-learning systems
Mobile learning
Virtual tutorials
Wikipedia books
Proceedings of SPIE - The International Society for Optical Engineering English 2014 The aim of this paper is to describe the benefit and support of virtual tutorials, Wikipedia books and multimedia-based teaching in a course on Algorithms and Data Structures. We describe our work and the experience gained from virtual tutorials held in Netucate iLinc sessions and from various multimedia and animation elements used to support deeper understanding of the ordinary lectures, held in the standard classroom, on Algorithms and Data Structures for undergraduate computer science students. We describe the benefits, form, style and contents of those virtual tutorials. Furthermore, we mention the advantage of Wikipedia books in supporting the blended learning process using modern mobile devices. Finally, we give some first statistical measures of improved students' scores after introducing this new form of teaching support. 0 0
Web 2.0 and wiki farms in the business realm: A proposal of new platform for small-sized companies Zubr V.
Bures V.
Otcenaskova T.
Platform development
Presentation
Small-sized companies
Web 2.0
Wikifarm
Vision 2020: Sustainable Growth, Economic Development, and Global Competitiveness - Proceedings of the 23rd International Business Information Management Association Conference, IBIMA 2014 English 2014 With the latest generation of Internet development, Web 2.0, the perception of "the Web of webs" has changed. Users are not only passive "consumers" of web content created for them, but are themselves creators of web page content. The paper discusses the theoretical foundations for the creation of an encyclopaedia. In particular, the wiki farm concept is defined and its advantages as well as disadvantages are discussed. Employing a questionnaire survey, the opinions of representatives of small-sized enterprises and their experience with interactive web pages are revealed and examined. The final part of the paper includes a proposal for a wiki farm and related services based on the principles of Web 2.0 which might be employed and effectively utilised by a wide range of companies. Each offered service contains detailed information, related services, images or visualisations, and case studies linked to the particular service. The web content itself is subsequently generated by users based on their knowledge and experience. 0 0
What makes a good team of Wikipedia editors? A preliminary statistical analysis Bukowski L.
Jankowski-Lorek M.
Jaroszewicz S.
Sydow M.
Dataset
Statistical data mining
Team quality
Wikipedia
Lecture Notes in Computer Science English 2014 The paper concerns the quality of teams of Wikipedia authors, studied with a statistical approach. We report the preparation of a dataset containing numerous behavioural and structural attributes and its subsequent analysis and use to predict team quality. We performed exploratory analysis using partial regression to remove the influence of attributes not related to the team itself. The analysis confirmed that a key factor significantly influencing an article's quality is the discussion between team members. The second part of the paper successfully uses machine learning models to predict good articles based on features of the teams that created them. 0 0
Wiki as a knowledge management tool at the Multicultural school of Athens Kalagiakos P.
Koumpouros I.
Dependence pedagogy
Multicultural education
Multicultural wiki
Reusability
Reusability quality assurance group
IEEE Global Engineering Education Conference, EDUCON English 2014 The Multicultural school of Athens is a rich source of data and knowledge. Wiki is part of a collection of software tools aiming to increase community collaboration and provide reusable content within our curriculum. Dependence Pedagogy has proved to be a valuable approach, and the wiki solution presented here contributes to establishing the notion of reusability as a prerequisite of a successful Dependence Pedagogy environment. 0 0
Wiki tools in teaching English for Specific (Academic) Purposes - Improving students' participation Felea C.
Stanca L.
Blended Learning
English for Specific (Academic) Purposes
Higher Education
Web 2.0
Wiki
Lecture Notes in Computer Science English 2014 This study is based on an on-going investigation on the impact of Web 2.0 technologies, namely a wiki-based learning environment, part of a blended approach to teaching English for Specific (Academic) Purposes for EFL undergraduate students in a Romanian university. The research aims to determine whether there are statistically significant differences between the degrees of wiki participation recorded in the first semester of two consecutive academic years, starting from the assumption that modifications in the learning environment, namely the change of location for face-to-face meetings from class to computer lab setting and the introduction of more complex individual page templates may lead to increased wiki participation. Due to the project's multiple dimensions, out of which participation and response to the new online environment are particularly important, the results provide information necessary for further decisions regarding specific instructional design needs and wiki components, and changes affecting the teaching/learning process. 0 0
Wiki-mediated collaborative writing in teacher education: Assessing three years of experiences and influencing factors Hadjerrouit S.
Action category
Collaborative learning
Collaborative authoring
MediaWiki
Taxonomy
Wiki
CSEDU 2014 - Proceedings of the 6th International Conference on Computer Supported Education English 2014 Wikis have been reported to promote collaborative writing in educational settings. Examples of wikis in teacher education are group projects, glossary creation, teacher evaluation, and document review. However, in spite of studies reporting successful stories, the claim that wikis support collaborative writing has not yet been firmly confirmed in real educational settings. Most studies are limited to participants' subjective perceptions, and do not take into account influencing factors or the relationships between wikis and the learning environment. In this paper, students' collaborative writing activities over a period of three years are investigated using a taxonomy of action categories and the wiki data log that tracks all students' actions. The paper analyses the level of contribution of each member of the student groups, the types of actions the groups carried out on the wikis, and the timing of contributions. The article also discusses personal and contextual factors that may influence collaborative writing activities in teacher education, and offers recommendations for students as well. 0 0
WikiNEXT: A wiki for exploiting the web of data Arapov P.
Michel Buffa
Othmane A.B.
Knowledge management
Semantic web
Semantic Wikis
Web Applications
Web2.0
Wiki
Proceedings of the ACM Symposium on Applied Computing English 2014 This paper presents WikiNEXT, a semantic application wiki. WikiNEXT lies on the border between application wikis and modern web-based IDEs (Integrated Development Environments) like jsbin.com, jsfiddle.net, cloud9ide.com, etc. It was initially created for writing documents that integrate data from external data sources of the web of data, such as DBPedia.org or FreeBase.com, or for writing interactive tutorials (e.g. an HTML5 tutorial, a semantic web programming tutorial) that mix text and interactive examples in the same page. The system combines powerful aspects of (i) wikis, such as ease of use, collaboration and openness, (ii) semantic webs/wikis, such as making information processable by machines, and (iii) web-based IDEs, such as instant development and code testing in a web browser. WikiNEXT is for writing documents/pages as well as web applications that manipulate semantic data, either local or coming from the web of data. These applications can be created, edited or cloned in the browser and can be used for integrating data visualizations in wiki pages, for annotating content with metadata, or for any other kind of processing. WikiNEXT is particularly suited for teaching web technologies or for writing documents that integrate data from the web of data. 0 0
WikiReviz: An edit history visualization for wiki systems Wu J.
Mizuho Iwaihara
Mass Collaboration
Visualisation
Wikipedia
Lecture Notes in Computer Science English 2014 Wikipedia maintains a linear record of edit history with article content and meta-information for each article, which conceals precious information on how each article has evolved. This demo describes the motivation and features of WikiReviz, a visualization system for analyzing edit history in Wikipedia and other wiki systems. From the officially exported edit history of a single Wikipedia article, WikiReviz reconstructs the derivation relationships among revisions precisely and efficiently by revision graph extraction, and indicates meaningful article evolution progress by edit summarization. 0 0
WikiWho: Precise and Efficient Attribution of Authorship of Revisioned Content Fabian Flöck
Maribel Acosta
Wikipedia
Version control
Content modeling
Community- driven content creation
Collaborative authoring
Online collaboration
Authorship
World Wide Web Conference 2014 English 2014 Revisioned text content is present in numerous collaboration platforms on the Web, most notably wikis. Tracking the authorship of text tokens in such systems has many potential applications: identifying the main authors for licensing reasons, or tracing collaborative writing patterns over time, to name some. In this context, two main challenges arise. First, it is critical for such an authorship tracking system to be precise in its attributions, to be reliable for further processing. Second, it has to run efficiently even on very large datasets, such as Wikipedia. As a solution, we propose a graph-based model to represent revisioned content and an algorithm over this model that tackles both issues effectively. We describe the optimal implementation and design choices when tuning it to a wiki environment. We further present a gold standard of 240 tokens from English Wikipedia articles annotated with their origin. This gold standard was created manually and confirmed by multiple independent users of a crowdsourcing platform. It is the first gold standard of this kind and quality, and our solution achieves an average of 95% precision on this data set. We also perform a first-ever precision evaluation of the state-of-the-art algorithm for the task, exceeding it by over 10% on average. Our approach outperforms the execution time of the state-of-the-art by one order of magnitude, as we demonstrate on a sample of over 240 English Wikipedia articles. We argue that the increased size of an optional materialization of our results, about 10% compared to the baseline, is a favorable trade-off given the large advantage in runtime performance. 0 0
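To make the attribution task concrete, here is a deliberately simplified pass that credits each token to the first revision in which it appears. WikiWho's actual graph-based algorithm additionally handles deletions, reintroductions and moved text, which this toy ignores.

```python
def attribute_tokens(revisions):
    """Map each token to the index of the first revision containing it."""
    origin = {}
    for rev_idx, tokens in enumerate(revisions):
        for tok in tokens:
            origin.setdefault(tok, rev_idx)
    return origin

# attribute_tokens([["the", "cat"], ["the", "black", "cat"]])
# -> {"the": 0, "cat": 0, "black": 1}
```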
Wikimantic: Toward effective disambiguation and expansion of queries Boston C.
Fang H.
Carberry S.
Wu H.
Xiaojiang Liu
Disambiguation
Query expansion
Search queries
Data and Knowledge Engineering English 2014 This paper presents an implemented and evaluated methodology for disambiguating terms in search queries and for augmenting queries with expansion terms. By exploiting Wikipedia articles and their reference relations, our method is able to disambiguate terms in particularly short queries with few context words and to effectively expand queries for retrieval of short documents such as tweets. Our strategy can determine when a sequence of words should be treated as a single entity rather than as a sequence of individual entities. This work is part of a larger project to retrieve information graphics in response to user queries. 0 0
Wikipedia-based Kernels for dialogue topic tracking Soo-Hwan Kim
Banchs R.E.
Hua Li
Dialogue Topic Tracking
Kernel Methods
Spoken Dialogue Systems
Wikipedia
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings English 2014 Dialogue topic tracking aims to segment on-going dialogues into topically coherent sub-dialogues and predict the topic category for each next segment. This paper proposes a kernel method for dialogue topic tracking that utilizes various types of information obtained from Wikipedia. The experimental results show that our proposed approach can significantly improve performance on the task in mixed-initiative human-human dialogues. 0 0
Wikipedia-based query performance prediction Gilad Katz
Shtok A.
Kurland O.
Bracha Shapira
Lior Rokach
Query-performance prediction
Wikipedia
SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2014 The query-performance prediction task is to estimate retrieval effectiveness with no relevance judgments. Pre-retrieval prediction methods operate prior to retrieval time. Hence, these predictors are often based on analyzing the query and the corpus upon which retrieval is performed. We propose a corpus-independent approach to pre-retrieval prediction which relies on information extracted from Wikipedia. Specifically, we present Wikipedia-based features that can attest to the effectiveness of retrieval performed in response to a query, regardless of the corpus upon which search is performed. Empirical evaluation demonstrates the merits of our approach. As a case in point, integrating the Wikipedia-based features with state-of-the-art pre-retrieval predictors that analyze the corpus yields prediction quality that is consistently better than that of using the latter alone. 0 0
X-REC: Cross-category entity recommendation Milchevski D.
Berberich K.
Entity recommendation
Open data
Wikipedia
Proceedings of the 5th Information Interaction in Context Symposium, IIiX 2014 English 2014 We demonstrate X-Rec, a novel system for entity recommendation. In contrast to other systems, X-Rec can recommend entities from diverse categories including goods (e.g., books), other physical entities (e.g., actors), but also immaterial entities (e.g., ideologies). Further, it does so only based on publicly available data sources, including the revision history of Wikipedia, using an easily extensible approach for recommending entities. We describe X-Rec's architecture, showing how its components interact with each other. Moreover, we outline our demonstration, which foresees different modes for users to interact with the system. 0 0
Youth web spaces: Design requirements, framework, and architecture of wikis to promote youth well being Shahper Vodanovich
David Sundaram
Max Rohde
Dong J.
Architecture
Framework
Wiki
Youth - Well-being
ECIS 2014 Proceedings - 22nd European Conference on Information Systems English 2014 Youth is a period of rapid emotional, physical and intellectual change, during which young people progress from being dependent children to independent adults. Young people who are unable to make this transition smoothly can face significant difficulties in both the short and long term. Although the vast majority of young people are able to find all the resources they need for their health, well-being and development within their families and living environments, some young people have difficulty locating resources that can help them and, moreover, difficulty integrating into society. One way to support this transition is to create an environment that enables youth to be well supported through the provision of information and the creation of a community where youth feel empowered to collaborate with their peers as well as with decision makers and legislators. This article focuses on exploring the use of the Internet by youth and how youth well-being can be improved through the design of a youth-friendly web space. It begins with a definition of youth well-being and what this means in the context of the Web. We propose key requirements for the design of youth web spaces that support well-being. We use these requirements to analyse existing web spaces and conclude with the problems and issues that need to be addressed. These problems, issues and requirements then motivate us to propose a framework and architecture for the design and implementation of wikis for enhancing youth well-being. 0 0
Demonstration of a Loosely Coupled M2M System Using Arduino, Android and Wiki Software Takashi Yamanoue
Kentaro Oda
Koichi Shimozono
Sensor network
Social network
Wiki
Java
API
Message oriented middleware
The 38th IEEE Conference on Local Computer Networks (LCN) English 22 October 2013 A Machine-to-Machine (M2M) system, in which terminals are loosely coupled with Wiki software, is proposed. This system acquires sensor data from remote terminals, processes the data by remote terminals and controls actuators at remote terminals according to the processed data. The data is passed between terminals using wiki pages. Each terminal consists of an Android terminal and an Arduino board. The mobile terminal can be controlled by a series of commands which is written on a wiki page. The mobile terminal has a data processor and the series of commands may have a program which controls the processor. The mobile terminal can read data from not only the sensors of the terminal but also wiki pages on the Internet. The input data may be processed by the data processor of the terminal. The processed data may be sent to a wiki page. The mobile terminal can control the actuators of the terminal by reading commands on the wiki page or by running the program on the wiki page. This system realizes an open communication forum for not only people but also for machines. 8 0
An Inter-Wiki Page Data Processor for a M2M System Takashi Yamanoue
Kentaro Oda
Koichi Shimozono
IIAI ESKM English September 2013 A data processor, which inputs data from wiki pages, processes the data, and outputs the processed data on a wiki page, is proposed. This data processor is designed for a Machine-to-Machine (M2M) system, which uses Arduino, Android, and Wiki software. This processor is controlled by the program which is written on a wiki page. This M2M system consists of mobile terminals and web sites with wiki software. A mobile terminal of the system consists of an Android terminal and it may have an Arduino board with sensors and actuators. The mobile terminal can read data from not only the sensors in the Arduino board but also wiki pages on the Internet. The input data may be processed by the data processor of this paper. The processed data may be sent to a wiki page. The mobile terminal can control the actuators of the Arduino board by reading commands on the wiki page or by running the program of the processor. This system realizes an open communication forum for not only people but also for machines. 2 0
Art History on Wikipedia, a Macroscopic Observation Doron Goldfarb
Max Arends
Josef Froschauer
Dieter Merkl
ArXiv English 20 April 2013 How are articles about art historical actors interlinked within Wikipedia? Led by this question, we seek an overview of the link structure of a domain-specific subset of Wikipedia articles. We use an established domain-specific person name authority, the Getty Union List of Artist Names (ULAN), in order to externally identify relevant actors. Besides containing consistent biographical person data, this database also provides associative relationships between its person records, serving as a reference link structure for comparison. As a first step, we use mappings between the ULAN and English DBpedia provided by the Virtual International Authority File (VIAF). This way, we are able to identify 18,002 relevant person articles. Examining the link structure between these resources reveals interesting insights about the high-level structure of art historical knowledge as it is represented on Wikipedia. 4 1
2012 - A year of LaTeXML Ginev D.
Miller B.R.
Lecture Notes in Computer Science English 2013 LaTeXML, a TeX to XML converter, is being used in a wide range of MKM applications. In this paper, we present a progress report for the 2012 calendar year. Noteworthy enhancements include: increased coverage, such as Wikipedia syntax; enhanced capabilities, such as embeddable JavaScript and CSS resources and RDFa support; a web service for remote processing via web-sockets; along with general accuracy and reliability improvements. The outlook for a 0.8.0 release in mid-2013 is also discussed. 0 0
A Wiki collaborative application for teaching in manufacturing engineering Cuesta E.
Sanchez-Lasheras F.
Alvarez B.J.
Gonzalez-Madruga D.
Collaborative work
Manufacturing engineering
Wiki
Wiki learning
Materials Science Forum English 2013 The present work focuses on improving the student learning process through the use of a Wiki-like platform. In our research, the Wiki was intended as a means of easing the learning process. During the academic year 2011/2012, the Area of Manufacturing Engineering of the University of Oviedo was involved in a project whose aim was the creation of a Wiki. Nowadays this software is used as auxiliary material for other subjects taught by the Manufacturing Engineering Area in the new Engineering degrees created to adapt the studies to the requirements of the European Higher Education Area (EHEA). According to the results obtained by the students, it can be stated that the higher the mark of a student's Wiki, the better his/her mark in the exam. 0 0
A Wikipedia based hybrid ranking method for taxonomic relation extraction Zhong X.
Hybrid ranking method
Select best position
Taxonomic relation extraction
Wikipedia
Lecture Notes in Computer Science English 2013 This paper proposes a hybrid ranking method for taxonomic relation extraction (i.e., selecting the best position for a term) in an existing taxonomy. The method effectively combines two resources, an existing taxonomy and Wikipedia, in order to select the most appropriate position for a term candidate in the existing taxonomy. Previous methods mainly focus on complex inference to select the best position among all possible positions in the taxonomy. In contrast, our algorithm, simple but effective, leverages two kinds of information, the expression of a term candidate and its ranking information, to select the best position (the hypernym of the term candidate in the existing taxonomy). We apply our approach to the agricultural domain, and the experimental results indicate that performance is significantly improved. 0 0
A Wikipédia como diálogo entre universidade e sociedade: uma experiência em extensão universitária Juliana Bastos Marques
Otavio Saraiva Louvem
Anais do XIX Workshop de Informática na Escola Portuguese
2013 The paper presents an experience with critical reading and editing of Portuguese Wikipedia articles in the university, through extension activities conducted at the Federal University of Rio de Janeiro State (Unirio) in 2012. Different types of activities were carried out, from 4-hour workshops to longer courses, both for the general adult public and for university students grouped by field of study. The goal of the activities was to exercise critical proficiency in reading and writing skills, offering and adapting for the regular Wikipedia user academic knowledge produced at undergraduate and graduate levels.
4 0
A bookmark recommender system based on social bookmarking services and wikipedia categories Yoshida T.
Inoue U.
Algorithm
Folksonomy
Recommender
Tag
SNPD 2013 - 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing English 2013 Social bookmarking services allow users to add bookmarks of web pages with freely chosen keywords as tags. Personalized recommender systems recommend new and useful bookmarks added by other users. We propose a new method to find similar users and to select relevant bookmarks in a social bookmarking service. Our method is lightweight, because it uses a small set of important tags for each user to find useful bookmarks to recommend. Our method is also powerful, because it employs the Wikipedia category database to deal with the diversity of tags among users. An evaluation using the Hatena bookmark service in Japan shows that our method significantly increases the number of relevant bookmarks recommended without a notable increase in irrelevant bookmarks. 0 0
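A minimal sketch of the category-normalization idea: map each user's important tags into Wikipedia categories before comparing users, so that synonymous tags still match. The tag-to-category table below is hypothetical; the paper derives this role from the Wikipedia category database.

```python
# Hypothetical mapping from raw tags to Wikipedia categories.
tag_to_category = {"py": "Python", "python": "Python", "js": "JavaScript"}

def user_similarity(tags_a, tags_b):
    """Jaccard similarity over category-normalized important-tag sets."""
    ca = {tag_to_category.get(t, t) for t in tags_a}
    cb = {tag_to_category.get(t, t) for t in tags_b}
    return len(ca & cb) / max(len(ca | cb), 1)

# user_similarity({"py"}, {"python"}) -> 1.0, despite disjoint raw tags
```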
A case study of a course including Wikipedia editing activity for undergraduate students Mori Y.
Egi H.
Ozawa S.
Information ethics
Online course
Project based learning
Wikipedia
Proceedings of the 21st International Conference on Computers in Education, ICCE 2013 English 2013 Editing Wikipedia can increase participants' understanding of subjects while making valuable contributions to the information society. In this study, we designed an online course for undergraduate students that included a Wikipedia editing activity. A content analysis of the term papers revealed that the suggestions made by the e-mentor and the teacher were highly supportive for the students in our case study, and that it is important for Japanese students to check the English Wikipedia before making their edits in Japanese. 0 0
A collaborative multi-source intelligence working environment: A systems approach Eachus P.
Short B.
Stedmon A.W.
Brown J.
Wilson M.
Lemanski L.
Collaborative working
Intelligence analysis
Wiki
Lecture Notes in Computer Science English 2013 This research applies a systems approach to aid the understanding of collaborative working during intelligence analysis using a dedicated (wiki) environment. The extent to which social interaction and problem solving were facilitated by the use of the wiki was investigated using an intelligence problem derived from the VAST 2010 challenge. This challenge requires "intelligence analysts" to work with a number of different intelligence sources in order to predict a possible terrorist attack. The study compared three types of collaborative working: face-to-face without a wiki, face-to-face with a wiki, and use of a wiki without face-to-face contact. The findings revealed that in terms of task performance the wiki group without face-to-face contact performed best and the wiki group with face-to-face contact performed worst. Measures of interpersonal and psychological satisfaction were highest in the face-to-face group not using a wiki and lowest in the face-to-face group using a wiki. Overall, it was concluded that the use of wikis in collaborative working is best for task completion, whereas face-to-face collaborative working without a wiki is best for interpersonal and psychological satisfaction. 0 0
A comparative study of academic and wikipedia ranking Shuai X.
Jiang Z.
Xiaojiang Liu
Bollen J.
Citation analysis
Scholar impact
Wikipedia
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2013 In addition to its broad popularity, Wikipedia is widely used for scholarly purposes. Many Wikipedia pages pertain to academic papers, scholars and topics, providing a rich ecology for scholarly uses. Scholarly references and mentions on Wikipedia may thus shape the "societal impact" of a certain scholarly communication item, but it is not clear whether they shape actual "academic impact". In this paper we compare the impact of papers, scholars, and topics according to two different measures, namely scholarly citations and Wikipedia mentions. Our results show that academic and Wikipedia impact are positively correlated. Papers, authors, and topics that are mentioned on Wikipedia have higher academic impact than those that are not mentioned. Our findings validate the hypothesis that Wikipedia can help assess the impact of scholarly publications and underpin relevance indicators for scholarly retrieval or recommendation systems. 0 0
A comparison of named entity recognition tools applied to biographical texts Atdag S.
Labatut V.
2013 2nd International Conference on Systems and Computer Science, ICSCS 2013 English 2013 Named entity recognition (NER) is a popular domain of natural language processing. For this reason, many tools exist to perform this task. Amongst other points, they differ in the processing method they rely upon, the entity types they can detect, the nature of the text they can handle, and their input/output formats. This makes it difficult for a user to select an appropriate NER tool for a specific situation. In this article, we try to answer this question in the context of biographic texts. To this end, we first constitute a new corpus by annotating 247 Wikipedia articles. We then select four publicly available, well-known and free-for-research NER tools for comparison: Stanford NER, Illinois NET, OpenCalais NER WS and Alias-i LingPipe. We apply them to our corpus, assess their performance and compare them. When considering overall performance, a clear hierarchy emerges: Stanford has the best results, followed by LingPipe, Illinois and OpenCalais. However, a more detailed evaluation performed relative to entity types and article categories highlights the fact that their performances are diversely influenced by those factors. This complementarity opens an interesting perspective regarding the combination of these individual tools in order to improve performance. 0 0
A computational approach to politeness with application to social factors Cristian Danescu-Niculescu-Mizil
Sudhof M.
Dan J.
Leskovec J.
Potts C.
ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference English 2013 We propose a computational framework for identifying linguistic aspects of politeness. Our starting point is a new corpus of requests annotated for politeness, which we use to evaluate aspects of politeness theory and to uncover new interactions between politeness markers and context. These findings guide our construction of a classifier with domain-independent lexical and syntactic features operationalizing key components of politeness theory, such as indirection, deference, impersonalization and modality. Our classifier achieves close to human performance and is effective across domains. We use our framework to study the relationship between politeness and social power, showing that polite Wikipedia editors are more likely to achieve high status through elections, but, once elevated, they become less polite. We see a similar negative correlation between politeness and power on Stack Exchange, where users at the top of the reputation scale are less polite than those at the bottom. Finally, we apply our classifier to a preliminary analysis of politeness variation by gender and community. 0 0
A content analysis of wikiproject discussions: Toward a typology of coordination language used by virtual teams Morgan J.T.
Mcdonald D.W.
Gilbert M.
Mark Zachry
Content analysis
Coordination
Distributed collaboration
Group dynamics
Wikipedia
English 2013 Understanding the role of explicit coordination in virtual teams allows for a more meaningful understanding of how people work together online. We describe a new content analysis scheme for classifying discussions within Wikipedia WikiProjects (voluntary, self-directed teams of editors), present preliminary findings, and discuss potential applications and future research directions. 0 0
A content-context-centric approach for detecting vandalism in Wikipedia Lakshmish Ramaswamy
Tummalapenta R.S.
Li K.
Calton Pu
Collaborative online social media
Content-context
Top-ranked co-occurrence probability
Vandalism detection
WWW co-occurrence probability
Proceedings of the 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, COLLABORATECOM 2013 English 2013 Collaborative online social media (CSM) applications such as Wikipedia have not only revolutionized the World Wide Web, but they have also had a hugely positive effect on modern free societies. Unfortunately, Wikipedia has also become the target of a wide variety of vandalism attacks. Most existing vandalism detection techniques rely upon simple textual features such as the existence of abusive language or spammy words. These techniques are ineffective against sophisticated vandal edits, which often do not contain the tell-tale markers associated with vandalism. In this paper, we argue for a context-aware approach to vandalism detection. This paper proposes a content-context-aware vandalism detection framework. The main idea is to quantify how well the words contained in the edit fit into the topic and the existing content of the Wikipedia article. We present two novel metrics, called WWW co-occurrence probability and top-ranked co-occurrence probability, for this purpose. We also develop efficient mechanisms for evaluating these two metrics, and machine learning-based schemes that utilize these metrics. The paper presents a range of experiments to demonstrate the effectiveness of the proposed approach. 0 0
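As a rough illustration of the content-context idea in this abstract, the sketch below scores how well an edit's words fit the article's existing vocabulary and flags poor fits. The fit function is a deliberately simplified stand-in, not the paper's WWW or top-ranked co-occurrence probability metrics, which are estimated from large external corpora:

```python
# Simplified sketch of content-context vandalism scoring: an edit whose words
# rarely appear in the existing article is treated as suspicious. This toy
# "fit" score is an assumption for illustration only.
def context_fit(edit_words, article_words):
    article_vocab = set(article_words)
    if not edit_words:
        return 1.0
    overlap = sum(1 for w in edit_words if w in article_vocab)
    return overlap / len(edit_words)

article = "the species inhabits coastal wetlands and feeds on small fish".split()
good_edit = "the species also feeds on crustaceans".split()
vandal_edit = "buy cheap watches online now".split()

for edit in (good_edit, vandal_edit):
    score = context_fit(edit, article)
    verdict = "suspicious" if score < 0.4 else "ok"
    print(" ".join(edit), "->", verdict, f"({score:.2f})")
```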
A framework for benchmarking entity-annotation systems Cornolti M.
Paolo Ferragina
Massimiliano Ciaramita
Benchmark framework
Entity annotation
Wikipedia
WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web English 2013 In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate system performance, and a systematic comparative evaluation involving all publicly available datasets, containing texts of various types such as news, tweets and Web pages. Our framework is easily extensible with novel entity annotators, datasets and evaluation measures for comparing systems, and it has been released to the public as open source. We use this framework to perform the first extensive comparison among all available entity annotators over all available datasets, and draw many interesting conclusions upon their efficiency and effectiveness. We also draw comparisons between academic and commercial annotators. 0 0
A framework for detecting public health trends with Twitter Parker J.
Wei Y.
Yates A.
Frieder O.
Goharian N.
Health surveillance
Item-set mining
Twitter
Wikipedia
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013 English 2013 Traditional public health surveillance requires regular clinical reports and considerable effort by health professionals to analyze data. Therefore, a low-cost alternative is of great practical use. As a platform used by over 500 million users worldwide to publish their ideas about many topics, including health conditions, Twitter provides researchers with the freshest source of public health conditions on a global scale. We propose a framework for tracking public health condition trends via Twitter. The basic idea is to use frequent term sets from highly purified health-related tweets as queries into a Wikipedia article index, treating the retrieval of medically-related articles as an indicator of a health-related condition. By observing fluctuations in frequent term sets and in turn medically-related articles over a series of time slices of tweets, we detect shifts in public health conditions and concerns over time. Compared to existing approaches, our framework provides a general a priori identification of emerging public health conditions rather than tracking a specific illness (e.g., influenza) as is commonly done. 0 0
A game theoretic analysis of collaboration in Wikipedia Anand S.
Ofer Arazy
Mandayam N.B.
Oded Nov
Collaboration
Non-cooperative game
Peer production
Trustworthy collaboration
Vandalism
Wikipedia
Lecture Notes in Computer Science English 2013 Peer production projects such as Wikipedia or open-source software development allow volunteers to collectively create knowledge-based products. The inclusive nature of such projects poses difficult challenges for ensuring trustworthiness and combating vandalism. Prior studies in the area deal with descriptive aspects of peer production, failing to capture the idea that while contributors collaborate, they also compete for status in the community and for imposing their views on the product. In this paper, we investigate collaborative authoring in Wikipedia, where contributors append and overwrite previous contributions to a page. We assume that a contributor's goal is to maximize ownership of content sections, such that content owned (i.e. originated) by her survives the most recent revision of the page. We model contributors' interactions to increase their content ownership as a non-cooperative game, where a player's utility is associated with content owned and cost is a function of effort expended. Our results capture several real-life aspects of contributors' interactions within peer-production projects. Namely, we show that at the Nash equilibrium there is an inverse relationship between the effort required to make a contribution and the survival of a contributor's content. In other words, the majority of the content that survives is necessarily contributed by experts who expend relatively less effort than non-experts. An empirical analysis of Wikipedia articles provides support for our model's predictions. Implications for research and practice are discussed in the context of trustworthy collaboration as well as vandalism. 0 0
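One plausible formalization of the trade-off this abstract describes, with symbols assumed for illustration rather than taken from the paper:

```latex
% Sketch of the utility described above; the notation is an assumption for
% illustration, not the paper's exact model. Player i chooses effort e_i;
% s_i is the share of the page's content originated by i that survives the
% most recent revision, and c_i > 0 is i's marginal cost of effort
% (lower for experts).
\[
  u_i(e_i, e_{-i}) \;=\; s_i(e_1, \ldots, e_n) \;-\; c_i\, e_i .
\]
% At a Nash equilibrium (e_1^*, \ldots, e_n^*), no contributor gains by
% unilaterally changing effort:
\[
  u_i(e_i^*, e_{-i}^*) \;\ge\; u_i(e_i, e_{-i}^*)
  \qquad \text{for all } i \text{ and all } e_i \ge 0 .
\]
% The inverse effort-survival relationship reported in the paper would then
% correspond to low-c_i (expert) players owning most surviving content in
% equilibrium.
```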
A generic open world named entity disambiguation approach for tweets Habib M.B.
Van Keulen M.
Named Entity Disambiguation
Social media
Twitter
IC3K 2013; KDIR 2013 - 5th International Conference on Knowledge Discovery and Information Retrieval and KMIS 2013 - 5th International Conference on Knowledge Management and Information Sharing, Proc. English 2013 Social media is a rich source of information. To make use of this information, it is sometimes required to extract and disambiguate named entities. In this paper, we focus on named entity disambiguation (NED) in Twitter messages. NED in tweets is challenging in two ways. First, the limited length of a tweet makes it hard to obtain enough context, on which many disambiguation techniques depend. Second, many named entities in tweets do not exist in a knowledge base (KB). We combine ideas from information retrieval (IR) and NED to propose solutions for both challenges. For the first problem we make use of the gregarious nature of tweets to get enough context for disambiguation. For the second problem we look for an alternative home page if no Wikipedia page represents the entity. Given a mention, we obtain a list of Wikipedia candidates from the YAGO KB in addition to top-ranked pages from the Google search engine. We use a Support Vector Machine (SVM) to rank the candidate pages to find the best representative entities. Experiments conducted on two data sets show better disambiguation results compared with the baselines and a competitor. 0 0
A history of newswork on wikipedia Brian C. Keegan Breaking news
Current events
Journalism
Wikipedia
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 Wikipedia's coverage of current events blurs the boundaries of what it means to be an encyclopedia. Drawing on Gieryn's concept of "boundary work", this paper explores how Wikipedia's response to the 9/11 attacks expanded the role of the encyclopedia to include newswork, excluded content like the 9/11 Memorial Wiki that became problematic following this expansion, and legitimized these changes through the adoption of news-related policies and routines like promoting "In the News" content on the homepage. However, a second case exploring WikiNews illustrates the pitfalls of misappropriating professional newswork norms as well as the challenges of sustaining online communities. These cases illuminate the social construction of new technologies as they confront the boundaries of traditional professional identities and also reveal how newswork is changing in response to new forms of organizing enabled by these technologies. 0 0
A hybrid method for detecting outdated information in Wikipedia infoboxes Thanh Tran
Cao T.H.
Entity Search
Information extraction
Pattern Learning
Wikipedia Update
Proceedings - 2013 RIVF International Conference on Computing and Communication Technologies: Research, Innovation, and Vision for Future, RIVF 2013 English 2013 Wikipedia has grown fast and become a major information resource for users as well as for many knowledge bases derived from it. However, it is still edited manually while the world is changing rapidly. In this paper, we propose a method to detect outdated attribute values in Wikipedia infoboxes by using facts extracted from the general Web. Our proposed method extracts new information by combining a pattern-based approach with an entity-search-based approach to deal with the diversity of natural language presentation forms of facts on the Web. Our experimental results show that the achieved accuracies of the proposed method are 70% and 82% respectively on the chief-executive-officer attribute and the number-of-employees attribute in company infoboxes. This significantly improves on the accuracy of single pattern-based or entity-search-based methods. The results also reveal the striking extent to which Wikipedia content is outdated. 0 0
A method for recommending the most appropriate expansion of acronyms using wikipedia Choi D.
Shin J.
Lee E.
Kim P.
Acronym expansion
Acronyms
Information extraction
Text mining
Wikipedia
Proceedings - 7th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, IMIS 2013 English 2013 Over the years, many researchers have studied how to detect expansions of acronyms in texts by using linguistic and syntactic approaches in order to overcome disambiguation problems. An acronym is an abbreviation composed of the initial components of a single word or of multiple words. These initial components cause serious errors when a machine attempts to extract meaning from a given text. Detecting expansions of acronyms is no longer a major issue nowadays; the real problem is polysemous acronyms. In order to solve this problem, this paper proposes a method to recommend the most related expansion of an acronym by analyzing co-occurring words using Wikipedia. Our goal is not to find acronym definitions or expansions but to recommend the most appropriate expansion of a given acronym. 0 0
A multilingual and multiplatform application for medicinal plants prescription from medical symptoms Ruiz-Rico F.
Rubio-Sanchez M.-C.
Tomas D.
Vicedo J.-L.
Category ranking
Medical Subject Headings
Medicinal Plants
Text classification
Wikipedia
SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2013 This paper presents an application for medicinal plant prescription based on text classification techniques. The system receives as input a free text describing the symptoms of a user, and retrieves a ranked list of medicinal plants related to those symptoms. In addition, a set of links to Wikipedia is also provided, enriching the information about every medicinal plant presented to the user. In order to improve the accessibility of the application, the input can be written in six different languages, the results being adapted accordingly. The application interface can be accessed from different devices and platforms. 0 0
A multilingual semantic wiki based on attempto controlled english and grammatical framework Kaljurand K.
Kuhn T.
Attempto Controlled English
Controlled natural language
Grammatical Framework
Semantic wiki
Lecture Notes in Computer Science English 2013 We describe a semantic wiki system with an underlying controlled natural language grammar implemented in Grammatical Framework (GF). The grammar restricts the wiki content to a well-defined subset of Attempto Controlled English (ACE), and facilitates a precise bidirectional automatic translation between ACE and language fragments of a number of other natural languages, making the wiki content accessible multilingually. Additionally, our approach allows for automatic translation into the Web Ontology Language (OWL), which enables automatic reasoning over the wiki content. The developed wiki environment thus allows users to build, query and view OWL knowledge bases via a user-friendly multilingual natural language interface. As a further feature, the underlying multilingual grammar is integrated into the wiki and can be collaboratively edited to extend the vocabulary of the wiki or even customize its sentence structures. This work demonstrates the combination of the existing technologies of Attempto Controlled English and Grammatical Framework, and is implemented as an extension of the existing semantic wiki engine AceWiki. 0 0
A new approach for building domain-specific corpus with wikipedia Zhang X.Y.
Li X.
Ruan Z.J.
Domain-specific corpus
Kosaraju algorithm based
Multi-root method
Wikipedia
Applied Mechanics and Materials English 2013 A domain-specific corpus can be used to build a domain ontology, which is used in many areas such as IR, NLP and Web mining. We propose a multi-root based method to build a domain-specific corpus making use of Wikipedia resources. First we select some top-level nodes (Wikipedia category articles) as root nodes and traverse Wikipedia using a BFS-like algorithm. After the traversal, we get a directed Wikipedia graph (Wiki-graph). Then an algorithm mainly based on Kosaraju's algorithm is proposed to remove the cycles in the Wiki-graph. Finally, a topological sort algorithm is used to traverse the Wiki-graph, and ranking and filtering are done during the process. When computing a node's ranking score, both the node's own in-degree and the out-degrees of its parents are considered. The experimental evaluation shows that our method can produce a high-quality domain-specific corpus. 0 0
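A compact sketch of the pipeline this abstract describes, using networkx's condensation of strongly connected components in place of a hand-rolled Kosaraju pass (both remove all cycles by collapsing SCCs into single nodes); the node names and the ranking weight are toy assumptions:

```python
# Sketch of the BFS -> cycle removal -> topological ranking pipeline.
import networkx as nx

wiki_graph = nx.DiGraph()  # category/article links gathered by BFS from root nodes
wiki_graph.add_edges_from([
    ("Computing", "Algorithms"), ("Computing", "Software"),
    ("Algorithms", "Sorting"), ("Software", "Algorithms"),
    ("Sorting", "Software"),   # creates the cycle Algorithms -> Sorting -> Software -> Algorithms
])

# Collapse cycles: each strongly connected component becomes one DAG node.
dag = nx.condensation(wiki_graph)

# Topological traversal with a simple degree-based score, echoing the idea of
# combining a node's own in-degree with its parents' out-degrees. The 0.5
# weighting is illustrative, not the paper's formula.
for scc_id in nx.topological_sort(dag):
    members = dag.nodes[scc_id]["members"]
    in_deg = dag.in_degree(scc_id)
    parent_out = sum(dag.out_degree(p) for p in dag.predecessors(scc_id))
    score = in_deg + 0.5 * parent_out
    print(sorted(members), f"score={score:.1f}")
```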
A new approach to detecting content anomalies in Wikipedia Sinanc D.
Yavanoglu U.
Artificial neural networks
Class mapping
Data mining
Open editing schemas
Web classification
Proceedings - 2013 12th International Conference on Machine Learning and Applications, ICMLA 2013 English 2013 The rapid growth of the web has made vast amounts of data available, but this data is effective only if its content is well organized. Although Wikipedia is the biggest encyclopedia on the web, its quality is suspect due to its Open Editing Schemas (OES). In this study, zoology and botany pages from the English Wikipedia are selected and their HTML contents are converted to text; an Artificial Neural Network (ANN) is then used for classification to prevent disinformation or misinformation. After the training phase, irrelevant words about politics or terrorism were added to the content in proportion to the size of the text. In the window between unsuitable content being added to a page and the moderators' intervention, the proposed system detects the error via incorrect categorization. The results show that when the added words reach 2% of the content, the anomaly rate begins to cross the 50% threshold. 0 0
A new text representation scheme combining Bag-of-Words and Bag-of-Concepts approaches for automatic text classification Alahmadi A.
Joorabchi A.
Mahdi A.E.
Bag-of-Concepts
Bag-of-Words
Text Classification
Wikipedia
2013 7th IEEE GCC Conference and Exhibition, GCC 2013 English 2013 This paper introduces a new approach to creating text representations and applies it to standard text classification collections. The approach is based on supplementing the well-known Bag-of-Words (BOW) representational scheme with a concept-based representation that utilises Wikipedia as a knowledge base. The proposed representations are used to generate a Vector Space Model, which in turn is fed into a Support Vector Machine classifier to categorise a collection of textual documents from two publicly available datasets. Experimental results are reported that evaluate the performance of our model in comparison to a standard BOW scheme, a concept-based scheme, and recently reported similar text representations based on augmenting the standard BOW approach with concept-based representations. 0 0
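A minimal sketch of the BOW-plus-concepts idea from the abstract above: concept tokens are appended to each document so that a single vector space carries both word and concept features for an SVM. The concept lookup here is a stub standing in for the Wikipedia-based mapper, which is precisely the part this sketch assumes away:

```python
# Sketch: combine Bag-of-Words and Bag-of-Concepts features in one vector
# space fed to an SVM. The concept_map below is a hypothetical stand-in for
# a Wikipedia-based concept mapper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def add_concepts(text):
    concept_map = {"jaguar": "CONCEPT_Animal", "engine": "CONCEPT_Car"}
    concepts = [concept_map[w] for w in text.split() if w in concept_map]
    return text + " " + " ".join(concepts)

docs = ["jaguar runs through the forest", "the engine of the jaguar roared"]
labels = ["wildlife", "cars"]

clf = Pipeline([
    ("features", CountVectorizer()),  # BOW terms and injected concept tokens together
    ("svm", LinearSVC()),
])
clf.fit([add_concepts(d) for d in docs], labels)
print(clf.predict([add_concepts("a jaguar engine")]))
```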
A novel map-based visualization method based on liquid modelling Biuk-Aghai R.P.
Ao W.H.
Category
Information visualization
Map
Wikipedia
ACM International Conference Proceeding Series English 2013 Many applications produce large amounts of data, and information visualization has been successfully applied to help make sense of this data. Recently geographic maps have been used as a metaphor for visualization, given that most people are familiar with reading maps, and several visualization methods based on this metaphor have been developed. In this paper we present a new visualization method that aims to improve on existing map-like visualizations. It is based on the metaphor of liquids poured onto a surface that expand outwards until they touch each other, forming larger areas. We present the design of our visualization method and an evaluation we have carried out to compare it with an existing visualization. Our new visualization has better usability, leading to higher accuracy and greater speed of task performance. 0 0
A portable multilingual medical directory by automatic categorization of wikipedia articles Ruiz-Rico F.
Rubio-Sanchez M.-C.
Tomas D.
Vicedo J.-L.
Category ranking
JQuery Mobile
Medical Subject Headings
PhoneGap
Text classification
Wikipedia
SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2013 Wikipedia has become one of the most important sources of information available all over the world. However, the categorization of Wikipedia articles is not standardized and searches are mainly performed on keywords rather than concepts. In this paper we present an application that builds a hierarchical structure to organize all Wikipedia entries, so that medical articles can be reached from general to particular, using the well-known Medical Subject Headings (MeSH) thesaurus. Moreover, the language links between articles allow the directory to be used in different languages. The final system can be packed and ported to mobile devices as a standalone offline application. 0 0
A preliminary study of Croatian language syllable networks Ban K.
Ivakic I.
Mestrovic A.
2013 36th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2013 - Proceedings English 2013 This paper presents preliminary results of Croatian syllable network analysis. We analyzed networks of syllables generated from texts collected from the Croatian Wikipedia and blogs. Different syllable networks are constructed in such a way that each node in the network is a syllable, and links are established between two syllables if they appear together in the same word (co-occurrence network) or if they appear as neighbours in a word (neighbour network). As our main tool we use network analysis methods, which provide mechanisms that can reveal new patterns in a complex language structure. We aim to show that syllable networks differ from Erdős-Rényi random networks, which may indicate that language has its own rules and self-organization structure. Furthermore, our results have been compared with other studies on the syllable networks of Portuguese and Chinese. The results indicate that Croatian syllable networks exhibit certain properties of small-world networks. 0 0
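A short sketch of the two network constructions this abstract describes, on toy syllabified words (the Croatian syllabification itself is assumed to happen upstream):

```python
# Build the co-occurrence and neighbour syllable networks described above.
import networkx as nx
from itertools import combinations

words = [["pre", "gled"], ["po", "gled"], ["pre", "po", "ruka"]]  # syllabified words

cooc = nx.Graph()       # link syllables appearing anywhere in the same word
neighbour = nx.Graph()  # link only adjacent syllables within a word
for syllables in words:
    cooc.add_edges_from(combinations(syllables, 2))
    neighbour.add_edges_from(zip(syllables, syllables[1:]))

# Small-world indicators, which the paper compares against same-size random graphs:
print("clustering:", nx.average_clustering(cooc))
print("avg path length:", nx.average_shortest_path_length(cooc))
```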
A preliminary study on the effects of barnstars on wikipedia editing Lim K.H.
Anwitaman Datta
Wise M.
Barnstars
Editing behaviour
Incentive
Wikipedia
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 This paper presents a preliminary study into the awarding of barnstars among Wikipedia editors to better understand their motivations in contributing to Wikipedia articles. We crawled the talk pages of all active Wikipedia editors and retrieved 21,299 barnstars that were awarded among 14,074 editors. In particular, we found that editors do not award and receive barnstars in equal (or similar) quantities. Also, editors were more active in editing articles before awarding or receiving barnstars. 0 0
A quick tour of BabelNet 1.1 Roberto Navigli BabelNet
Knowledge acquisition
Multilingual ontologies
Semantic networks
Lecture Notes in Computer Science English 2013 In this paper we present BabelNet 1.1, a brand-new release of the largest "encyclopedic dictionary", obtained from the automatic integration of the most popular computational lexicon of English, i.e. WordNet, and the largest multilingual Web encyclopedia, i.e. Wikipedia. BabelNet 1.1 covers 6 languages and comes with a renewed Web interface, graph explorer and programmatic API. BabelNet is available online at http://www.babelnet.org. 0 0
A semantic wiki to support knowledge sharing in innovation activities Lahoud I.
Monticolo D.
Hilaire V.
Gomes S.
Innovation
Knowledge creation
Knowledge evaluation and sharing
Ontology
Semantic wiki
SPARQL
Lecture Notes in Electrical Engineering English 2013 We present in this paper how to ensure the creation, validation and sharing of ideas by using a semantic wiki approach. We describe a system called Wiki-I, which is used by engineers to formalize their ideas during solution-research activities. Wiki-I is based on an ontology of the innovation domain, which allows it to structure the wiki pages and to store the knowledge posted by the engineers. In this paper, we explain how Wiki-I ensures the reliability of innovative ideas thanks to an idea evaluation process. After explaining the interest of using semantic wikis in an innovation management approach, we describe the architecture of Wiki-I with its semantic functionalities. At the end of the paper, we demonstrate the effectiveness of Wiki-I with an idea evaluation example from a student innovation challenge. 0 0
A study of the Sudanese students' use of collaborative tools within Moodle Learning Management System Elmahadi I.
Osman I.
Computer Supported Collaborative Learning
Forum
Moodle
Wiki
2013 IST-Africa Conference and Exhibition, IST-Africa 2013 English 2013 This study aims to investigate the use of the Moodle Learning Management System by Sudanese students, particularly its forum and wiki collaborative tools. The participants for this study were 92 undergraduate students from the University of Khartoum in Sudan, where face-to-face collaboration is a common indigenous way of learning. The students took part in a Software Engineering blended learning course during the first semester of the 2010-2011 academic year. The students' use was assessed using the Moodle activity report tool, wiki entries, forum transcripts and the students' final examination marks. The Pearson product-moment correlation coefficient was used to test for a relationship between using the forum and wiki tools and the students' performance in the course. A detailed description of the students' use of the tools is provided. The study also showed a moderate correlation between participating in the discussion forum and the students' performance in the course, and a low correlation between wiki participation and course performance. 0 0
A support framework for argumentative discussions management in the web Cabrio E.
Villata S.
Fabien Gandon
Lecture Notes in Computer Science English 2013 On the Web, wiki-like platforms allow users to provide arguments in favor of or against issues proposed by other users. The increasing content of these platforms, as well as the high number of revisions of the content through pro and con arguments, makes it difficult for community managers to understand and manage these discussions. In this paper, we propose an automatic framework to support the management of argumentative discussions in wiki-like platforms. Our framework is composed of (i) a natural language module, which automatically detects the arguments in natural language and returns the relations among them, and (ii) an argumentation module, which provides an overall view of the argumentative discussion in the form of a directed graph highlighting the accepted arguments. Experiments on the history of Wikipedia show the feasibility of our approach. 0 0
A triangulated investigation of using wiki for project-based learning in different undergraduate disciplines Chu E.H.Y.
Chan C.K.
Michele Notari
Chu S.K.W.
Chen K.
Wu W.W.Y.
Education
Project-based learning
University
Wiki
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 This study investigates the use of wikis to support project-based learning (PBL) in 3 undergraduate courses of different disciplines: English Language Studies, Information Management, and Mechanical Engineering. This study takes a methodological triangulation approach that employs questionnaires, interviews, and wiki activity logs. The level of activities and the types of core actions captured on the wiki varied among the three groups of students. Students generally rated positively the use of wikis to support PBL, while significant differences were found on 9 items (especially in the "Motivation" and "Knowledge Management" dimensions of the questionnaire) among students in the three different disciplines. Interviews revealed that these differences may be attributable to the variations in the natures and scopes of the PBL, as well as in the different emphases that students placed on the work presented on the wiki. This study may provide directions on the use of wikis in PBL in undergraduate courses. 0 0
A virtual player for "who Wants to Be a Millionaire?" based on Question Answering Molino P.
Pierpaolo Basile
Santoro C.
Pasquale Lops
De Gemmis M.
Giovanni Semeraro
Lecture Notes in Computer Science English 2013 This work presents a virtual player for the quiz game "Who Wants to Be a Millionaire?". The virtual player requires linguistic and common-sense knowledge and adopts state-of-the-art Natural Language Processing and Question Answering technologies to answer the questions. Wikipedia articles and DBpedia triples are used as knowledge sources and the answers are ranked according to several lexical, syntactic and semantic criteria. Preliminary experiments carried out on the Italian version of the board game prove that the virtual player is able to challenge human players. 0 0
A wiki-based assessment system towards social-empowered collaborative learning environment Kao B.C.
Chen Y.H.
Assessment
Co-writing
Collaborative learning
Past exam
Social learning
Social network
Wiki
Lecture Notes in Electrical Engineering English 2013 Social networks have been a very popular research area in recent years. Most people have one or more social network accounts, use them to keep in touch with other people on the internet, and build their own small social networks. The effect and strength of social networks thus make it worthwhile to figure out information delivery paths and apply them to the digital learning area. In this age of Web 2.0, sharing knowledge is the mainstream internet activity: users share and exchange information and knowledge every day, and have started to collaborate with other users to build specific knowledge domains in knowledge-database websites like Wikipedia. This learning behaviour is also called co-writing or collaborative learning, and this learning strategy points the way to future distance learning. However, it is hard to evaluate performance in co-writing learning activities, and researchers continue to search for more accurate methods that can measure and normalize a learner's performance, provide the result to the teacher, and assess student learning performance in the social dimension. Building on our lab's previous research, which proposed several technologies in the distance learning area, we built a wiki-based website that provides past exam questions to examinees and helps them collect all of the resources for a target college or licence exam. Moreover, examinees can post the questions on their own social networks, discuss them with friends, and co-resolve them; the system collects the paths of these discussions and analyzes the information, improving collaborative learning assessment research in the social learning field. 0 0
Accessible online content creation by end users Kuksenok K.
Brooks M.
Mankoff J.
Accessibility
User generated content
Conference on Human Factors in Computing Systems - Proceedings English 2013 Like most online content, user-generated content (UGC) poses accessibility barriers to users with disabilities. However, the accessibility difficulties pervasive in UGC warrant discussion and analysis distinct from other kinds of online content. Content authors, community culture, and the authoring tool itself all affect UGC accessibility. The choices, resources available, and strategies in use to ensure accessibility are different than for other types of online content. We contribute case studies of two UGC communities with accessible content: Wikipedia, where authors focus on access to visual materials and navigation, and an online health support forum where users moderate the cognitive accessibility of posts. Our data demonstrate real world moderation strategies and illuminate factors affecting success, such as community culture. We conclude with recommended strategies for creating a culture of accessibility around UGC. 0 0
Adaptive semantics-aware management for web caches and wikis Roque C.
Ferreira P.
Veiga L.
Cache management
Replacement strategies
Web cache
Wiki
Proceedings of the 12th International Workshop on Adaptive and Reflective Middleware, ARM 2013 - Co-located with ACM/IFIP/USENIX 14th International Middleware Conference, Middleware 2013 English 2013 In today's caching and replicated distributed systems, there is a clear need to minimize the amount of data transmitted. This is due to the fact that: i) the size of web objects that can be cached is increasing, and ii) the continuous growth in usage of these systems means that a page can be edited and viewed simultaneously by several users. This entails that any modifications to data have to be propagated to a lot of people, thus increasing the use of the network, regardless of the level of interest each one has in such modifications. In this paper, we describe how current web and wiki systems perform caching and manage replication, and offer an alternative approach by adopting a consistency algorithm, enhanced with users' preferences and a notion of inter-document distance, for the web and wiki environments. 0 0
Aemoo: Exploring knowledge on the Web Nuzzolese A.G.
Valentina Presutti
Aldo Gangemi
Alberto Musetti
Paolo Ciancarini
Proceedings of the 3rd Annual ACM Web Science Conference, WebSci 2013 English 2013 Aemoo is a Semantic Web application supporting knowledge exploration on the Web. Through a keyword-based search interface, users can gather an effective summary of the knowledge about an entity, according to Wikipedia, Twitter, and Google News. Summaries are designed by applying lenses based on a set of empirically discovered knowledge patterns. 0 0
An approach for deriving semantically related category hierarchies from Wikipedia category graphs Hejazy K.A.
El-Beltagy S.R.
Category hierarchy
Graph analysis
Hierarchy extraction
Semantic relatedness
Semantic similarity
Wikipedia
Advances in Intelligent Systems and Computing English 2013 Wikipedia is the largest online encyclopedia known to date. Its rich content and semi-structured nature have made it a very valuable research tool used for classification, information extraction, and semantic annotation, among others. Many applications can benefit from the presence of a topic hierarchy in Wikipedia. However, what Wikipedia currently offers is a category graph built through hierarchical category links whose semantics are undefined. Because of this lack of semantics, a sub-category in Wikipedia does not necessarily comply with the concept of a sub-category in a hierarchy. Instead, all it signifies is that there is some sort of relationship between the parent category and its sub-category. As a result, traversing the category links of any given category can often yield surprising results. For example, following the category of "Computing" down its sub-category links, the totally unrelated category of "Theology" appears. In this paper, we introduce a novel algorithm that, by measuring the semantic relatedness between any given Wikipedia category and the nodes in its sub-graph, is capable of extracting a category hierarchy containing only nodes that are relevant to the parent category. The algorithm has been evaluated by comparing its output with a gold standard data set. The experimental setup and results are presented. 0 0
An approach for restructuring text content Aversano L.
Canfora G.
De Ruvo G.
Tortorella M.
Concept Location
Documentation
Reengineering
Refactoring
Reverse Engineering
Wiki
Proceedings - International Conference on Software Engineering English 2013 Software engineers have successfully used Natural Language Processing for refactoring source code. Conversely, in this paper we investigate the possibility of applying software refactoring techniques to textual content. As a procedural program is composed of functions calling each other, a document can be modeled as content fragments connected to each other through links. Inspired by software engineering refactoring strategies, we propose an approach for refactoring wiki content. The approach has been applied to the EMF category of Eclipsepedia with encouraging results. 0 0
An approach for using wikipedia to measure the flow of trends across countries Tinati R.
Tiropanis T.
Leslie Carr
Social machines
Web observatories
Web science
Wikipedia
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 Wikipedia has grown to become the most successful online encyclopedia on the Web, containing over 24 million articles, offered in over 240 languages. In just over 10 years Wikipedia has transformed from being just an encyclopedia of knowledge to a wealth of facts and information, from articles discussing trivia, political issues, geographies and demographics, to popular culture, news articles, and social events. In this paper we explore the use of Wikipedia for identifying the flow of information and trends across the world. We start with the hypothesis that, given that Wikipedia is a resource that is globally available in different languages across countries, access to its articles could be a reflection of human activity. To explore this hypothesis we try to establish metrics on the use of Wikipedia in order to identify potential trends and to establish whether or how those trends flow from one country to another. We subsequently compare the outcome of this analysis to that of more established methods that are based on online social media or traditional media. We apply our approach to a subset of Wikipedia articles and also a specific worldwide social phenomenon that occurred during 2012: we investigate whether access to relevant Wikipedia articles correlates with the viral success of the South Korean pop song "Gangnam Style" and the associated artist "PSY", as evidenced by traditional and online social media. Our analysis demonstrates that Wikipedia can indeed provide a useful measure for detecting social trends and events, and in the case that we studied it would have been possible to identify the specific trend more quickly in comparison to other established trend identification services such as Google Trends. 0 0
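The abstract does not publish the exact metric used, but one simple way to flag such a trend in an article's page-view series is a spike detector based on deviation from a trailing mean; the window and threshold below are assumptions for illustration:

```python
# Illustrative spike detector over daily page-view counts; not the authors'
# metric. A day is flagged when it exceeds the trailing mean by `factor`
# standard deviations.
import statistics

def spikes(views, window=7, factor=3.0):
    flagged = []
    for i in range(window, len(views)):
        history = views[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history) or 1.0  # guard against zero spread
        if views[i] > mean + factor * stdev:
            flagged.append(i)
    return flagged

# Daily views of a hypothetical "Gangnam Style"-like article:
daily_views = [120, 130, 125, 140, 128, 135, 132, 131, 900, 4200, 3900]
print("spike on days:", spikes(daily_views))
```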
An efficient incentive compatible mechanism to motivate wikipedia contributors Pramod M.
Mukhopadhyay S.
Gosh D.
Advances in Intelligent Systems and Computing English 2013 Wikipedia is the world's largest collaboratively edited source of encyclopedic information, a repository consisting of almost 1.5 million articles and more than 90,000 contributors. Although the number of contributors has been huge since its inception in 2001, a study made in 2009 found that members (contributors) may initially contribute to the site for pleasure or be motivated by an internal drive to share their knowledge, but later they are not motivated to edit related articles so that the quality of the articles could be improved [1][5]. In our paper we address the above problem from an economics perspective. Here we propose a novel scheme to motivate the contributors of Wikipedia using mechanism design theory, which is at present the most prominent tool for addressing situations where data is privately held by the agents. 0 0
An empirical study on faculty perceptions and teaching practices of wikipedia Llados J.
Eduard Aibar
Lerga M.
Meseguer A.
Minguillon J.
Faculty perceptions
Online collaborative environments
Open resources
Web 2.0
Wikipedia
Proceedings of the European Conference on e-Learning, ECEL English 2013 Some faculty members from different universities around the world have begun to use Wikipedia as a teaching tool in recent years. These experiences show, in most cases, very satisfactory results and a substantial improvement in various basic skills, as well as a positive influence on the students' motivation. Nevertheless, and despite the growing importance of e-learning methodologies based on the use of the Internet for higher education, the use of Wikipedia as a teaching resource remains scarce among university faculty. Our investigation tries to identify the main factors that determine acceptance of or resistance to that use. We approach the decision to use Wikipedia as a teaching tool by analyzing both the individual attributes of faculty members and the characteristics of the environment where they develop their teaching activity. From a specific survey sent to all faculty of the Universitat Oberta de Catalunya (UOC), a pioneer and leader in online education in Spain, we have tried to infer the influence of these internal and external elements. The questionnaire was designed to measure different constructs: perceived quality of Wikipedia, teaching practices involving Wikipedia, use experience, perceived usefulness and use of 2.0 tools. Control items were also included for gathering information on gender, age, teaching experience, academic rank, and area of expertise. Our results reveal that academic rank, teaching experience, age or gender are not decisive factors in explaining the educational use of Wikipedia. Instead, the decision to use it is closely linked to the perception of Wikipedia's quality, the use of other collaborative learning tools, an active attitude towards web 2.0 applications, and connections with the professional non-academic world. Situational context is also very important, since use is higher when faculty members have reference models in their close environment and when they perceive it is positively valued by their colleagues. Since these attitudes, practices and cultural norms diverge across scientific disciplines, we have also detected clear differences in the use of Wikipedia among areas of academic expertise. As a consequence, a greater application of Wikipedia both as a teaching resource and as a driver for teaching innovation would require much more active institutional policies and some changes in the dominant academic culture among faculty members. 0 0
An exploration of discussion threads in social news sites: A case study of the Reddit community Weninger T.
Zhu X.A.
Jangwhan Han
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2013 English 2013 Social news and content aggregation Web sites have become massive repositories of valuable knowledge on a diverse range of topics. Millions of Web users are able to leverage these platforms to submit, view and discuss nearly anything. The users themselves exclusively curate the content with an intricate system of submissions, voting and discussion. Furthermore, the data on social news Web sites is extremely well organized by its user base, which opens the door for opportunities to leverage this data for other purposes, just as Wikipedia data has been used for many other purposes. In this paper we study a popular social news Web site called Reddit. Our investigation looks at the dynamics of its discussion threads, and asks two main questions: (1) to what extent do discussion threads resemble a topical hierarchy? and (2) can discussion threads be used to enhance Web search? We show interesting results for these questions on a very large snapshot of several sub-communities of the Reddit Web site. Finally, we discuss the implications of these results and suggest ways by which social news Web sites can be used to perform other tasks. 0 0
An index for efficient semantic full-text search Holger Bast
Buchhold B.
Indexing
Query processing
Semantic full-text search
International Conference on Information and Knowledge Management, Proceedings English 2013 In this paper we present a novel index data structure tailored towards semantic full-text search. Semantic full-text search, as we call it, deeply integrates keyword-based full-text search with structured search in ontologies. Queries are SPARQL-like, with additional relations for specifying word-entity co-occurrences. In order to build such queries the user needs to be guided. We believe that incremental query construction with context-sensitive suggestions in every step serves that purpose well. Our index has to answer queries and provide such suggestions in real time. We achieve this through a novel kind of posting lists and query processing, avoiding very long (intermediate) result lists and expensive (non-local) operations on these lists. In an evaluation of 8000 queries on the full English Wikipedia (40 GB XML dump) and the YAGO ontology (26.6 million facts), we achieve average query and suggestion times of around 150ms. 0 0
An initial analysis of semantic wikis Gil Y.
Knight A.
Zhang K.
Lei Zhang
Sethi R.
RDF
Semantic web
Semantic wiki
Social knowledge collection
International Conference on Intelligent User Interfaces, Proceedings IUI English 2013 Semantic wikis augment wikis with semantic properties that can be used to aggregate and query data through reasoning. Semantic wikis are used by many communities, for widely varying purposes such as organizing genomic knowledge, coding software, and tracking environmental data. Although wikis have been analyzed extensively, there has been no published analysis of the use of semantic wikis. We carried out an initial analysis of twenty semantic wikis selected for their diverse characteristics and content. Based on the number of property edits per contributor, we identified several patterns to characterize community behaviors that are common to groups of wikis. 0 0
An inter-wiki page data processor for a M2M system Takashi Yamanoue
Kentaro Oda
Koichi Shimozono
API
Java
Sensor Network
Social Network
Wiki
Proceedings - 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013 English 2013 A data processor that inputs data from wiki pages, processes the data, and outputs the processed data to a wiki page is proposed. This data processor is designed for a Machine-to-Machine (M2M) system that uses Arduino, Android, and wiki software, and it is controlled by a program written on a wiki page. The M2M system consists of mobile terminals and web sites with wiki software. A mobile terminal of the system consists of an Android terminal and may have an Arduino board with sensors and actuators. The mobile terminal can read data not only from the sensors on the Arduino board but also from wiki pages on the Internet. The input data may be processed by the data processor of this paper, and the processed data may be sent to a wiki page. The mobile terminal can control the actuators of the Arduino board by reading commands on the wiki page or by running the program of the processor. This system realizes an open communication forum not only for people but also for machines. 0 0
An investigation of the relationship between the amount of extra-textual data and the quality of Wikipedia articles Himoro M.Y.
Hanada R.
Marco Cristo
Pimentel M.D.G.C.
Content quality
Correlations
Extra-textual data
Wikipedia
WebMedia 2013 - Proceedings of the 19th Brazilian Symposium on Multimedia and the Web English 2013 Wikipedia, a web-based collaboratively maintained free encyclopedia, is emerging as one of the most important websites on the internet. However, its openness raises many concerns about the quality of the articles and how to assess it automatically. In the Portuguese-speaking Wikipedia, articles can be rated by bots and by the community. In this paper, we investigate the correlation between these ratings and the count of media items (namely images and sounds) through a series of experiments. Our results show that article ratings and the count of media items are correlated. 0 0
An open conceptual framework for operationalising collective awareness and social sensing Di Maio P.
Ure J.
Ontology
Semantics
Systems engineering
Vocabulary
Wiki
ACM International Conference Proceeding Series English 2013 Substantial EU resources are being invested in research and practice emerging from the socio-technical convergence of networked technologies and social clusters, increasingly referred to as 'collective awareness' and 'social sensing' platforms. Novel concepts and tools are being developed to stimulate and promote technologies and environments, requiring some level of shared conceptualisation of the domain. This position paper identifies the need to capture and represent the knowledge and information in 'social sensing and collective awareness platforms' with minimal formalisms. It proposes steps toward the development of tools for collective development of shared conceptual models, to facilitate communication, knowledge sharing and collaboration in this emerging, and highly interdisciplinary research field. 0 0
Analysis and forecasting of trending topics in online media streams Althoff T.
Borth D.
Hees J.
Andreas Dengel
Google
Social media analysis
Lifecycle forecast
Trending topics
Twitter
Wikipedia
MM 2013 - Proceedings of the 2013 ACM Multimedia Conference English 2013 Among the vast information available on the web, social media streams capture what people currently pay attention to and how they feel about certain topics. Awareness of such trending topics plays a crucial role in multimedia systems such as trend-aware recommendation and automatic vocabulary selection for video concept detection systems. Correctly utilizing trending topics requires a better understanding of their various characteristics in different social media streams. To this end, we present the first comprehensive study across three major online and social media streams, Twitter, Google, and Wikipedia, covering thousands of trending topics during an observation period of an entire year. Our results indicate that, depending on one's requirements, one does not necessarily have to turn to Twitter for information about current events, and that some media streams strongly emphasize content of specific categories. As our second key contribution, we further present a novel approach for the challenging task of forecasting the life cycle of trending topics in the very moment they emerge. Our fully automated approach is based on a nearest neighbor forecasting technique exploiting our assumption that semantically similar topics exhibit similar behavior. We demonstrate on a large-scale dataset of Wikipedia page view statistics that forecasts by the proposed approach are about 9-48k views closer to the actual viewing statistics compared to baseline methods and achieve a mean average percentage error of 45-19% for time periods of up to 14 days. 0 0
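A hedged sketch of nearest-neighbour forecasting in the spirit of the abstract above: the first observed hours of a new topic's attention curve are matched against a library of fully observed historical curves, and the best match's continuation serves as the forecast. The curves and the distance choice are toy assumptions, not the authors' dataset or exact method:

```python
# Nearest-neighbour life-cycle forecasting sketch with invented curves.
import numpy as np

history = {  # fully observed view curves of past (hypothetical) topics
    "sports_final":   np.array([10, 40, 90, 60, 30, 15,  8,  4]),
    "celebrity_news": np.array([50, 80, 70, 40, 20, 10,  5,  2]),
    "slow_burner":    np.array([ 5, 10, 15, 25, 40, 60, 80, 90]),
}

def forecast(prefix, history):
    prefix = np.asarray(prefix, dtype=float)
    k = len(prefix)
    # Euclidean distance on the observed prefix selects the nearest neighbour.
    best = min(history, key=lambda name: np.linalg.norm(history[name][:k] - prefix))
    return best, history[best][k:]  # reuse the neighbour's continuation

name, continuation = forecast([12, 42, 85], history)
print(f"nearest neighbour: {name}; forecast: {continuation}")
```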
Analysis of cluster structure in large-scale English Wikipedia category networks Klaysri T.
Fenner T.
Lachish O.
Mark Levene
Papapetrou P.
Connected component
Graph structure analysis
Large-scale social network analysis
Wikipedia category network
Lecture Notes in Computer Science English 2013 In this paper we propose a framework for analysing the structure of a large-scale social media network, a topic of significant recent interest. Our study is focused on the Wikipedia category network, where nodes correspond to Wikipedia categories and edges connect two nodes if the nodes share at least one common page within the Wikipedia network. Moreover, each edge is given a weight that corresponds to the number of pages shared between the two categories that it connects. We study the structure of category clusters within the three complete English Wikipedia category networks from 2010 to 2012. We observe that category clusters appear in the form of well-connected components that are naturally clustered together. For each dataset we obtain a graph, which we call the t-filtered category graph, by retaining just a single edge linking each pair of categories for which the weight of the edge exceeds some specified threshold t. Our framework exploits this graph structure and identifies connected components within the t-filtered category graph. We studied the large-scale structural properties of the three Wikipedia category networks using the proposed approach. We found that the number of categories, the number of clusters of size two, and the size of the largest cluster within the graph all appear to follow power laws in the threshold t. Furthermore, for each network we found the value of the threshold t for which increasing the threshold to t + 1 caused the "giant" largest cluster to diffuse into two or more smaller clusters of significant size and studied the semantics behind this diffusion. 0 0
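A small sketch of the t-filtered analysis described in the abstract above: keep only edges whose shared-page weight exceeds t, then watch the largest connected component break apart as t grows. The category names and weights are toy assumptions:

```python
# t-filtered category graph sketch: filter weighted edges by threshold t and
# inspect the connected components at each t.
import networkx as nx

weighted = nx.Graph()
weighted.add_weighted_edges_from([
    ("Physics", "Mathematics", 50), ("Mathematics", "Statistics", 40),
    ("Statistics", "Economics", 12), ("Economics", "Politics", 30),
    ("Politics", "History", 25),
])

for t in (10, 20, 45):
    filtered = nx.Graph()
    filtered.add_edges_from(
        (u, v) for u, v, w in weighted.edges(data="weight") if w > t
    )
    comps = sorted(nx.connected_components(filtered), key=len, reverse=True)
    largest = sorted(comps[0]) if comps else []
    print(f"t={t}: largest cluster {largest}, {len(comps)} cluster(s)")
# Raising t eventually splits the "giant" cluster, mirroring the diffusion
# behaviour the paper reports.
```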
Analysis of students' behaviour based on participation and results achieved in wiki-based team assignments Putnik Z.
Budimac Z.
Ivanovic M.
Bothe K.
Attitudes
Moodle
Teamwork
Wiki
ACM International Conference Proceeding Series English 2013 In this paper, we present part of our experiences with the use of Web 2.0, and in particular wiki technology, in education. We present evidence that we had no significant problems in introducing wikis for scheduling and organizing students' work in the "assignment solving" part of the course, and that our students embraced and gladly accepted this element of Web 2.0 added to teaching. The analysis of our students' attitudes and behaviour presented in this paper also changed some of our opinions and expectations about students' actions and manners, but we hope that this will only help us in further improvement of our course. 0 0
Analyzing multi-dimensional networks within mediawikis Brian C. Keegan
Ceni A.
Smith M.A.
Data analysis
MediaWiki
Network analysis
NodeXL
SNA
Social media
Visualisation
Wikipedia
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 The MediaWiki platform supports popular socio-technical systems such as Wikipedia as well as thousands of other wikis. This software encodes and records a variety of relationships about the content, history, and editors of its articles, such as hyperlinks between articles, discussions among editors, and editing histories. These relationships can be analyzed using standard techniques from social network analysis; however, extracting relational data from Wikipedia has traditionally required specialized knowledge of its API, information retrieval, network analysis, and data visualization that has inhibited scholarly analysis. We present a software library called the NodeXL MediaWiki Importer that extracts a variety of relationships from the MediaWiki API and integrates with the popular NodeXL network analysis and visualization software. This library allows users to query and extract a variety of multidimensional relationships from any MediaWiki installation with a publicly-accessible API. We present a case study examining the similarities and differences between different relationships for the Wikipedia articles about "Pope Francis" and "Social media". We conclude by discussing the implications this library has for both theoretical and methodological research as well as community management, and outline future work to expand the capabilities of the library. 0 0
Analyzing task and technology characteristics for enterprise architecture management tool support Hauder M.
Fiedler M.
Florian Matthes
Wust B.
Collaboration
Enterprise architecture management
Enterprise wiki
Tool support
Proceedings - IEEE International Enterprise Distributed Object Computing Workshop, EDOC English 2013 Adequate tool support for Enterprise Architecture (EA) and its respective management function is crucial for the success of the discipline in practice. However, currently available tools used in organizations focus on structured information, neglecting the collaborative effort required for developing and planning the EA. As a result, utilization of these tools by stakeholders is often not sufficient and availability of EA products in the organization is limited. We investigate the integration of existing EA tools and Enterprise Wikis to tackle these challenges. We describe how EA initiatives can benefit from the use and integration of an Enterprise Wiki with an existing EA tool. The main goal of our research is to increase the utilization of EA tools and enhance the availability of EA products by incorporating unstructured information content in the tools. For this purpose we analyze task characteristics that we revealed from the processes and task descriptions of the EA department of a German insurance organization and align them with technology characteristics of EA tools and Enterprise Wikis. We empirically evaluated these technology characteristics using an online survey with results from 105 organizations in previous work. We apply the technology-to-performance chain model to derive the fit between task and technology characteristics for EA management (EAM) tool support in order to evaluate our hypotheses. 0 0
Arabic WordNet semantic relations enrichment through morpho-lexical patterns Boudabous M.M.
Chaaben Kammoun N.
Khedher N.
Belguith L.H.
Sadat F.
Arabic WordNet
Morpho-lexical patterns
NooJ
NooJ grammars
Ontology
Wikipedia
2013 1st International Conference on Communications, Signal Processing and Their Applications, ICCSPA 2013 English 2013 The Arabic WordNet (AWN) ontology is one of the most interesting lexical resources for Modern Standard Arabic. Although its development is based on Princeton WordNet, it suffers from weaknesses such as the absence of some words and of some semantic relations between synsets. In this paper we propose a linguistic method based on morpho-lexical patterns to add semantic relations between synsets in order to improve AWN performance. This method relies on two steps: morpho-lexical pattern definition and semantic relation enrichment. We take advantage of the defined patterns to propose a hybrid method for building an Arabic ontology based on Wikipedia. 0 0
Arguments about deletion: How experience improves the acceptability of arguments in ad-hoc online task groups Jodi Schneider
Samp K.
Alexandre Passant
Stefan Decker
Argumentation schemes
Collaboration and conflict
Critical questions
Decision-making
Deliberation
Online argumentation
Peer production
Wikipedia
English 2013 Increasingly, ad-hoc online task groups must make decisions about jointly created artifacts such as open source software and Wikipedia articles. Time-consuming and laborious attention to textual discussions is needed to make such decisions, for which computer support would be beneficial. Yet there has been little study of the argumentation patterns that distributed ad-hoc online task groups use in evaluation and decision-making. In a corpus of English Wikipedia deletion discussions, we investigate the argumentation schemes used, the role of the arguer's experience, and which arguments are acceptable to the audience. We report three main results: First, the most prevalent patterns are the Rules and Evidence schemes from Walton's catalog of argumentation schemes [34], which comprise 36% of arguments. Second, we find that familiarity with community norms correlates with novices' ability to craft persuasive arguments. Third, acceptable arguments use community-appropriate rhetoric that demonstrates knowledge of policies and community values, while problematic arguments are based on personal preference and inappropriate analogy to other cases. Copyright 2013 ACM. 0 0
Assessing adoption of wikis in a Singapore secondary school: Using the UTAUT model Toh C.H. Social media
Technology adoption
UTAUT
Wiki
Proceedings of the 2013 IEEE 63rd Annual Conference International Council for Education Media, ICEM 2013 English 2013 This quantitative study explores students' motivation towards the use of wikis to encourage self-directed learning (SDL) and collaborative learning (CoL). SDL and CoL are the goals of Singapore's Ministry of Education Information and Communication Technology Masterplan 3. Wikis were used in the project to support reflection and communication within groups. Five classes consisting of 181 Secondary Two students from a Singapore secondary school were involved in this project. The participants were selected based on their mandatory involvement in an integrated 5-month project initiated by the school. As participation in the study was voluntary, 144 of the 181 students responded. Sixty-nine of the students had no prior experience with wikis. Among the 75 students who had prior experience, most used wikis to obtain information, while 46 shared information using wikis and 51 used them to work on collaborative projects with others. The variance explained by the Unified Theory of Acceptance and Use of Technology (UTAUT) was 32.4 percent. The results showed that performance expectancy and facilitating condition had a significant relationship with behavioural intention, while effort expectancy and social influence did not, contrary to many prior studies. Modifying the original UTAUT to include three other factors, attitude, trust, and comfort level, increased the variance explained to 37 percent. Trust and comfort level were found to have a significant relationship with behavioural intention in the modified UTAUT. This study contributes to UTAUT's theoretical validity and empirical applicability and to the management of technology-based initiatives in education. The findings provide insights to educators and schools considering incorporating wikis and other forms of social media into their lessons. 0 0
Assessing individual learning and group knowledge in a wiki environment: An empirical analysis Agrifoglio R.
Metallo C.
Varriale L.
Ferrara M.
Casalino N.
De Marco M.
Educational processes
Individual learning
Knowledge sharing
Online collaborative learning
Wiki
IASTED Multiconferences - Proceedings of the IASTED International Conference on Web-Based Education, WBE 2013 English 2013 The aim of this study was to investigate collaborative learning in an online environment in order to assess the role of technology in determining students' individual learning. It describes the benefits of using a wiki in education and how it can allow students to work together to reach a common goal, giving them a sense of how writing can be effectively performed in collaboration. In collaborative learning with a wiki, students need to agree on the structure, the contents, and the methods that are necessary to accomplish cooperative activities. The technology investigated is PBworks Education (PBwiki Edu), a collaborative tool that offers a variety of powerful information sharing and collaboration features to improve students' learning activities. Compared with a traditional in-class course, PBwiki Edu facilitates communication and encourages collaborative finding, shaping, and sharing of knowledge, all of which are essential properties for the student learning process. A survey methodology was used with undergraduate students of a "Management Information Systems" course who used PBwiki Edu for four reports on case studies about specific lesson topics. With regard to these topics, we measured students' individual learning before (traditional learning) and after (online learning) each case study and compared the results using the t-test method. Findings show significant differences between learning before and after the case studies, pointing out the contribution of PBwiki Edu to student learning. 0 0
Assessing quality score of wikipedia articles using mutual evaluation of editors and texts Yu Suzuki
Masatoshi Yoshikawa
Edit history
Peer review
Quality
Vandalism
Wikipedia
International Conference on Information and Knowledge Management, Proceedings English 2013 In this paper, we propose a method for assessing quality scores of Wikipedia articles by mutually evaluating editors and texts. The survival-ratio-based approach is a major approach to assessing article quality. In this approach, when a text survives beyond multiple edits, the text is assessed as good quality, because poor-quality texts have a high probability of being deleted by editors. However, many vandals, that is, low-quality editors, frequently delete good-quality texts, which improperly decreases the survival ratios of good-quality texts. As a result, many good-quality texts are unfairly assessed as poor quality. In our method, we consider editor quality scores when calculating text quality scores, and decrease the impact of vandals on text quality. With this improvement, the accuracy of the text quality score should improve. However, an inherent problem with this idea is that the editor quality scores are calculated from the text quality scores. To solve this problem, we mutually calculate the editor and text quality scores until they converge. In this paper, we prove that the text quality score converges. In our experimental evaluation, we confirmed that the proposed method accurately assesses text quality scores. Copyright is held by the owner/author(s). 0 0
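The mutual-evaluation idea above can be summarized as a fixed-point iteration. Below is a minimal sketch under assumed inputs (who authored each text, which editors' revisions preserved it); it illustrates the convergence loop, not the paper's exact scoring functions:

```python
def mutual_quality(survival, authors, iters=100, tol=1e-6):
    """
    survival[t] = non-empty list of editor ids whose revisions preserved text t
    authors[t]  = editor id who wrote text t
    Returns (text_quality, editor_quality) after convergence.
    """
    editors = {e for es in survival.values() for e in es} | set(authors.values())
    eq = {e: 1.0 for e in editors}           # editor quality scores
    tq = {t: 1.0 for t in survival}          # text quality scores
    for _ in range(iters):
        # a text's score: average quality of the editors who preserved it
        new_tq = {t: sum(eq[e] for e in es) / len(es) for t, es in survival.items()}
        # an editor's score: average quality of the texts they wrote
        new_eq = {}
        for e in editors:
            own = [new_tq[t] for t, a in authors.items() if a == e]
            new_eq[e] = sum(own) / len(own) if own else eq[e]
        delta = max(abs(new_tq[t] - tq[t]) for t in tq)
        tq, eq = new_tq, new_eq
        if delta < tol:                      # stop once the scores converge
            break
    return tq, eq
```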
Assessing trustworthiness in collaborative environments Segall J.
Mayhew M.J.
Atighetchi M.
Greenstadt R.
Collaborative trust
Cyber analytics
Wikipedia
ACM International Conference Proceeding Series English 2013 Collaborative environments, specifically those concerning information creation and exchange, increasingly demand notions of trust and accountability. In the absence of explicit authority, the quality of information is often unknown. Using Wikipedia edit sequences as a use case scenario, we detail experiments in the determination of community-based user and document trust. Our results show success in answering the first of many research questions: Provided a user's edit history, is a given edit to a document positively contributing to its content? We detail how the ability to answer this question provides a preliminary framework towards a better model for collaborative trust and discuss subsequent areas of research necessary to broaden its utility and scope. Copyright 2012 ACM. 0 0
Attributing authorship of revisioned content Luca de Alfaro
Shavlovsky M.
Authorship
Revisioned content
Wikipedia
WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web English 2013 A considerable portion of web content, from wikis to collaboratively edited documents, to code posted online, is revisioned. We consider the problem of attributing authorship to such revisioned content, and we develop scalable attribution algorithms that can be applied to very large bodies of revisioned content, such as the English Wikipedia. Since content can be deleted, only to be later re-inserted, we introduce a notion of authorship that requires comparing each new revision with the entire set of past revisions. For each portion of content in the newest revision, we search the entire history for content matches that are statistically unlikely to occur spontaneously, thus denoting common origin. We use these matches to compute the earliest possible attribution of each word (or each token) of the new content. We show that this "earliest plausible attribution" can be computed efficiently via compact summaries of the past revision history. This leads to an algorithm that runs in time proportional to the sum of the size of the most recent revision and the total amount of change (edit work) in the revision history. This amount of change is typically much smaller than the total size of all past revisions. The resulting algorithm can scale to very large repositories of revisioned content, as we show via experimental data over the English Wikipedia. Copyright is held by the International World Wide Web Conference Committee (IW3C2). 0 0
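A toy version of the attribution rule described above: each span of the newest revision is attributed to the earliest revision containing a matching span. The sketch below substitutes fixed-length n-gram matches for the paper's statistically unlikely matches and ignores the compact history summaries that make the real algorithm efficient:

```python
def attribute(revisions, n=4):
    """revisions: token lists, oldest first; returns an origin index per token
    of the newest revision (the revision where that token plausibly originated)."""
    newest = revisions[-1]
    origin = [len(revisions) - 1] * len(newest)   # default: new in the newest revision
    for i in range(len(newest) - n + 1):
        gram = tuple(newest[i:i + n])
        for r, rev in enumerate(revisions[:-1]):  # scan history, oldest first
            if any(tuple(rev[j:j + n]) == gram for j in range(len(rev) - n + 1)):
                for k in range(i, i + n):         # earliest match wins
                    origin[k] = min(origin[k], r)
                break
    return origin

revs = [["the", "cat", "sat", "here"],
        ["a", "dog", "ran"],
        ["the", "cat", "sat", "here", "again"]]   # deleted text re-inserted
print(attribute(revs))   # first four tokens trace back to revision 0
```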
Automated Decision support for human tasks in a collaborative system: The case of deletion in wikipedia Gelley B.S.
Suel T.
Automating human tasks
Classification
Collaborative system
Decision support
Deletion
Wikipedia
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 Wikipedia's low barriers to participation have the unintended effect of attracting a large number of articles whose topics do not meet Wikipedia's inclusion standards. Many are quickly deleted, often causing their creators to stop contributing to the site. We collect and make available several datasets of deleted articles, heretofore inaccessible, and use them to create a model that can predict with high precision whether or not an article will be deleted. We report precision of 98.6% and recall of 97.5% in the best case and high precision with lower, but still useful, recall in the most difficult case. We propose to deploy a system utilizing this model on Wikipedia as a set of decision-support tools to help article creators evaluate and improve their articles before posting, and to help new article patrollers make more informed decisions about which articles to delete and which to improve. 0 0
Automated non-content word list generation using hLDA Krug W.
Tomlinson M.T.
FLAIRS 2013 - Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference English 2013 In this paper, we present a language-independent method for the automatic, unsupervised extraction of non-content words from a corpus of documents. This method permits the creation of word lists that may be used in place of traditional function word lists in various natural language processing tasks. As an example we generated lists of words from a corpus of English, Chinese, and Russian posts extracted from Wikipedia articles and Wikipedia Wikitalk discussion pages. We applied these lists to the task of authorship attribution on this corpus to compare the effectiveness of lists of words extracted with this method to expert-created function word lists and frequent word lists (a common alternative to function word lists). hLDA lists perform comparably to frequent word lists. The trials also show that corpus-derived lists tend to perform better than more generic lists, and both sets of generated lists significantly outperformed the expert lists. Additionally, we evaluated the performance of an English expert list on machine translations of our Chinese and Russian documents, showing that our method also outperforms this alternative. Copyright © 2013, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Automatic extraction of Polish language errors from text edition history Grundkiewicz R. Error corpora
Language errors detection
Data mining
Lecture Notes in Computer Science English 2013 There are no large error corpora for a number of languages, despite the fact that they have multiple applications in natural language processing. The main reason underlying this situation is the high cost of manual corpus creation. In this paper we present methods for the automatic extraction of various kinds of errors, such as spelling, typographical, grammatical, syntactic, semantic, and stylistic ones, from text edition histories. By applying these methods to Wikipedia's article revision history, we created a large and publicly available corpus of naturally occurring language errors for Polish, called PlEWi. Finally, we analyse and evaluate the detected error categories in our corpus. 0 0
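One step of such an extraction pipeline, aligning consecutive revisions and harvesting one-word replacements as candidate error/correction pairs, can be sketched as follows (a simplification; the paper's methods distinguish many error categories):

```python
import difflib

def candidate_corrections(old_text, new_text):
    """Align two revisions word-by-word and keep one-word substitutions,
    where spelling and typographical fixes typically show up."""
    old, new = old_text.split(), new_text.split()
    sm = difflib.SequenceMatcher(a=old, b=new)
    pairs = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            pairs.append((old[i1], new[j1]))   # (erroneous form, corrected form)
    return pairs

print(candidate_corrections("teh quick brown fox", "the quick brown fox"))
# [('teh', 'the')]
```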
Automatic summarization of events from social media Chua F.C.T.
Asur S.
Proceedings of the 7th International Conference on Weblogs and Social Media, ICWSM 2013 English 2013 Social media services such as Twitter generate a phenomenal volume of content for most real-world events on a daily basis. Digging through the noise and redundancy to understand the important aspects of the content is a very challenging task. We propose a search and summarization framework to extract relevant representative tweets from a time-ordered sample of tweets to generate a coherent and concise summary of an event. We introduce two topic models that take advantage of temporal correlation in the data to extract relevant tweets for summarization. The summarization framework has been evaluated using Twitter data on four real-world events. Evaluations are performed using Wikipedia articles on the events as well as using Amazon Mechanical Turk (MTurk) with human readers (MTurkers). Both experiments show that the proposed models outperform traditional LDA and lead to informative summaries. Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. 0 0
Automating document annotation using open source knowledge Apoorv Singhal
Kasturi R.
Srivastava J.
Document summarization
Global context
Google Scholar
Wikipedia
Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013 English 2013 Annotating documents with relevant and comprehensive keywords offers invaluable assistance to readers to quickly overview any document. The problem of document annotation is addressed in the literature under two broad classes of techniques, namely key phrase extraction and key phrase abstraction. In this paper, we propose a novel approach to generate summary phrases for research documents. Given the dynamic nature of scientific research, it has become important to incorporate new and popular scientific terminologies in document annotations. For this purpose, we have used crowd-sourced knowledge bases like Wikipedia and WikiCFP (an open source information source for calls for papers) for automating key phrase generation. Also, we have taken into account the lack of availability of a document's content (due to protective policies) and developed a global-context-based key-phrase identification approach. We show that given only the title of a document, the proposed approach generates its global context information using academic search engines like Google Scholar. We evaluated the performance of the proposed approach on a real-world dataset obtained from a computer science research document corpus, and compared it quantitatively with two baseline approaches. 0 0
Beyond open source software: Framework and implications for open content research Chitu Okoli
Carillo K.D.A.
Creative Commons
Free cultural works
Libre software
Open content
Open knowledge
Open Source Software
Wikipedia
ECIS 2013 - Proceedings of the 21st European Conference on Information Systems English 2013 The same open source philosophy that has been traditionally applied to software development can be applied to the collaborative creation of non-software information products, such as books, music and video. Such products are generically referred to as open content. Due largely to the success of large projects such as Wikipedia and the Creative Commons, open content has gained increasing attention not only in the popular media, but also in scholarly research. It is important to investigate the workings of the open source process in these new media of expression. This paper introduces the scope of emerging research on the open content phenomenon beyond open source software. We develop a framework for categorizing copyrightable works as utilitarian, factual, aesthetic or opinioned works. Based on these categories, we review some key theory-driven findings from open source software research and assess the applicability of extending their implications to open content. We present a research agenda that integrates the findings and proposes a list of research topics that can help lay a solid foundation for open content research. 0 0
BlueFinder: Recommending wikipedia links using DBpedia properties Torres D.
Hala Skaf-Molli
Pascal Molli
Diaz A.
DBpedia
Recommendation
Wikipedia
Proceedings of the 3rd Annual ACM Web Science Conference, WebSci 2013 English 2013 The DBpedia knowledge base has been built from data extracted from Wikipedia. However, many existing relations among resources in DBpedia are missing links among articles from Wikipedia. In some cases, adding these links into Wikipedia will enrich Wikipedia content and therefore enable better navigation. In previous work, we proposed the PIA algorithm, which predicts the best link to connect two articles in Wikipedia corresponding to those related by a semantic property in DBpedia while respecting the Wikipedia convention. PIA calculates this link as a path query. After introducing PIA results into Wikipedia, most of them were accepted by the Wikipedia community. However, some were rejected because PIA predicts path queries that are too general. In this paper, we report the BlueFinder collaborative filtering algorithm, which fixes PIA's miscalculation. It is sensitive to the specificity of the resource types. According to the conducted experiments, we found that BlueFinder is a better solution than PIA because it solves more cases with better recall. Copyright 2013 ACM. 0 0
Bookmark recommendation in social bookmarking services using Wikipedia Yoshida T.
Inoue U.
Algorithm
Folksonomy
Recommender
2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Proceedings English 2013 Social bookmarking systems allow users to attach freely chosen keywords as tags to bookmarks of web pages. These tags are used to recommend relevant bookmarks to other users. However, there is no guarantee that every user gets enough recommended bookmarks, because of the diversity of tags. In this paper, we propose a personalized recommender system using Wikipedia. Our system extends a tag set to find similar users and relevant bookmarks by using the Wikipedia category database. The experimental results show a significant increase in relevant recommended bookmarks without a notable increase in noise. 0 0
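The tag-expansion step can be sketched as below; `categories_of` and `category_members` are hypothetical lookups standing in for the Wikipedia category database, and the similarity measure is a plain Jaccard coefficient rather than necessarily the paper's own:

```python
def expand_tags(tags, categories_of, category_members):
    """Enrich a user's tag set with sibling terms from the category graph."""
    expanded = set(tags)
    for tag in tags:
        for cat in categories_of.get(tag, []):
            expanded.update(category_members.get(cat, []))
    return expanded

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical category data: users whose expanded tag sets overlap are
# treated as similar, and their bookmarks become recommendation candidates.
categories_of = {"python": ["Programming languages"]}
category_members = {"Programming languages": {"python", "ruby", "haskell"}}
u1 = expand_tags({"python"}, categories_of, category_members)
u2 = expand_tags({"ruby"}, categories_of, category_members)
print(jaccard(u1, u2))   # nonzero despite disjoint original tags
```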
Boosting cross-lingual knowledge linking via concept annotation Zhe Wang
Jing-Woei Li
Tang J.
IJCAI International Joint Conference on Artificial Intelligence English 2013 Automatically discovering cross-lingual links (CLs) between wikis can largely enrich the cross-lingual knowledge and facilitate knowledge sharing across different languages. In most existing approaches for cross-lingual knowledge linking, the seed CLs and the inner link structures are two important factors for finding new CLs. When there are insufficient seed CLs and inner links, discovering new CLs becomes a challenging problem. In this paper, we propose an approach that boosts cross-lingual knowledge linking by concept annotation. Given a small number of seed CLs and inner links, our approach first enriches the inner links in wikis by using concept annotation method, and then predicts new CLs with a regression-based learning model. These two steps mutually reinforce each other, and are executed iteratively to find as many CLs as possible. Experimental results on the English and Chinese Wikipedia data show that the concept annotation can effectively improve the quantity and quality of predicted CLs. With 50,000 seed CLs and 30% of the original inner links in Wikipedia, our approach discovered 171,393 more CLs in four runs when using concept annotation. 0 0
Boot-strapping language identifiers for short colloquial postings Goldszmidt M.
Najork M.
Paparizos S.
Language Identification
Twitter
Wikipedia
Lecture Notes in Computer Science English 2013 There is tremendous interest in mining the abundant user-generated content on the web. Many analysis techniques are language dependent and rely on accurate language identification as a building block. Even though there is already research on language identification, it has focused on very 'clean' editorially managed corpora, on a limited number of languages, and on relatively large documents. These are not the characteristics of the content to be found in, say, Twitter or Facebook postings, which are short and riddled with vernacular. In this paper, we propose an automated, unsupervised, scalable solution based on publicly available data. To this end we thoroughly evaluate the use of Wikipedia to build language identifiers for a large number of languages (52) and a large corpus, and conduct a large-scale study of the best-known algorithms for automated language identification, quantifying how accuracy varies in correlation to document size, language (model) profile size, and number of languages tested. Then, we show the value of using Wikipedia to train a language identifier directly applicable to Twitter. Finally, we augment the language models and customize them to Twitter by combining our Wikipedia models with location information from tweets. This method provides a massive amount of automatically labeled data that acts as a bootstrapping mechanism, which we empirically show boosts the accuracy of the models. With this work we provide a guide and a publicly available tool [1] to the mining community for language identification on web and social data. 0 0
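A minimal character n-gram identifier of the kind that can be trained on Wikipedia text might look like this; the training snippets are placeholders, and a real system would train on full Wikipedia dumps for all 52 languages:

```python
import math
from collections import Counter

def ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train(corpus_by_lang, n=3):
    """Estimate per-language character n-gram probabilities."""
    models = {}
    for lang, text in corpus_by_lang.items():
        counts = Counter(ngrams(text.lower(), n))
        total = sum(counts.values())
        models[lang] = {g: c / total for g, c in counts.items()}
    return models

def identify(text, models, n=3, floor=1e-7):
    """Pick the language whose model gives the text the highest log-likelihood."""
    scores = {lang: sum(math.log(m.get(g, floor)) for g in ngrams(text.lower(), n))
              for lang, m in models.items()}
    return max(scores, key=scores.get)

models = train({"en": "the quick brown fox jumps over the lazy dog",
                "de": "der schnelle braune fuchs springt über den faulen hund"})
print(identify("the lazy dog sleeps", models))   # -> 'en'
```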
Building, maintaining, and using knowledge bases: A report from the trenches Deshpande O.
Lamba D.S.
Tourn M.
Sanmay Das
Subramaniam S.
Rajaraman A.
Harinarayan V.
Doan A.
Data integration
Human curation
Information extraction
Knowledge base
Social media
Taxonomy
Wikipedia
Proceedings of the ACM SIGMOD International Conference on Management of Data English 2013 A knowledge base (KB) contains a set of concepts, instances, and relationships. Over the past decade, numerous KBs have been built, and used to power a growing array of applications. Despite this flurry of activities, however, surprisingly little has been published about the end-to-end process of building, maintaining, and using such KBs in industry. In this paper we describe such a process. In particular, we describe how we build, update, and curate a large KB at Kosmix, a Bay Area startup, and later at WalmartLabs, a development and research lab of Walmart. We discuss how we use this KB to power a range of applications, including query understanding, Deep Web search, in-context advertising, event monitoring in social media, product search, social gifting, and social mining. Finally, we discuss how the KB team is organized, and the lessons learned. Our goal with this paper is to provide a real-world case study, and to contribute to the emerging direction of building, maintaining, and using knowledge bases for data management applications. 0 0
C Arsan T.
Sen R.
Ersoy B.
Devri K.K.
Lecture Notes in Electrical Engineering English 2013 In this paper, we design and implement a novel all-in-one media center that can be directly connected to a high-definition television (HDTV). C# programming is used to develop a modular, structured media center for home entertainment, so it is possible and easy to add a virtually limitless number of modules and software components. Most importantly, the user interface is designed with two factors in mind: simplicity and tidiness. The proposed media center gives users the opportunity to listen to music/radio, watch TV, connect to the Internet, view online Internet videos, edit videos, connect to the pharmacy on duty over the Internet, check weather conditions and song lyrics, burn CDs/DVDs, and connect to Wikipedia. All the modules and design steps are explained in detail for a user-friendly, cost-effective all-in-one media center. 0 0
COLLEAP - COntextual Language LEArning Pipeline Wloka B.
Werner Winiwarter
Language learning
Natural Language Processing
Web crawling
Lecture Notes in Computer Science English 2013 In this paper we present a concept as well as a prototype of a tool pipeline to utilize the abundant information available on the World Wide Web for contextual, user-driven creation and display of language learning material. The approach is to capture Wikipedia articles of the user's choice by crawling, to analyze the linguistic aspects of the text via natural language processing, and to compile the gathered information into a visually appealing presentation of enriched language information. The tool is designed to address the Japanese language, with a focus on kanji, the pictographic characters used in the Japanese script. 0 0
Can a Wiki be used as a knowledge service platform? Lin F.-R.
Wang C.-R.
Huang H.-Y.
HAC+P
Knowledge activity map
Knowledge service
Wikipedia
Advances in Intelligent Systems and Computing English 2013 Many knowledge services have been developed as matching platforms for knowledge demanders and providers. However, most of these knowledge services share a common drawback: they cannot provide a list of experts corresponding to the knowledge demanders' needs. Knowledge demanders have to post their questions in a public area and then wait patiently until corresponding knowledge providers appear. In order to help knowledge demanders acquire knowledge, this study proposes a knowledge service system based on Wikipedia that actively informs potential knowledge providers on behalf of knowledge demanders. This study also developed a knowledge activity map system, used by the knowledge service system to identify Wikipedians' knowledge domains. The experimental evaluation results show that the knowledge service system is acceptable to leader users on Wikipedia, whose domain knowledge can be identified and represented on their knowledge activity maps. 0 0
Characterizing and curating conversation threads: Expansion, focus, volume, re-entry Backstrom L.
Kleinberg J.
Lena Lee
Cristian Danescu-Niculescu-Mizil
Comment threads
Conversations
Facebook
Feed ranking
Likes
On-line communities
Recommendation
Social network
User generated content
Wikipedia
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia. To help users manage the challenge of allocating their attention among the discussions that are relevant to them, there has been a growing need for the algorithmic curation of on-line conversations - the development of automated methods to select a subset of discussions to present to a user. Here we consider two key sub-problems inherent in conversational curation: length prediction - predicting the number of comments a discussion thread will receive - and the novel task of re-entry prediction - predicting whether a user who has participated in a thread will later contribute another comment to it. The first of these sub-problems arises in estimating how interesting a thread is, in the sense of generating a lot of conversation; the second can help determine whether users should be kept notified of the progress of a thread to which they have already contributed. We develop and evaluate a range of approaches for these tasks, based on an analysis of the network structure and arrival pattern among the participants, as well as a novel dichotomy in the structure of long threads. We find that for both tasks, learning-based approaches using these sources of information outperform natural baselines. 0 0
Chinese text filtering based on domain keywords extracted from Wikipedia Xiaolong Wang
Hua Li
Jia Y.
Jin S.
Text filtering
User profile
Wikipedia
Lecture Notes in Electrical Engineering English 2013 Several machine learning and information retrieval algorithms have been used for text filtering. All these methods share common ground: they need positive and negative examples to build a user profile. However, not all applications can get good training documents. In this paper, we present a Wikipedia-based method to build a user profile without any other training documents. The proposed method extracts keywords of a specific category from the Wikipedia taxonomy and computes the weights of the extracted keywords based on Wikipedia pages. Experimental results on the Chinese news text dataset SogouC show that the proposed method achieves good performance. 0 0
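A sketch of the profile-building and filtering idea, with made-up keyword statistics standing in for counts derived from Wikipedia category pages, and a simple weighted-overlap score rather than the paper's exact weighting:

```python
def keyword_weights(pages_by_keyword, total_pages):
    """Weight each keyword by how often it appears across a category's pages.
    (Illustrative weighting; the paper derives weights from Wikipedia pages.)"""
    return {kw: pages / total_pages for kw, pages in pages_by_keyword.items()}

def score(text_tokens, profile):
    """Average profile weight over the text's tokens."""
    return sum(profile.get(tok, 0.0) for tok in text_tokens) / max(len(text_tokens), 1)

# Hypothetical football-category statistics: keyword -> number of pages it occurs in.
profile = keyword_weights({"goal": 40, "league": 35, "transfer": 20}, total_pages=50)
text = "the transfer fee set a league record".split()
print(score(text, profile) > 0.05)   # passes the filter for this category
```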
Clustering editors of wikipedia by editor's biases Nakamura A.
Yu Suzuki
Ishikawa Y.
Bias
Edit histories
Peer reviews
Wikipedia
Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013 English 2013 Wikipedia is an Internet encyclopedia where any user can edit articles. Because editors act on their own judgment, their biases are reflected in edit actions. When editors' biases are reflected in articles, the articles should have low credibility. However, it is difficult for users to judge which parts of articles are biased. In this paper, we propose a method of clustering editors by their biases so that texts' biases can be identified through their editors' biases, helping users judge the credibility of each description. If each text is distinguished, for example by color, users can use this to judge text credibility. Our system makes use of the relationships between editors: agreement and disagreement. We assume that editors leave texts written by editors they agree with, and delete texts written by editors they disagree with. In addition, we can consider that editors who agree with each other have similar biases, and editors who disagree with each other have different biases. Hence, the relationships between editors make it possible to classify editors by bias. In experimental evaluation, we verify that our proposed method is useful for clustering editors by bias. Additionally, we validate that considering the dependency between editors improves clustering performance. 0 0
Collaborative development of data curation profiles on a wiki platform: Experience from free and open source software projects and communities Sowe S.K.
Koji Zettsu
Cloud computing
Data curation
Data curation profiles
Floss communities
MediaWiki
Open collaboration
Wiki
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 Wiki technologies have proven to be versatile and successful in aiding collaborative authoring of web content. A multitude of users can collaboratively add, edit, and revise wiki pages on the fly, with ease. This functionality makes wikis ideal platforms to support research communities in curating data. However, without appropriate customization and a model to support collaborative editing of pages, wikis will fall short of providing the functionality needed to support collaborative work. In this paper, we present the architecture and design of a wiki platform, as well as a model that allows scientific communities, especially disaster response scientists, to collaboratively edit and append data to their wiki pages. Our experience implementing the platform on MediaWiki demonstrates how wiki technologies can be used to support data curation, and how the dynamics of the FLOSS development process and its user and developer communities are increasingly informing our understanding of how to support collaboration and coordination on wikis. 0 0
Collecting interaction traces in distributed semantic wikis Le A.-H.
Lefevre M.
Cordier A.
Hala Skaf-Molli
Distributed semantic wikis
Interaction traces
Model of trace
Trace collection process
Trace-based reasoning
User assistance
ACM International Conference Proceeding Series English 2013 In the Kolow project, our general objective is to develop an assistance engine suitable for distributed applications. In order to provide contextualized and relevant assistance, we feed the assistance engine with interaction traces. Interaction traces record events occurring while users are interacting with applications. These traces become containers of valuable knowledge for providing assistance. Collecting interaction traces is a challenging issue that has been thoroughly studied in the context of local applications. In contrast, few approaches focus on collecting interaction traces in distributed applications. Yet, when applications are distributed, collecting interaction traces is even more challenging because new difficulties arise, such as data synchronization and multi-synchronous collaboration. In this paper, we propose a model and a tool for collecting traces in a distributed environment. The originality of the model is that it is tailored to fit distributed applications. We implemented the model in Collectra, a tool to collect interaction traces in distributed web applications. Collectra collects interaction traces and stores them in a dedicated trace-base management system. We report on the experiments we conducted to evaluate the performance of Collectra (both response time and memory space). Results of the experiments show that Collectra performs well and that it can be used to support the assistance tasks carried out by the assistance engine. 0 0
Collective action towards enhanced knowledge management of neglected and underutilised species: Making use of internet opportunities Hermann M.
Kwek M.J.
Khoo T.K.
Amaya K.
Google Books
Wikimedia Commons
Wikipedia
Acta Horticulturae English 2013 The disproportionate use of crops - with a few species accounting for most of global food production - is being reinforced by the considerable research, breeding, and development efforts that make global crops so competitive vis-à-vis "neglected and underutilised species" (NUS). NUS promotional rhetoric, preaching to the converted, complaints about the discrimination against the "food of the poor", and laments over the loss of traditional dietary habits are unlikely to reverse the neglect of the vast majority of crop species. We need to lessen the supply and demand constraints that affect the production and consumption of NUS. NUS attributes relevant to consumers, nutrition, and climate change need to be substantiated, demand for NUS stimulated, discriminating agricultural and trade policies amended, and donors convinced to make greater investments in NUS research and development. Much fascinating NUS research and development is underway, but much of it is dissipated amongst countries, institutions, and taxa. Researchers operate in unsupportive environments and are often unaware of each other's work. Their efforts remain unrecognised as addressing global concerns. We suggest that the much-needed enhancement of NUS knowledge management should be at the centre of collective efforts of the NUS community. This will underpin future research and development advances as well as inform the formulation and advocacy of policies. This paper recommends that the NUS community make greater use of Internet knowledge repositories to deposit research results, publications, and images into the public domain. As examples of such a low-cost approach, we assess the usefulness of Wikipedia, Google Books, and Wikimedia Commons for the documentation and dissemination of NUS knowledge. We urge donors and administrators to promote and encourage the use of these and other public, electronically accessible repositories as sources of verification for the achievement of project and research outputs. 0 0
Collective learning paradigm for rapidly evolving curriculum: Facilitating student and content engagement via social media Agarwal N.
Ahmed F.
Classroom learning
Collective learning
Social media
Team learning
Wiki
19th Americas Conference on Information Systems, AMCIS 2013 - Hyperconnected World: Anything, Anywhere, Anytime English 2013 Curriculum in the information systems discipline has been rapidly evolving. This velocity of change is challenging not only for instructors but also for students. This paper illustrates a model that leverages the integrated use of social media technologies to facilitate collective learning in a university teaching/learning environment; the model could also be adapted to other organizational environments. The model demonstrates how various challenges encountered in collective learning can be addressed with the help of social media technologies. A case study is presented to demonstrate the model's applicability, feasibility, utility, and success in a senior-level social computing course at the University of Arkansas at Little Rock. An evolving, non-linear, and self-sustaining wiki portal was developed to encourage engagement between the content, students, and instructor. We further outline the student-centric, content-centric, and learning-centric advantages of the proposed model for the next-generation learning environment. 0 0
Combining lexical and semantic features for short text classification Yang L.
Chenliang Li
Ding Q.
Li L.
Feature selection
Short text
Topic model
Wikipedia
Procedia Computer Science English 2013 In this paper, we propose a novel approach to classify short texts by combining both their lexical and semantic features. We present an improved measurement method for lexical feature selection and obtain the semantic features from a background knowledge repository that covers the target category domains. The combination of lexical and semantic features is achieved by mapping words to topics with different weights. In this way, the dimensionality of the feature space is reduced to the number of topics. We use Wikipedia as background knowledge and employ a Support Vector Machine (SVM) as the classifier. The experimental results show that our approach is more effective than existing methods for classifying short texts. 0 0
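The word-to-topic mapping that reduces the feature space can be sketched as follows; `word_topics` is a stand-in for topic weights derived from Wikipedia, not the paper's actual model:

```python
from collections import defaultdict

def to_topic_vector(tokens, word_topics):
    """Map a short text into topic space: sum each word's topic weights,
    so the feature vector has one dimension per topic, not per word."""
    vec = defaultdict(float)
    for tok in tokens:
        for topic, w in word_topics.get(tok, {}).items():
            vec[topic] += w
    return dict(vec)

# Hypothetical word-to-topic weights.
word_topics = {"goal": {"sports": 0.9}, "election": {"politics": 0.8},
               "match": {"sports": 0.6}}
print(to_topic_vector("goal in the match".split(), word_topics))
# {'sports': 1.5} -- ready to feed to an SVM over |topics| dimensions
```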
Communities, artifacts, interaction and contribution on the web Eleni Stroulia Computer-supported collaboration
Social network
Virtual worlds
Web-based collaborative platforms
Wiki
Lecture Notes in Computer Science English 2013 Today, most of us are members of multiple online communities, in the context of which we engage in a multitude of personal and professional activities. These communities are supported by different web-based platforms and enable different types of collaborative interactions. Through our experience with the development of and experimentation with three different such platforms in support of collaborative communities, we recognized a few core research problems relevant across all such tools, and we developed SociQL, a language, and a corresponding software framework, to study them. 0 0
Comparing expert and non-expert conceptualisations of the land: An analysis of crowdsourced land cover data Comber A.
Brunsdon C.
Linda See
Steffen Fritz
Ian McCallum
Geo-Wiki
Geographically Weighted Kernel
Land Cover
Volunteered Geographical Information (VGI)
Lecture Notes in Computer Science English 2013 This research compares expert and non-expert conceptualisations of land cover data collected through a Google Earth web-based interface. In so doing it seeks to determine the impacts of varying landscape conceptualisations held by different groups of VGI contributors on decisions that may be made using crowdsourced data, in this case to select the best global land cover dataset in each location. Whilst much other work has considered the quality of VGI, as yet little research has considered the impact of varying semantics and conceptualisations on the use of VGI in formal scientific analyses. This study found that conceptualisation of cropland varies between experts and non-experts. A number of areas for further research are outlined. 0 0
Complementary information for Wikipedia by comparing multilingual articles Fujiwara Y.
Yu Suzuki
Konishi Y.
Akiyo Nadamoto
Lecture Notes in Computer Science English 2013 Many articles in Wikipedia lack information because users can create and edit articles freely. We specifically examined the multilinguality of Wikipedia and proposed a method to complement articles that lack information by comparing different-language articles that have similar contents. However, the results contain much non-complementary information unrelated to the article the user is browsing. Herein, we propose improving the comparison area based on the classified complementary target. 0 0
Computing semantic relatedness from human navigational paths on wikipedia Singer P.
Niebler T.
Strohmaier M.
Hotho A.
Navigation
Semantic relatedness
Wikipedia
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 This paper presents a novel approach for computing semantic relatedness between concepts on Wikipedia by using human navigational paths for this task. Our results suggest that human navigational paths provide a viable source for calculating semantic relatedness between concepts on Wikipedia. We also show that we can improve accuracy by intelligent selection of path corpora based on path characteristics, indicating that not all paths are equally useful. Our work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and for considering the utility of human navigational paths for this task. 0 0
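One plausible reading of path-based relatedness, not necessarily the authors' exact formulation, is co-occurrence of concepts within a window of the same navigation paths, compared by cosine similarity:

```python
import math
from collections import defaultdict

def cooccurrence(paths, window=2):
    """Count how often two concepts appear within `window` steps on a path."""
    co = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for i, a in enumerate(path):
            for b in path[i + 1:i + 1 + window]:
                co[a][b] += 1
                co[b][a] += 1
    return co

def relatedness(a, b, co):
    """Cosine similarity of the two concepts' co-occurrence vectors."""
    va, vb = co[a], co[b]
    dot = sum(va[k] * vb.get(k, 0) for k in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

paths = [["Cat", "Mammal", "Dog"], ["Cat", "Pet", "Dog"]]   # illustrative paths
co = cooccurrence(paths)
print(relatedness("Cat", "Dog", co))
```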
Computing semantic relatedness using word frequency and layout information of wikipedia Chan P.
Hijikata Y.
Nishida S.
ESA
Layout information
Semantic relatedness
Wikipedia article
Word frequency
Proceedings of the ACM Symposium on Applied Computing English 2013 Computing the semantic relatedness between two words or phrases is an important problem for fields such as information retrieval and natural language processing. One state-of-the-art approach to solving the problem is Explicit Semantic Analysis (ESA). ESA uses the word frequency in Wikipedia articles to estimate relevance, so the relevance of words with low frequency cannot always be well estimated. To improve the relevance estimates for low-frequency words, we use not only word frequency but also layout information in Wikipedia articles. Empirical evaluation shows that, on low-frequency words, our method achieves a better estimate of semantic relatedness than ESA. Copyright 2013 ACM. 0 0
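A sketch of ESA-style relatedness with a layout boost: a word's weight in an article is raised when it occurs in prominent positions such as the title or section headings. The boost factors here are assumptions, not values from the paper:

```python
import math

def concept_vector(word, articles):
    """articles: {concept: {'body': tokens, 'title': tokens, 'headings': tokens}}.
    Returns the word's ESA-style vector over Wikipedia concepts."""
    vec = {}
    for concept, parts in articles.items():
        w = parts["body"].count(word)
        w += 5.0 * parts["title"].count(word)      # assumed boost for title hits
        w += 2.0 * parts["headings"].count(word)   # assumed boost for heading hits
        if w:
            vec[concept] = w
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# relatedness(w1, w2) = cosine(concept_vector(w1, articles), concept_vector(w2, articles))
```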
Constructing a focused taxonomy from a document collection Olena Medelyan
Manion S.
Broekstra J.
Divoli A.
Huang A.-L.
Witten I.H.
Lecture Notes in Computer Science English 2013 We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% and 90% for concepts and relations respectively; recall was 75% with respect to a manually generated taxonomy for the same domain. 0 0
Construction of a Japanese gazetteers for Japanese local toponym disambiguation Yoshioka M.
Fujiwara T.
Proceedings of the 7th Workshop on Geographic Information Retrieval, GIR 2013 English 2013 When processing toponym information in natural language text, it is crucial to have a good gazetteer. There are several well-organized gazetteers for English text, but they do not cover Japanese local toponyms. In this paper, we introduce a Japanese gazetteer based on Open Data (e.g., the toponym database distributed by Japanese ministries, Wikipedia, and GeoNames) and propose a toponym disambiguation framework that uses the constructed gazetteer. We also evaluate our approach on a blog corpus that contains place names with high ambiguity. 0 0
Contributor profiles, their dynamics, and their importance in five Q&A sites Furtado A.
Andrade N.
Oliveira N.
Brasileiro F.
Datamining and machine learning
Empirical methods
Q&A sites
Quantitative
Studies of wikipedia/web
English 2013 Q&A sites currently enable large numbers of contributors to collectively build valuable knowledge bases. Naturally, these sites are the product of contributors acting in different ways - creating questions, answers, or comments and voting on these - contributing in diverse amounts, and creating content of varying quality. This paper advances present knowledge about Q&A sites using a multifaceted view of contributors that accounts for diversity of behavior, motivation, and expertise to characterize their profiles in five sites. This characterization resulted in the definition of ten behavioral profiles that group users according to the quality and quantity of their contributions. Using these profiles, we find that the five sites have remarkably similar distributions of contributor profiles. We also conduct a longitudinal study of contributor profiles in one of the sites, identifying common profile transitions, and finding that although users change profiles with some frequency, the site composition is mostly stable over time. Copyright 2013 ACM. 0 0
Could someone please translate this? - Activity analysis of wikipedia article translation by non-experts Ari Hautasaari Activity analysis
Non-experts
Translation
Wikipedia
English 2013 Wikipedia translation activities aim to improve the quality of the multilingual Wikipedia through article translation. We performed an activity analysis of the translation work done by individual English to Chinese non-expert translators, who translated linguistically complex Wikipedia articles in a laboratory setting. From the analysis, which was based on Activity Theory, and which examined both information search and translation activities, we derived three translation strategies that were used to inform the design of a support system for human translation activities in Wikipedia. Copyright 2013 ACM. 0 0
Crawling deep web entity pages He Y.
Xin D.
Ganti V.
Rajaraman S.
Shah N.
Deep-web crawl
Entities
Web data
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 Deep-web crawl is concerned with the problem of surfacing hidden content behind search interfaces on the Web. While many deep-web sites maintain document-oriented textual content (e.g., Wikipedia, PubMed, Twitter, etc.), which has traditionally been the focus of the deep-web literature, we observe that a significant portion of deep-web sites, including almost all online shopping sites, curate structured entities as opposed to text documents. Although crawling such entity-oriented content is clearly useful for a variety of purposes, existing crawling techniques optimized for document oriented content are not best suited for entity-oriented sites. In this work, we describe a prototype system we have built that specializes in crawling entity-oriented deep-web sites. We propose techniques tailored to tackle important subproblems including query generation, empty page filtering and URL deduplication in the specific context of entity oriented deep-web sites. These techniques are experimentally evaluated and shown to be effective. 0 0
Cross language prediction of vandalism on wikipedia using article views and revisions Tran K.-N.
Christen P.
Lecture Notes in Computer Science English 2013 Vandalism is a major issue on Wikipedia, accounting for about 2% (350,000+) of edits in the first 5 months of 2012. The majority of vandalism is caused by humans, who can leave traces of their malicious behaviour through access and edit logs. We propose detecting vandalism using a range of classifiers in a monolingual setting, and evaluate their performance when used across languages on two data sets: the relatively unexplored hourly counts of views of each Wikipedia article, and the commonly used edit histories of articles. Within the same language (English and German), these classifiers achieve up to 87% precision, 87% recall, and an F1-score of 87%. Applying these classifiers across languages achieves similarly high results of up to 83% precision, recall, and F1-score. These results show that characteristic vandal traits can be learned from view and edit patterns, and that models built in one language can be applied to other languages. 0 0
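A minimal version of the cross-language setup, training on features from hourly view counts and edit metadata in one language and applying the model to another, might look like this (the features, data, and choice of random forest are placeholders, not the paper's exact classifiers):

```python
from sklearn.ensemble import RandomForestClassifier

def features(hourly_views, size_delta, is_anonymous):
    """Turn raw view counts and edit metadata into a feature vector."""
    mean = sum(hourly_views) / len(hourly_views)
    peak = max(hourly_views)
    return [mean, peak, peak / (mean + 1e-9), size_delta, int(is_anonymous)]

# Train on labelled English edits (toy placeholder data; 1 = vandalism) ...
X_en = [features([5, 9, 40], -120, True), features([4, 6, 5], 35, False)]
y_en = [1, 0]
clf = RandomForestClassifier(n_estimators=100).fit(X_en, y_en)

# ... then score edits from another language with the same feature extractor,
# which is the cross-lingual application the paper evaluates.
print(clf.predict([features([7, 30, 8], -200, True)]))
```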
Cross lingual entity linking with bilingual topic model Zhang T.
Kang Liu
Jun Zhao
IJCAI International Joint Conference on Artificial Intelligence English 2013 Cross-lingual entity linking means linking an entity mention in a background source document in one language with the corresponding real-world entity in a knowledge base written in the other language. The key problem is to measure the similarity score between the context of the entity mention and the document of the candidate entity. This paper presents a general framework for cross-lingual entity linking that leverages a large-scale bilingual knowledge base, Wikipedia. We introduce a bilingual topic model that mines bilingual topics from this knowledge base under the assumption that documents of the same Wikipedia concept in two different languages share the same semantic topic distribution. The extracted topics have two types of representation, each corresponding to one language. Thus both the context of the entity mention and the document of the candidate entity can be represented in a space using the same semantic topics. We use these topics for cross-lingual entity linking. Experimental results show that the proposed approach obtains competitive results compared with the state-of-the-art approach. 0 0
Cross-media topic mining on wikipedia Xiaolong Wang
Yuanyuan Liu
Dingquan Wang
Fei Wu
Cross media
Sparsity
Topic modeling
Wikipedia
MM 2013 - Proceedings of the 2013 ACM Multimedia Conference English 2013 As a collaborative wiki-based encyclopedia, Wikipedia provides a huge number of articles of various categories. In addition to their text corpus, Wikipedia also contains plenty of images, which make the articles more intuitive for readers to understand. To better organize these visual and textual data, one promising area of research is to jointly model the embedding topics across multi-modal data (i.e., cross-media) from Wikipedia. In this work, we propose to learn the projection matrices that map the data from heterogeneous feature spaces into a unified latent topic space. Different from previous approaches, by imposing the ℓ1 regularizers on the projection matrices, only a small number of relevant visual/textual words are associated with each topic, which makes our model more interpretable and robust. Furthermore, the correlations of Wikipedia data in different modalities are explicitly considered in our model. The effectiveness of the proposed topic extraction algorithm is verified by several experiments conducted on real Wikipedia datasets. 0 0
DFT-extractor: A system to extract domain-specific faceted taxonomies from wikipedia Wei B.
Liu J.
Jun Ma
Zheng Q.
Weinan Zhang
Feng B.
Faceted taxonomy
Network motif
Wikipedia
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 Extracting faceted taxonomies from the Web has received increasing attention in recent years from the web mining community. We demonstrate in this study a novel system called DFT-Extractor, which automatically constructs domain-specific faceted taxonomies from Wikipedia in three steps: 1) It crawls domain terms from Wikipedia by using a modified topical crawler. 2) Then it exploits a classification model to extract hyponym relations with the use of motif-based features. 3) Finally, it constructs a faceted taxonomy by applying a community detection algorithm and a group of heuristic rules. DFT-Extractor also provides a graphical user interface to visualize the learned hyponym relations and the tree structure of taxonomies. 0 0
Design and implementation of wiki content transformations and refactorings Hannes Dohrn
Dirk Riehle
Refactoring
Sweble
Transformation
Wiki
Wiki markup
Wiki object model
WM
WOM
XML
XSLT
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 The organic growth of wikis requires constant attention by contributors who are willing to patrol the wiki and improve its content structure. However, most wikis still only offer textual editing, and even wikis which offer WYSIWYG editing do not assist the user in restructuring the wiki. Therefore, "gardening" a wiki is a tedious and error-prone task. One of the main obstacles to assisted restructuring of wikis is the underlying content model, which prohibits automatic transformations of the content. Most wikis use either a purely textual representation of content or rely on the representational HTML format. To allow rigorous definitions of transformations we use and extend a Wiki Object Model. With the Wiki Object Model installed, we present a catalog of transformations and refactorings that helps users to easily and consistently evolve the content and structure of a wiki. Furthermore, we propose XSLT as the language for transformation specification and provide working examples of selected transformations to demonstrate that the Wiki Object Model and the transformation framework are well designed. We believe that our contribution significantly simplifies wiki "gardening" by introducing the means of effortless restructuring of articles and groups of articles. It furthermore provides an easily extensible foundation for wiki content transformations. 0 0
Designing a chat-bot that simulates an historical figure Haller E.
Rebedea T.
Chat-bot
Conversational Agent
Information extraction
Parsing
Wikipedia
Proceedings - 19th International Conference on Control Systems and Computer Science, CSCS 2013 English 2013 There are many applications that incorporate a human appearance and intend to simulate human dialog, but in most cases the knowledge of the conversational bot is stored in a database created by human experts. However, very little research has investigated the idea of creating a chat-bot with an artificial character and personality starting from web pages or plain text about a certain person. This paper describes an approach to identifying the most important facts in texts describing the life (including the personality) of an historical figure for building a conversational agent that could be used in middle-school CSCL scenarios. 0 0
Detecting collaboration from behavior Bauer T.
Garcia D.
Colbaugh R.
Glass K.
IEEE ISI 2013 - 2013 IEEE International Conference on Intelligence and Security Informatics: Big Data, Emergent Threats, and Decision-Making in Security Informatics English 2013 This paper describes a method for inferring when a person might be coordinating with others based on their behavior. We show that, in Wikipedia, editing behavior is more random when coordinating with others. We analyzed this using both entropy and conditional entropy. These algorithms rely only on timestamped events associated with entities, making them broadly applicable to other domains. In this paper, we discuss previous research on this topic, describe how we adapted that research to the problem of Wikipedia edit behavior and how we extended it, and discuss our results. 0 0
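The entropy and conditional-entropy measures the abstract mentions can be sketched as follows; the binning of inter-edit gaps and the toy timeline are illustrative assumptions, not the paper's exact discretization:

```python
from collections import Counter
import numpy as np

def entropy(symbols):
    # Shannon entropy (bits) of a discrete symbol sequence.
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def conditional_entropy(symbols):
    # H(X_t | X_{t-1}) = H(X_{t-1}, X_t) - H(X_{t-1}),
    # estimated from consecutive symbol pairs.
    pairs = list(zip(symbols[:-1], symbols[1:]))
    return entropy(pairs) - entropy(symbols[:-1])

# Hypothetical editor timeline: discretize inter-edit gaps (seconds)
# into coarse bins so the estimate is not dominated by noise.
timestamps = np.sort(np.random.default_rng(1).uniform(0, 86400, 200))
gaps = np.diff(timestamps)
bins = np.digitize(gaps, [60, 600, 3600, 21600]).tolist()
print(entropy(bins), conditional_entropy(bins))
```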
Detecting controversy on the web Dori-Hacohen S.
Allan J.
Controversy detection
Critical literacy
Sentiment analysis
International Conference on Information and Knowledge Management, Proceedings English 2013 A useful feature to facilitate critical literacy would alert users when they are reading a controversial web page. This requires solving a binary classification problem: does a given web page discuss a controversial topic? We explore the feasibility of solving the problem by treating it as supervised k-nearest-neighbor classification. Our approach (1) maps a webpage to a set of neighboring Wikipedia articles which were labeled on a controversiality metric; (2) coalesces those labels into an estimate of the webpage's controversiality; and finally (3) converts the estimate to a binary value using a threshold. We demonstrate the applicability of our approach by validating it on a set of webpages drawn from seed queries. We show absolute gains of 22% in F0.5 on our test set over a sentiment-based approach, highlighting that detecting controversy is more complex than simply detecting opinions. 0 0
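Steps (2) and (3) of the abstract's pipeline admit a minimal sketch; the similarity-weighted vote and the 0.5 threshold are assumptions for illustration, not the paper's exact coalescing scheme:

```python
import numpy as np

def controversy_score(neighbor_labels, similarities):
    """Coalesce the controversiality labels of the k nearest Wikipedia
    articles into one estimate by a similarity-weighted vote.
    Labels are 1 (controversial) / 0 (not)."""
    w = np.asarray(similarities, dtype=float)
    y = np.asarray(neighbor_labels, dtype=float)
    return float((w * y).sum() / w.sum())

# Hypothetical webpage whose 5 nearest Wikipedia neighbors were
# labeled on a controversiality metric (values invented).
labels = [1, 1, 0, 1, 0]
sims = [0.9, 0.8, 0.6, 0.5, 0.3]
score = controversy_score(labels, sims)
print("controversial" if score >= 0.5 else "non-controversial", score)
```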
Detection of article qualities in the Chinese Wikipedia based on C4.5 decision tree Xiao K.
Li B.
He P.
Yang X.-H.
Application of supervised learning
Article quality
Data mining
Decision tree
Wikipedia
Lecture Notes in Computer Science English 2013 The number of articles in Wikipedia is growing rapidly. It is important for Wikipedia to provide users with high-quality and reliable articles. However, the quality assessment metrics provided by Wikipedia are inefficient, and other mainstream quality detection methods focus only on the quality of English Wikipedia articles and usually analyze the text contents of articles, which is a time-consuming process. In this paper, we propose a method for detecting the article quality of the Chinese Wikipedia based on a C4.5 decision tree. The problem of quality detection is transformed into a classification problem of high-quality and low-quality articles. Using the fields from the tables in the Chinese Wikipedia database, we built decision trees to distinguish high-quality articles from low-quality ones. 0 0
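A hedged sketch of the classification step: scikit-learn ships CART rather than C4.5, so its information-gain ("entropy") criterion stands in here, and the features are invented placeholders for the database fields the paper uses:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import numpy as np

# Hypothetical per-article features drawn from database tables
# (e.g. length, edit count, editor count, link count);
# y = 1 for high-quality, 0 for low-quality (synthetic labels).
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# "entropy" selects splits by information gain, the criterion C4.5
# is built around, though the tree-growing algorithm differs.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=4)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```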
Determining leadership in contentious discussions Jain S.
Hovy E.
Contentious discussion
Discussion leader discovery
Discussion participant role
Natural Language Processing
Social multimedia
Wikipedia
Electronic Proceedings of the 2013 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2013 English 2013 Participants in online decision-making environments assume different roles. Especially in contentious discussions, the outcome often depends critically on the discussion leader(s). Recent work on automated leadership analysis has focused on collaborations where all the participants have the same goal. In this paper we focus on contentious discussions, in which the participants have different goals based on their opinions, which makes the notion of leader very different. We analyze discussions on the Wikipedia Articles for Deletion (AfD) forum. We define two complementary models, Content Leader and SilentOut Leader. The models quantify the basic leadership qualities of participants and assign leadership points to them. We compare the correlation between the leader ranks produced by the two models using the Spearman coefficient. We also propose a method to verify the quality of the leaders identified by each model. 0 0
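The rank-correlation step is standard; a minimal sketch with invented leadership points for the same five hypothetical AfD participants under the two models:

```python
from scipy.stats import spearmanr

# Hypothetical leadership points per participant (values invented).
content_leader = [10.0, 7.5, 6.0, 2.0, 1.0]
silent_out_leader = [8.0, 9.0, 4.0, 3.0, 0.5]

# spearmanr ranks both score lists internally and correlates the ranks.
rho, p = spearmanr(content_leader, silent_out_leader)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```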
Determining relation semantics by mapping relation phrases to knowledge base Liu F.
Yuanyuan Liu
Guangyou Zhou
Kang Liu
Jun Zhao
Open Information Extraction
Relation Mapping
Wikipedia Infobox
Proceedings - 2nd IAPR Asian Conference on Pattern Recognition, ACPR 2013 English 2013 0 0
Digital histories for the digital age: Collaborative writing in large lecture courses Soh L.-K.
Nobel Khandaker
Thomas W.G.
Collaboration
Digital History
Digital Humanities
History
Multiagent
Wiki
Writing
Proceedings of the International Conference e-Learning 2013 English 2013 The digital environment has had an immense effect on American society, learning, and education: we have more sources available at our fingertips than any previous generation. Teaching and learning with these new sources, however, has been a challenging transition. Students are confronted with an ocean of digital objects and need skills to navigate the World Wide Web and numerous proprietary databases. Writing and disciplinary habits of mind are more important than ever in this environment, so how do we teach these in the digital age? This paper examines the current digital environment that humanities faculty face in their teaching and explores new tools that might support collaborative writing and digital skills development for students. In particular, this paper considers the effectiveness of a specially configured multi-agent wiki system for writing in a large lecture humanities course and explores the results of its deployment over two years. 0 0
Digital services in immersive urban virtual environments Meira C.
Freitas J.
Barbosa L.
Melo M.
Bessa M.
Magalhaes L.
Digital Services
Immersive Virtual Environments
Wiki
Iberian Conference on Information Systems and Technologies, CISTI Portuguese 2013 Virtual environment (VE) systems may provide a new way to deliver information and services in many areas, for example in tourism, urban planning and education. In urban VEs there is a close link between the virtual environment and the urban environment it is intended to represent. These VEs can be an intuitive way to access a set of services with a direct association to the real object or entity to which they are related. In this article, we describe a case study that aimed at exploring the possibility of using new interfaces to exploit and use services in urban VEs with a greater sense of immersiveness. The results indicate that the VE interfaces provide natural and intuitive access to digital services. While users felt a greater difficulty in performing some of the tasks in the immersive scenario, the majority considered that this scenario provided a greater sense of immersion and realism. 0 0
Disambiguation to Wikipedia: A language and domain independent approach Nguyen T.-V.T. Lecture Notes in Computer Science English 2013 Disambiguation to Wikipedia (D2W) is the task of linking mentions of concepts in text to their corresponding Wikipedia articles. Traditional approaches to D2W have focused either on a single language (e.g. English) or on formal texts (e.g. news articles). In this paper, we present a multilingual framework with a set of new features that can be obtained purely from the online encyclopedia, without the need for any language-specific natural language processing tool. We analyze these features across different languages and different domains. The approach is fully language-independent and has been applied successfully to English, Italian and Polish, with consistent improvements. We show that only a sufficient number of Wikipedia articles is needed for training. When trained on real-world data sets for English, our new features yield substantial improvements over current local and global disambiguation algorithms. Finally, the adaptation to the Bridgeman query logs in digital libraries shows the robustness of our approach even in the absence of disambiguation context. Also, as no language-specific natural language tool is needed, the method can be applied to other languages in a similar manner with little adaptation. 0 0
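A classic language-independent D2W feature combination, sketched with invented statistics (the paper's actual feature set is not reproduced here): rank each candidate article by a mix of its anchor-text prior ("commonness") and its similarity to the mention's context.

```python
import numpy as np

# Hypothetical anchor-text statistics harvested from Wikipedia:
# P(article | mention), a standard language-independent D2W prior.
commonness = {
    "Java": {"Java_(programming_language)": 0.82, "Java_(island)": 0.15},
}

def link(mention, context_vec, article_vecs, alpha=0.5):
    """Score each candidate by an interpolation of commonness and
    cosine similarity between mention context and article text."""
    scores = {}
    for article, prior in commonness[mention].items():
        v = article_vecs[article]
        cos = context_vec @ v / (np.linalg.norm(context_vec) * np.linalg.norm(v))
        scores[article] = alpha * prior + (1 - alpha) * cos
    return max(scores, key=scores.get)

# Toy 2-dim "text" vectors, purely for illustration.
vecs = {"Java_(programming_language)": np.array([0.9, 0.1]),
        "Java_(island)": np.array([0.1, 0.9])}
print(link("Java", np.array([0.8, 0.2]), vecs))
```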
Discovering details and scene structure with hierarchical iconoid shift Weyand T.
Leibe B.
Hierarchical clustering
Image clustering
Medoid shift
Scale space
Semantic labelling
Proceedings of the IEEE International Conference on Computer Vision English 2013 Current landmark recognition engines are typically aimed at recognizing building-scale landmarks, but miss interesting details like portals, statues or windows. This is because they use a flat clustering that summarizes all photos of a building facade in one cluster. We propose Hierarchical Iconoid Shift, a novel landmark clustering algorithm capable of discovering such details. Instead of just a collection of clusters, the output of HIS is a set of dendrograms describing the detail hierarchy of a landmark. HIS is based on the novel Hierarchical Medoid Shift clustering algorithm that performs a continuous mode search over the complete scale space. HMS is completely parameter-free, has the same complexity as Medoid Shift and is easy to parallelize. We evaluate HIS on 800k images of 34 landmarks and show that it can extract an often surprising amount of detail and structure that can be applied, e.g., to provide a mobile user with more detailed information on a landmark or even to extend the landmark's Wikipedia article. 0 0
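A minimal sketch of the medoid-shift building block that the abstract's HMS generalizes over scale space; the fixed-point iteration and toy data are illustrative assumptions, not the paper's parallelized implementation:

```python
import numpy as np

def medoid_shift(D, sigma=1.0):
    """Basic medoid shift on a distance matrix D: each point shifts to
    the point minimizing the kernel-weighted sum of distances to its
    neighborhood; chains of shifts ending at the same fixed point
    form one cluster (a simplified, single-scale sketch)."""
    w = np.exp(-(D ** 2) / (2 * sigma ** 2))   # kernel weights per point
    # next_[i] = argmin_j sum_k D[j, k] * w[i, k]
    next_ = np.argmin(D @ w.T, axis=0)
    # Follow the shift map to its fixed points (modes).
    modes = np.arange(len(D))
    for _ in range(len(D)):
        modes = next_[modes]
    return modes  # points with equal mode index share a cluster

# Toy usage on two well-separated 2-D blobs.
rng = np.random.default_rng(3)
pts = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(5, 0.3, (10, 2))])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(medoid_shift(D, sigma=1.0))
```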
Discovering missing semantic relations between entities in Wikipedia Xu M.
Zhe Wang
Bie R.
Jing-Woei Li
Zheng C.
Ke W.
Zhou M.
Infobox
Linked data
Wikipedia
Lecture Notes in Computer Science English 2013 Wikipedia's infoboxes contain rich structured information about various entities, which has been exploited by the DBpedia project to generate large-scale Linked Data sets. Among all the infobox attributes, those whose values contain hyperlinks identify semantic relations between entities, which are important for creating RDF links between DBpedia's instances. However, quite a few hyperlinks have not been annotated by editors in infoboxes, which causes many relations between entities to be missing from Wikipedia. In this paper, we propose an approach for automatically discovering the missing entity links in Wikipedia's infoboxes, so that the missing semantic relations between entities can be established. Our approach first identifies entity mentions in the given infoboxes, and then computes several features to estimate the possibility that a given attribute value links to a candidate entity. A learning model is used to obtain the weights of the different features and predict the destination entity for each attribute value. We evaluated our approach on English Wikipedia data; the experimental results show that our approach can effectively find the missing relations between entities, and it significantly outperforms the baseline methods in terms of both precision and recall. 0 0
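The learning step can be sketched with a linear stand-in; the three features and the logistic model below are illustrative assumptions, not the paper's actual feature set or learner:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per (attribute value, candidate entity) pair,
# in the spirit of the abstract: e.g. string similarity of the value
# to the candidate's title, the candidate's link prior, and topical
# relatedness to the infobox's entity. y = 1 iff the candidate is
# the true destination entity (tiny synthetic training set).
X = np.array([[0.9, 0.7, 0.8],
              [0.4, 0.2, 0.3],
              [0.8, 0.6, 0.9],
              [0.1, 0.3, 0.2]])
y = np.array([1, 0, 1, 0])

# A logistic model stands in for the weighted feature combination.
model = LogisticRegression().fit(X, y)
candidates = {"Paris": [0.85, 0.9, 0.7], "Paris_(mythology)": [0.85, 0.1, 0.2]}
scores = {e: model.predict_proba([f])[0, 1] for e, f in candidates.items()}
print(max(scores, key=scores.get))  # predicted destination entity
```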
… further results
