Germany

From WikiPapers

This page compiles all the information regarding Germany.

Events

This is a list of events held in this country.
Name Type Date Website
Wikipedia CPOV Conference 2010 Leipzig conference 24 September 2010 http://www.cpov.de
Wikimania 2005 conference 4 August 2005 http://wikimania2005.wikimedia.org

Authors

This is a list of authors in this country.
Name Affiliation Website
Andre Köhler
Anja Haake
Anna Samoilenko GESIS – Leibniz Institute for Social Sciences http://annsamoilenko.wix.com/homepage
Christian Pentzold
Denny Vrandečić
Elena Simperl
Fabian Flöck GESIS – Leibniz Institute for Social Sciences
Frank Fuchs-Kittowski Fraunhofer ISST
Johanna Niesyto http://transnationalspaces.wordpress.com/
Maik Anderka University of Paderborn http://maik.anderka.com
Martin Potthast http://www.uni-weimar.de/cms/medien/webis/people/martin-potthast.html
Oliver Ferschke http://www.ukp.tu-darmstadt.de/people/doctoral-researchers/oliver-ferschke
Stephan Lukosch
Sven Heimbuch University of Duisburg-Essen https://www.uni-due.de/psychmeth/heimbuch.php
Thomas Tunsch National Museums in Berlin, Prussian Cultural Heritage Foundation http://about.me/thtbln
Till Schümmer

Publications

This is a list of publications by authors from this country.
Title Author(s) Keyword(s) Published in Language Date Abstract R C
Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity Anna Samoilenko
Fariba Karimi
Daniel Edler
Jérôme Kunegis
Markus Strohmaier
Wikipedia multilingual cultural similarity network digital language divide socio-linguistics digital humanities hypothesis testing EPJ Data Science English 11 March 2016 In this paper, we study the network of global interconnections between language communities, based on shared co-editing interests of Wikipedia editors, and show that although English is discussed as a potential lingua franca of the digital space, its domination disappears in the network of co-editing similarities, and instead local connections come to the forefront. Out of the hypotheses we explored, bilingualism, linguistic similarity of languages, and shared religion provide the best explanations for the similarity of interests between cultural communities. Population attraction and geographical proximity are also significant, but much weaker factors bringing communities together. In addition, we present an approach that allows for extracting significant cultural borders from editing activity of Wikipedia users, and comparing a set of hypotheses about the social mechanisms generating these borders. Our study sheds light on how culture is reflected in the collective process of archiving knowledge on Wikipedia, and demonstrates that cross-lingual interconnections on Wikipedia are not dominated by one powerful language. Our findings also raise some important policy questions for the Wikimedia Foundation. 0 0
WikiWho: Precise and Efficient Attribution of Authorship of Revisioned Content Fabian Flöck
Maribel Acosta
Wikipedia
Version control
Content modeling
Community-driven content creation
Collaborative authoring
Online collaboration
Authorship
World Wide Web Conference 2014 English 2014 Revisioned text content is present in numerous collaboration platforms on the Web, most notably Wikis. To track authorship of text tokens in such systems has many potential applications; the identification of main authors for licensing reasons or tracing collaborative writing patterns over time, to name some. In this context, two main challenges arise. First, it is critical for such an authorship tracking system to be precise in its attributions, to be reliable for further processing. Second, it has to run efficiently even on very large datasets, such as Wikipedia. As a solution, we propose a graph-based model to represent revisioned content and an algorithm over this model that tackles both issues effectively. We describe the optimal implementation and design choices when tuning it to a Wiki environment. We further present a gold standard of 240 tokens from English Wikipedia articles annotated with their origin. This gold standard was created manually and confirmed by multiple independent users of a crowdsourcing platform. It is the first gold standard of this kind and quality and our solution achieves an average of 95% precision on this data set. We also perform a first-ever precision evaluation of the state-of-the-art algorithm for the task, exceeding it by over 10% on average. Our approach outperforms the execution time of the state-of-the-art by one order of magnitude, as we demonstrate on a sample of over 240 English Wikipedia articles. We argue that the increased size of an optional materialization of our results by about 10% compared to the baseline is a favorable trade-off, given the large advantage in runtime performance. 0 0
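As an illustration for readers of this entry, the general idea of attributing each token of the latest revision to the revision that introduced it can be sketched in a few lines. This is a minimal sketch, not the authors' graph-based WikiWho algorithm: the whitespace tokenizer, the difflib-based matching and the example history are all illustrative assumptions.

# Illustrative sketch of token-level authorship attribution across revisions.
# NOT the WikiWho algorithm; it only shows the general idea of attributing
# each surviving token to the earliest revision that introduced it.
import difflib

def tokenize(text):
    return text.split()

def attribute_tokens(revisions):
    """revisions: list of (author, text) pairs in chronological order.
    Returns (token, author) pairs for the latest revision."""
    attributed = []  # (token, author) pairs for the current state of the article
    for author, text in revisions:
        new_tokens = tokenize(text)
        old_tokens = [tok for tok, _ in attributed]
        matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
        updated = []
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                updated.extend(attributed[i1:i2])            # keep earlier attribution
            elif op in ("replace", "insert"):
                updated.extend((tok, author) for tok in new_tokens[j1:j2])  # new material
            # 'delete': removed tokens simply drop out
        attributed = updated
    return attributed

history = [
    ("alice", "Berlin is the capital of Germany"),
    ("bob", "Berlin is the capital and largest city of Germany"),
]
for token, author in attribute_tokens(history):
    print(f"{token:10s} -> {author}")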
Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia Maik Anderka Information quality
Wikipedia
Quality Flaws
Quality Flaw Prediction
Bauhaus-Universität Weimar, Germany English 2013 Web applications that are based on user-generated content are often criticized for containing low-quality information; a popular example is the online encyclopedia Wikipedia. The major points of criticism pertain to the accuracy, neutrality, and reliability of information. The identification of low-quality information is an important task since for a huge number of people around the world it has become a habit to first visit Wikipedia in case of an information need. Existing research on quality assessment in Wikipedia either investigates only small samples of articles, or else deals with the classification of content into high-quality or low-quality. This thesis goes further, it targets the investigation of quality flaws, thus providing specific indications of the respects in which low-quality content needs improvement. The original contributions of this thesis, which relate to the fields of user-generated content analysis, data mining, and machine learning, can be summarized as follows:

(1) We propose the investigation of quality flaws in Wikipedia based on user-defined cleanup tags. Cleanup tags are commonly used in the Wikipedia community to tag content that has some shortcomings. Our approach is based on the hypothesis that each cleanup tag defines a particular quality flaw.

(2) We provide the first comprehensive breakdown of Wikipedia's quality flaw structure. We present a flaw organization schema, and we conduct an extensive exploratory data analysis which reveals (a) the flaws that actually exist, (b) the distribution of flaws in Wikipedia, and, (c) the extent of flawed content.

(3) We present the first breakdown of Wikipedia's quality flaw evolution. We consider the entire history of the English Wikipedia from 2001 to 2012, which comprises more than 508 million page revisions, summing up to 7.9 TB. Our analysis reveals (a) how the incidence and the extent of flaws have evolved, and, (b) how the handling and the perception of flaws have changed over time.

(4) We are the first who operationalize an algorithmic prediction of quality flaws in Wikipedia. We cast quality flaw prediction as a one-class classification problem, develop a tailored quality flaw model, and employ a dedicated one-class machine learning approach. A comprehensive evaluation based on human-labeled Wikipedia articles underlines the practical applicability of our approach.
0 0
Identifying, understanding and detecting recurring, harmful behavior patterns in collaborative wikipedia editing - Doctoral proposal Flock F.
Elena Simperl
Rettinger A.
Collaboration systems
Collective intelligence
Editing behavior
Social dynamics
User modeling
Web science
Wikipedia
WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web English 2013 In this doctoral proposal, we describe an approach to identify recurring, collective behavioral mechanisms in the collaborative interactions of Wikipedia editors that have the potential to undermine the ideals of quality, neutrality and completeness of article content. We outline how we plan to parametrize these patterns in order to understand their emergence and evolution and measure their effective impact on content production in Wikipedia. On top of these results we intend to build end-user tools to increase the transparency of the evolution of articles and equip editors with more elaborated quality monitors. We also sketch out our evaluation plans and report on already accomplished tasks. 0 0
Informationswissenschaftliche Herausforderungen für kulturelle Gedächtnisorganisationen Thomas Tunsch Museum
Museumsdokumentation
Web 2.0
Semantic web
Informationswissenschaft
Data model
Datenmodell
Modellierung
CIDOC CRM
Standards
Cultural heritage
Kulturerbe
Collaborative community
Vernetzte Arbeitsgemeinschaft
EVA 2012 Berlin German 7 November 2012 At the beginning of digitization, museums tried to achieve comparable data by means of strict rules. Orienting those rules to the structures and concepts of individual disciplines, however, could limit interdisciplinary collaboration.

New communication structures in the World Wide Web, the Web 2.0, and progress in the standardization of cultural heritage information form the foundation for developing semantic data models and for interdisciplinary communication. This requires intensive, information-science-based cooperation between experts from different disciplines. It is also essential to agree on common terms, which can often be achieved only through continuous communication that reflects the specific knowledge of the different fields of expertise.

Semantic data models are becoming increasingly important for museums, especially for collecting and using extrinsic information about museum objects, because only a reliable representation of the existing specialized information, together with its general accessibility, enables scholarly debate at a high level.
0 0
Reverts Revisited: Accurate Revert Detection in Wikipedia Fabian Flöck
Denny Vrandečić
Elena Simperl
Wikipedia
Revert detection
Editing behavior
User modeling
Collaboration systems
Community-driven content creation
Social dynamics
Hypertext and Social Media 2012 English June 2012 Wikipedia is commonly used as a proving ground for research in collaborative systems. This is likely due to its popularity and scale, but also to the fact that large amounts of data about its formation and evolution are freely available to inform and validate theories and models of online collaboration. As part of the development of such approaches, revert detection is often performed as an important pre-processing step in tasks as diverse as the extraction of implicit networks of editors, the analysis of edit or editor features and the removal of noise when analyzing the emergence of the content of an article. The current state of the art in revert detection is based on a rather naïve approach, which identifies revision duplicates based on MD5 hash values. This is an efficient, but not very precise technique that forms the basis for the majority of research based on revert relations in Wikipedia. In this paper we prove that this method has a number of important drawbacks - it only detects a limited number of reverts, while simultaneously misclassifying too many edits as reverts, and not distinguishing between complete and partial reverts. This is very likely to hamper the accurate interpretation of the findings of revert-related research. We introduce an improved algorithm for the detection of reverts based on word tokens added or deleted to address these drawbacks. We report on the results of a user study and other tests demonstrating the considerable gains in accuracy and coverage by our method, and argue for a positive trade-off, in certain research scenarios, between these improvements and our algorithm’s increased runtime. 13 0
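To make the baseline criticized in this abstract concrete, the following is a minimal sketch of hash-based identity revert detection (the "rather naïve approach" described above); it is not the authors' improved token-based algorithm, and the input format is an illustrative assumption.

# Sketch of MD5-based revert detection: an edit counts as an identity revert
# when it restores the exact text of an earlier revision of the article.
import hashlib

def find_identity_reverts(revision_texts):
    """revision_texts: revision wikitexts in chronological order.
    Returns (reverting_index, restored_index) pairs."""
    seen = {}
    reverts = []
    for i, text in enumerate(revision_texts):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in seen:
            reverts.append((i, seen[digest]))
        else:
            seen[digest] = i
    return reverts

history = ["intro text", "intro text with spam", "intro text"]
print(find_identity_reverts(history))  # [(2, 0)]: revision 2 restores revision 0

Partial reverts, one of the cases the paper shows this baseline misses, leave no identical revision text and therefore never appear in the output.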
A Breakdown of Quality Flaws in Wikipedia Maik Anderka
Benno Stein
Quality Flaws
Information quality
Wikipedia
User-generated Content Analysis
2nd Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 12) English 2012 The online encyclopedia Wikipedia is a successful example of the increasing popularity of user generated content on the Web. Despite its success, Wikipedia is often criticized for containing low-quality information, which is mainly attributed to its core policy of being open for editing by everyone. The identification of low-quality information is an important task since Wikipedia has become the primary source of knowledge for a huge number of people around the world. Previous research on quality assessment in Wikipedia either investigates only small samples of articles, or else focuses on single quality aspects, like accuracy or formality. This paper targets the investigation of quality flaws, and presents the first complete breakdown of Wikipedia's quality flaw structure. We conduct an extensive exploratory analysis, which reveals (1) the quality flaws that actually exist, (2) the distribution of flaws in Wikipedia, and (3) the extent of flawed content. An important finding is that more than one in four English Wikipedia articles contains at least one quality flaw, 70% of which concern article verifiability. 0 0
Behind the Article: Recognizing Dialog Acts in Wikipedia Talk Pages Oliver Ferschke
Iryna Gurevych
Yevgen Chebotar
Wikipedia
Talk Pages
Discourse Analysis
Work Coordination
Information quality
Collaboration
Proceedings of the 13th Conference of the European Chapter of the ACL (EACL 2012) 2012 In this paper, we propose an annotation schema for the discourse analysis of Wikipedia Talk pages aimed at the coordination efforts for article improvement. We apply the annotation schema to a corpus of 100 Talk pages from the Simple English Wikipedia and make the resulting dataset freely available for download. Furthermore, we perform automatic dialog act classification on Wikipedia discussions and achieve an average F1-score of 0.82 with our classification pipeline. 0 0
FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia Oliver Ferschke
Iryna Gurevych
Marc Rittberger
PAN English 2012 With over 23 million articles in 285 languages, Wikipedia is the largest free knowledge base on the web. Due to its open nature, everybody is allowed to access and edit the contents of this huge encyclopedia. As a downside of this open access policy, quality assessment of the content becomes a critical issue and is hardly manageable without computational assistance. In this paper, we present FlawFinder, a modular system for automatically predicting quality flaws in unseen Wikipedia articles. It competed in the inaugural edition of the Quality Flaw Prediction Task at the PAN Challenge 2012 and achieved the best precision of all systems and the second place in terms of recall and F1-score. 0 1
On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia Maik Anderka
Benno Stein
Matthias Busse
Wikipedia
Cleanup Tags
Quality Flaws
Information quality
Quality Flaw Evolution
Wikipedia Academy English 2012 The improvement of information quality is a major task for the free online encyclopedia Wikipedia. Recent studies targeted the analysis and detection of specific quality flaws in Wikipedia articles. To date, quality flaws have been exclusively investigated in current Wikipedia articles, based on a snapshot representing the state of Wikipedia at a certain time. This paper goes further, and provides the first comprehensive breakdown of the evolution of quality flaws in Wikipedia. We utilize cleanup tags to analyze the quality flaws that have been tagged by the Wikipedia community in the English Wikipedia, from its launch in 2001 until 2011. This leads to interesting findings regarding (1) the development of Wikipedia's quality flaw structure and (2) the usage and the effectiveness of cleanup tags. Specifically, we show that inline tags are more effective than tag boxes, and provide statistics about the considerable volume of rare and non-specific cleanup tags. We expect that this work will support the Wikipedia community in making quality assurance activities more efficient. 0 0
Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia Maik Anderka
Benno Stein
Information quality
Wikipedia
Quality Flaw Prediction
CLEF English 2012 The paper overviews the task "Quality Flaw Prediction in Wikipedia" of the PAN'12 competition. An evaluation corpus is introduced which comprises 1,592,226 English Wikipedia articles, of which 208,228 have been tagged to contain one of ten important quality flaws. Moreover, the performance of three quality flaw classifiers is evaluated. 0 0
Predicting Quality Flaws in User-generated Content: The Case of Wikipedia Maik Anderka
Benno Stein
Nedim Lipka
User-generated Content Analysis
Information quality
Wikipedia
Quality Flaw Prediction
One-class Classification
35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012) English 2012 The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since in the Wikipedia setting the acquisition of significant test data is intricate, we analyze the effects of a biased sample selection. In this regard we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided test data with little noise, four flaws can be detected with a precision close to 1. 0 0
Predicting quality flaws in user-generated content: The case of wikipedia Maik Anderka
Benno Stein
Nedim Lipka
Information quality
One-class classification
Quality flaw prediction
User-generated content analysis
Wikipedia
SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval English 2012 The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since in the Wikipedia setting the acquisition of significant test data is intricate, we analyze the effects of a biased sample selection. In this regard we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided test data with little noise, four flaws can be detected with a precision close to 1. 0 0
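A minimal sketch of the one-class formulation described in the two entries above, assuming scikit-learn's OneClassSVM and a toy two-feature article representation; the authors' actual flaw model, features and evaluation are not reproduced here.

# One-class classification sketch: train only on articles known to carry a
# given flaw, then decide whether unseen articles resemble that class.
# Feature extraction is a toy stand-in (token count and wiki-link count).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

def toy_features(article_text):
    return [len(article_text.split()), article_text.count("[[")]

flawed_articles = [
    "Short unsourced stub.",
    "Another very short article with no [[links]] and no references.",
    "Tiny stub text.",
]
unseen_articles = [
    "Short stub again.",
    "A long, well developed article " + "with many [[links]] and sections " * 50,
]

X_train = np.array([toy_features(a) for a in flawed_articles])
clf = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", gamma="scale", nu=0.5))
clf.fit(X_train)

X_test = np.array([toy_features(a) for a in unseen_articles])
print(clf.predict(X_test))  # +1 = resembles the flawed class, -1 = outlier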
Reverts revisited - Accurate revert detection in wikipedia Flock F.
Vrandecic D.
Elena Simperl
Collaboration systems
Community-driven content creation
Editing behavior
Revert detection
Social dynamics
User modeling
Wikipedia
HT'12 - Proceedings of 23rd ACM Conference on Hypertext and Social Media English 2012 Wikipedia is commonly used as a proving ground for research in collaborative systems. This is likely due to its popularity and scale, but also to the fact that large amounts of data about its formation and evolution are freely available to inform and validate theories and models of online collaboration. As part of the development of such approaches, revert detection is often performed as an important pre-processing step in tasks as diverse as the extraction of implicit networks of editors, the analysis of edit or editor features and the removal of noise when analyzing the emergence of the content of an article. The current state of the art in revert detection is based on a rather naïve approach, which identifies revision duplicates based on MD5 hash values. This is an efficient, but not very precise technique that forms the basis for the majority of research based on revert relations in Wikipedia. In this paper we prove that this method has a number of important drawbacks - it only detects a limited number of reverts, while simultaneously misclassifying too many edits as reverts, and not distinguishing between complete and partial reverts. This is very likely to hamper the accurate interpretation of the findings of revert-related research. We introduce an improved algorithm for the detection of reverts based on word tokens added or deleted to address these drawbacks. We report on the results of a user study and other tests demonstrating the considerable gains in accuracy and coverage by our method, and argue for a positive trade-off, in certain research scenarios, between these improvements and our algorithm's increased runtime. 0 0
Wikidata: a new platform for collaborative data collection Denny Vrandečić Semantic web
Wikipedia
Linked data
DBpedia
International conference companion on World Wide Web English 2012 This year, Wikimedia starts to build a new platform for the collaborative acquisition and maintenance of structured data: Wikidata. Wikidata's prime purpose is to be used within the other Wikimedia projects, like Wikipedia, to provide well-maintained, high-quality data. The nature and requirements of the Wikimedia projects require the development of a few novel, or at least unusual, features for Wikidata: Wikidata will be a secondary database, i.e. instead of containing facts it will contain references for facts. It will be fully internationalized. It will contain inconsistent and contradictory facts, in order to represent the diversity of knowledge about a given entity. 0 0
Kommunikation für Experten: Kulturelle Gedächtnisorganisationen und vernetzte Arbeitsgemeinschaften Thomas Tunsch Web 2.0
Kulturerbe
Archiv
Bibliothek
Museum
WikiMedia
Wikipedia
MuseumsWiki
Collaborative
Community
EVA 2011 Berlin German 9 November 2011 As part of contemporary culture, collaborative communities are gaining more and more importance for cultural memory organizations as well. This makes it evident that these organizations do not merely store objects or guarantee their conservation, but also shape cultural history and its perception. At the same time, collaborative communities use cultural memory organizations as sources and for reference.

Cultural memory organizations are significantly shaped in their structure and effectiveness by experts from various disciplines. Collaborative communities are therefore becoming more important for these experts and their communication networks.

Collaborative communities partly employ new ways and methods of organizing knowledge, which are often little known in cultural memory organizations and are therefore rejected or dismissed as transitory trends. However, both cultural memory organizations and collaborative communities rely on the acceptance of society and need their results to be trusted by its members.
0 0
Towards a diversity-minded Wikipedia Fabian Flöck
Denny Vrandečić
Elena Simperl
Wikipedia
Diversity
Community-driven content creation
Social dynamics
Opinion mining
Sentiment analysis
WebSci Conference English June 2011 Wikipedia is a top-ten Web site providing a free encyclopedia created by an open community of volunteer contributors. As investigated in various studies over the past years, contributors have different backgrounds, mindsets and biases; however, the effects - positive and negative - of this diversity on the quality of the Wikipedia content, and on the sustainability of the overall project are yet only partially understood. In this paper we discuss these effects through an analysis of existing scholarly literature in the area and identify directions for future research and development; we also present an approach for diversity-minded content management within Wikipedia that combines techniques from semantic technologies, data and text mining and quantitative social dynamics analysis to create greater awareness of diversity-related issues within the Wikipedia community, give readers access to indicators and metrics to understand biases and their impact on the quality of Wikipedia articles, and support editors in achieving balanced versions of these articles that leverage the wealth of knowledge and perspectives inherent to large-scale collaboration. 24 1
Critical Point of View: A Wikipedia Reader Amila Akdag Salah
Nicholas Carr
Shun-ling Chen
Florian Cramer
Morgan Currie
Edgar Enyedy
Andrew Famiglietti
Heather Ford
Mayo Fuster Morell
Cheng Gao
R. Stuart Geiger
Mark Graham
Gautam John
Dror Kamir
Peter B. Kaufman
Scott Kildall
Lawrence Liang
Patrick Lichty
Geert Lovink
Hans Varghese Mathews
Johanna Niesyto
Mathieu O’Neil
Dan O’Sullivan
Joseph M. Reagle
Andrea Scharnhorst
Alan Shapiro
Christian Stegbauer
Nathaniel Stern
Krzysztof Suchecki
Nathaniel Tkacz
Maja van der Velden
Institute of Network Cultures English 2011 For millions of internet users around the globe, the search for new knowledge begins with Wikipedia. The encyclopedia’s rapid rise, novel organization, and freely offered content have been marveled at and denounced by a host of commentators. Critical Point of View moves beyond unflagging praise, well-worn facts, and questions about its reliability and accuracy, to unveil the complex, messy, and controversial realities of a distributed knowledge platform. 0 4
Detection of Text Quality Flaws as a One-class Classification Problem Maik Anderka
Benno Stein
Nedim Lipka
Information quality
Wikipedia
Quality Flaw Prediction
One-class Classification
20th ACM Conference on Information and Knowledge Management (CIKM 11) English 2011 For Web applications that are based on user generated content the detection of text quality flaws is a key concern. Our research contributes to automatic quality flaw detection. In particular, we propose to cast the detection of text quality flaws as a one-class classification problem: we are given only positive examples (= texts containing a particular quality flaw) and decide whether or not an unseen text suffers from this flaw. We argue that common binary or multiclass classification approaches are ineffective here, and we underpin our approach by a real-world application: we employ a dedicated one-class learning approach to determine whether a given Wikipedia article suffers from certain quality flaws. Since in the Wikipedia setting the acquisition of sensible test data is quite intricate, we analyze the effects of a biased sample selection. In addition, we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. Altogether, provided test data with little noise, four of ten important quality flaws in Wikipedia can be detected with a precision close to 1. 0 0
Enterprise Wikis: Technical challenges and opportunities Kampgen B.
Basil Ell
Elena Simperl
Vrandecic D.
Frank Dengler
Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI) English 2011 Social software has proven valuable in enterprises for collaborative knowledge management. In order to introduce a wiki in the enterprise, we propose a solution that combines Web 2.0 and Semantic Web technologies. We describe how this solution resolves the technical challenges and, beyond that, opens up new opportunities, and also how it can be realized in a concrete enterprise scenario. 0 0
Imagining the Wikipedia community: What do Wikipedia authors mean when they write about their 'community'? Christian Pentzold Computer-mediated communication
Grounded Theory
Online community
Wikipedia
New Media and Society English 2011 This article examines the way Wikipedia authors write their 'community' into being. Mobilizing concepts regarding the communicative constitution of communities, the computer-mediated conversations between editors were investigated using Grounded Theory procedures. The analysis yielded an empirically grounded theory of the users' self-understanding of the Wikipedia community as an ethos-action community. Hence, this study contributes to research on online community-building as it shifts the focus from structural criteria for communities to the discursive level of community formation. 0 0
Imagining the Wikipedia community: what do Wikipedia authors mean when they write about their 'community'? Christian Pentzold New media & society XX(X) 1–18 2011 This article examines the way Wikipedia authors write their 'community' into being. Mobilizing concepts regarding the communicative constitution of communities, the computer-mediated conversations between editors were investigated using Grounded Theory procedures. The analysis yielded an empirically grounded theory of the users' self-understanding of the Wikipedia community as an ethos-action community. Hence, this study contributes to research on online community-building as it shifts the focus from structural criteria for communities to the discursive level of community formation. 0 0
Query segmentation revisited Hagen M.
Martin Potthast
Benno Stein
Brautigam C.
Corpus
Query segmentation
Web N-grams
Proceedings of the 20th International Conference on World Wide Web, WWW 2011 English 2011 We address the problem of query segmentation: given a keyword query, the task is to group the keywords into phrases, if possible. Previous approaches to the problem achieve reasonable segmentation performance but are tested only against a small corpus of manually segmented queries. In addition, many of the previous approaches are fairly intricate as they use expensive features and are difficult to reimplement. The main contribution of this paper is a new method for query segmentation that is easy to implement, fast, and that comes with a segmentation accuracy comparable to current state-of-the-art techniques. Our method uses only raw web n-gram frequencies and Wikipedia titles that are stored in a hash table. At the same time, we introduce a new evaluation corpus for query segmentation. With about 50 000 human-annotated queries, it is two orders of magnitude larger than the corpus used up to now. 0 0
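The core idea, scoring candidate segmentations with phrase frequencies looked up in a hash table of web n-grams and Wikipedia titles, can be sketched as follows; the toy frequency table and the scoring heuristic are illustrative assumptions, not the authors' exact scoring function.

# Frequency-based query segmentation sketch: enumerate every split of the
# query into phrases and keep the split whose phrases are best supported by
# an n-gram frequency table (a toy stand-in for web n-grams / Wikipedia titles).
from itertools import combinations

NGRAM_FREQ = {
    "new york": 50_000_000,
    "new york times": 20_000_000,
    "times square": 8_000_000,
    "square dance": 300_000,
}

def segmentations(words):
    """Yield every segmentation of the word list as a list of phrases."""
    n = len(words)
    for k in range(n):
        for breaks in combinations(range(1, n), k):
            cuts = (0, *breaks, n)
            yield [" ".join(words[i:j]) for i, j in zip(cuts, cuts[1:])]

def score(segmentation):
    # Reward multi-word phrases that actually occur; single words add nothing.
    return sum(len(p.split()) ** 2 * NGRAM_FREQ.get(p, 0)
               for p in segmentation if " " in p)

query = "new york times square dance".split()
print(max(segmentations(query), key=score))  # highest-scoring split under the toy counts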
Towards automatic quality assurance in Wikipedia Maik Anderka
Benno Stein
Nedim Lipka
Wikipedia
Information quality
Flaw Detection
20th International Conference on World Wide Web (WWW 11) English 2011 Featured articles in Wikipedia stand for high information quality, and it has been found interesting to researchers to analyze whether and how they can be distinguished from "ordinary" articles. Here we point out that article discrimination falls far short of writer support or automatic quality assurance: Featured articles are not identified, but are made. Following this motto we compile a comprehensive list of information quality flaws in Wikipedia, model them according to the latest state of the art, and devise one-class classification technology for their identification. 0 0
Wiki-Based Maturing of Process Descriptions Business Process Management Frank Dengler
Denny Vrandečić
English 2011 Traditional process elicitation methods are expensive and time consuming. Recently, a trend toward collaborative, user-centric, on-line business process modeling can be observed. Current social software approaches, satisfying such a collaborative modeling, mostly focus on the graphical development of processes and do not consider existing textual process description like HowTos or guidelines. We address this issue by combining graphical process modeling techniques with a wiki-based light-weight knowledge capturing approach and a background semantic knowledge base. Our approach enables the collaborative maturing of process descriptions with a graphical representation, formal semantic annotations, and natural language. Existing textual process descriptions can be translated into graphical descriptions and formal semantic annotations. Thus, the textual and graphical process descriptions are made explicit and can be further processed. As a result, we provide a holistic approach for collaborative process development that is designed to foster knowledge reuse and maturing within the system. 0 0
Wikiing pro: semantic wiki-based process editor Frank Dengler
Denny Vrandečić
Elena Simperl
English 2011 Recently, a trend toward collaborative, user-centric, on-line process modeling can be observed. Unfortunately, current social software approaches mostly focus on the graphical development of processes and do not consider existing textual process description like HowTos or guidelines. We address this issue by combining graphical process modeling techniques with a wiki-based light-weight knowledge capturing approach and a background semantic knowledge base. Our approach enables the collaborative maturing of process descriptions with a graphical representation, formal semantic annotations, and natural language. By translating existing textual process descriptions into graphical descriptions and formal semantic annotations, we provide a holistic approach for collaborative process development that is designed to foster knowledge reuse and maturing within the system. 0 0
Wikipedia revision toolkit: Efficiently accessing Wikipedia's edit history Oliver Ferschke
Torsten Zesch
Iryna Gurevych
ACL HLT 2011 - 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of Student Session English 2011 We present an open-source toolkit which allows (i) to reconstruct past states of Wikipedia, and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of provided data. By using a dedicated storage format, our toolkit massively decreases the data volume to less than 2% of the original size, and at the same time provides an easy-to-use interface to access the revision data. The language-independent design allows to process any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia's edit history. 0 0
Wikipedia revision toolkit: efficiently accessing Wikipedia's edit history Oliver Ferschke
Torsten Zesch
Iryna Gurevych
HLT English 2011 0 0
Cross-language plagiarism detection Martin Potthast
Alberto Barrón-Cedeño
Benno Stein
Paolo Rosso
Language Resources and Evaluation 2010 0 0
Crowdsourcing a Wikipedia Vandalism Corpus Martin Potthast Wikipedia
Vandalism detection
Evaluation
Corpus
SIGIR English 2010 We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon’s Mechanical Turk. The corpus compiles 32 452 edits on 28 468 Wikipedia articles, among which 2 391 vandalism edits have been identified. 753 human annotators cast a total of 193 022 votes on the edits, so that each edit was reviewed by at least 3 annotators, whereas the achieved level of agreement was analyzed in order to label an edit as “regular” or “vandalism.” The corpus is available free of charge. 6 1
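As a small illustration of the vote-aggregation step described above (each edit receives several annotator votes and is labelled by majority), here is a sketch; the vote data and the tie handling are illustrative assumptions, not the actual corpus-construction procedure.

# Majority-vote aggregation sketch: label each edit from its annotator votes,
# leaving ties undecided (they would need additional annotators).
from collections import Counter

votes = {
    "edit-1": ["vandalism", "vandalism", "regular"],
    "edit-2": ["regular", "regular", "regular", "vandalism"],
    "edit-3": ["regular", "vandalism"],
}

def majority_label(edit_votes):
    (label, top), *rest = Counter(edit_votes).most_common()
    if rest and rest[0][1] == top:
        return "undecided"
    return label

for edit_id, edit_votes in votes.items():
    print(edit_id, majority_label(edit_votes))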
Evaluating cross-language explicit semantic analysis and cross querying Maik Anderka
Nedim Lipka
Benno Stein
Lecture Notes in Computer Science English 2010 This paper describes our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track. The task is to retrieve items from various multilingual collections of library catalog records, which are relevant to a user's query. Two different strategies are employed: (i) the Cross-Language Explicit Semantic Analysis, CL-ESA, where the library catalog records and the queries are represented in a multilingual concept space that is spanned by aligned Wikipedia articles, and, (ii) a Cross Querying approach, where a query is translated into all target languages using Google Translate and where the obtained rankings are combined. The evaluation shows that both strategies outperform the monolingual baseline and achieve comparable results. Furthermore, inspired by the Generalized Vector Space Model we present a formal definition and an alternative interpretation of the CL-ESA model. This interpretation is interesting for real-world retrieval applications since it reveals how the computational effort for CL-ESA can be shifted from the query phase to a preprocessing phase. 0 0
Overcoming information overload in the enterprise: The active approach Elena Simperl
Thurlow I.
Paul Warren
Frank Dengler
Davies J.
Marko Grobelnik
Mladenic D.
Gomez-Perez J.M.
Moreno C.R.
Context mining
Information overload
Internet
Knowledge management
Knowledge process
Knowledge worker
Productivity
Semantic wiki
IEEE Internet Computing English 2010 Knowledge workers are central to an organization's success, yet their information management tools often hamper their productivity. This has major implications for businesses across the globe because their commercial advantage relies on the optimal exploitation of their own enterprise information, the huge volumes of online information, and the productivity of the required knowledge work. The Active project addresses this challenge through an integrated knowledge management workspace that reduces information overload by significantly improving the mechanisms for creating, managing, and using information. The project's approach follows three themes: sharing information through tagging, wikis, and ontologies; prioritizing information delivery by understanding users' current-task context; and leveraging informal processes that are learned from user behavior. 0 0
Fixing the floating gap: The online encyclopaedia Wikipedia as a global memory place Christian Pentzold Collective memory
Consensus and contestation
Discourse
World Wide Web
Memory Studies English May 2009 The article proposes to interpret the web-based encyclopaedia Wikipedia as a global memory place. After presenting the core elements and basic characteristics of wikis and Wikipedia respectively, the article discusses four related issues of social memory studies: collective memory, communicative and cultural memory, `memory places' and the `floating gap'. In a third step, these theoretical premises are connected to the understanding of discourse as social cognition. Fourth, comparison is made between the potential of the World Wide Web as cyberspace for collective remembrance and the obstacles that stand in its way. On this basis, the article argues that Wikipedia presents a global memory place where memorable elements are negotiated. Its complex processes of discussion and article creation are a model of the discursive fabrication of memory. Thus, they can be viewed and analysed as the transition, the `floating gap' between communicative and collective frames of memory. 6 3
The ESA Retrieval Model Revisited Maik Anderka
Benno Stein
32nd International ACM SIGIR Conference (SIGIR 09) English 2009 Among the retrieval models that have been proposed in the last years, the ESA model of Gabrilovich and Markovitch received much attention. The authors report on a significant improvement in the retrieval performance, which is explained with the semantic concepts introduced by the document collection underlying ESA. Their explanation appears plausible but our analysis shows that the connections are more involved and that the "concept hypothesis" does not hold. In our contribution we analyze several properties that in fact affect the retrieval performance. Moreover, we introduce a formalization of ESA, which reveals its close connection to existing retrieval models. 0 0
Die Schöne und das Tier: Semantic Web und Wikis Thomas Tunsch EVA 2008 Berlin German 12 November 2008 Although to a large extent the Semantic Web specifies fundamentals and future potentials of the WWW, it is associated with current projects as well. With standards like the CIDOC Conceptual Reference Model for the domain of cultural heritage, main principles for the semantic network are available already today.

In contrast, Wikis seem to exemplify the danger of subjectivity and absence of verification for many experts in museums, especially due to general participation beyond traditional areas of expertise.

Semantic MediaWiki is an effective tool for converting seeming contradictions into a prolific challenge for forward-looking international collaboration.
0 0
Museen und Wikis: Vorteile vernetzter Arbeitsgemeinschaften Thomas Tunsch MAI-Tagung ("museums and the internet") German 26 May 2008 Although Wikipedia now attracts considerable media attention and, as a Web 2.0 phenomenon, is increasingly the subject of scholarly study, collaborative communities are still largely unknown in the museum world. While wikis have already become part of internal and external communication at some universities and libraries, the idea of "quick knowledge" seems to have found little resonance in museums so far. Several possibilities for using wikis in museums present themselves; their advantages and limits are described. 0 0
A Wikipedia-Based Multilingual Retrieval Model Martin Potthast
Benno Stein
Maik Anderka
30th European Conference on IR Research (ECIR 08) English 2008 This paper introduces CL-ESA, a new multilingual retrieval model for the analysis of cross-language similarity. The retrieval model exploits the multilingual alignment of Wikipedia: given a document d written in language L, we construct a concept vector for d, where each dimension i quantifies the similarity of d with respect to a document chosen from the "L-subset" of Wikipedia. Likewise, for a second document d′ written in language L′, we construct a concept vector using the topic-aligned counterparts of our previously chosen documents from the L′-subset of Wikipedia. Since the two concept vectors are collection-relative representations of d and d′ they are language-independent. I.e., their similarity can directly be computed with the cosine similarity measure, for instance. We present results of an extensive analysis that demonstrates the power of this new retrieval model: for a query document d the topically most similar documents from a corpus in another language are properly ranked. A salient property of the new retrieval model is its robustness with respect to both the size and the quality of the index document collection. 0 0
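A minimal sketch of the CL-ESA idea under simplifying assumptions: a tiny set of topic-aligned index articles stands in for Wikipedia, and TF-IDF cosine similarity stands in for the paper's similarity computation; dimension i of each concept vector refers to the same topic in both languages, so the vectors can be compared directly.

# CL-ESA sketch: represent a document by its similarity to a fixed set of
# index Wikipedia articles; topic-aligned index articles per language make
# the resulting concept vectors comparable across languages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

index_en = ["Berlin is the capital of Germany.",
            "Football is a team sport played with a ball.",
            "A computer network connects machines."]
index_de = ["Berlin ist die Hauptstadt von Deutschland.",
            "Fußball ist ein Mannschaftssport mit einem Ball.",
            "Ein Rechnernetz verbindet Computer."]

def concept_vector(doc, index_docs):
    vec = TfidfVectorizer().fit(index_docs + [doc])
    return cosine_similarity(vec.transform([doc]), vec.transform(index_docs))

doc_en = "The German capital Berlin hosts the federal government."
doc_de = "Die deutsche Hauptstadt Berlin ist Sitz der Bundesregierung."

v_en = concept_vector(doc_en, index_en)
v_de = concept_vector(doc_de, index_de)
print(cosine_similarity(v_en, v_de))  # cross-language similarity of the two documents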
Museen und Wikipedia Thomas Tunsch Wikipedia
Dokumentation
Zusammenarbeit
Gemeinschaft
Wissenschaft
Verknüpfungen
EVA 2007 Berlin German 7 November 2007 In order to define the possible advantages of utilization and cooperation, both the museum world and the Wikipedia world can be considered communities dedicated to the expansion of knowledge. Museums collect objects, provide documentation and produce knowledge about those objects and the fields of science or other scholarship they represent. Wikipedia collects data and pieces of information, provides articles, and at the same time offers insight into the process of how knowledge grows. The following areas in particular demonstrate important connections:

methods (discussion, conventions, manuals, standards)
practical experience (authors, stable knowledge/process)
content (metadata, SWD, PND, templates, structure, quality management, languages)
contributors and users (museum staff, visitors, public)

As a possible alternative or extension of using Wikipedia the project “MuseumsWiki” shall be demonstrated.
1 0
Museum Documentation and Wikipedia.de: Possibilities, opportunities and advantages for scholars and museums Thomas Tunsch Wikipedia
Documentation
Collaborative
Community
Scholars
Interconnections
J. Trant and D. Bearman (eds). Museums and the Web 2007: Proceedings. Toronto: Archives & Museum Informatics English 31 March 2007 The importance of Wikipedia for the documentation and promotion of museum holdings is gaining acceptance, and the number of references to articles is growing. However, the museum world still pays little attention to the Wikipedia project as a collaborative community with intentions, structures, and special features. Although these observations are based on museums in Germany and focus on the German Wikipedia, they are just as important and applicable to other museums and other editions of Wikipedia. Universities and libraries have already taken advantage of the Wikipedia and have established functional links. In that the mission of museums is closely related to that of universities and libraries, the value of Wikipedia for museum professionals is worthy of consideration. This paper provides the complete study to serve as reference for the selected topics to be discussed in the professional forum. 0 0
Methoden zur sprachübergreifenden Plagiaterkennung Maik Anderka University of Paderborn German 2007 0 0
Semantic Wikipedia Markus Krötzsch
Denny Vrandečić
Max Völkel
Heiko Haller
Rudi Studer
Web Semantics 2007 Wikipedia is the world's largest collaboratively edited source of encyclopaedic knowledge. But in spite of its utility, its content is barely machine-interpretable and only weakly structured. With Semantic MediaWiki we provide an extension that enables wiki-users to semantically annotate wiki pages, based on which the wiki contents can be browsed, searched, and reused in novel ways. In this paper, we give an extended overview of Semantic MediaWiki and discuss experiences regarding performance and current applications. 0 2
Wikipedia in the pocket: Indexing technology for near-duplicate detection and high similarity search Martin Potthast Fuzzy-fingerprinting
Hash-based indexing
Near-duplicate detection
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 English 2007 We develop and implement a new indexing technology which allows us to use complete (and possibly very large) documents as queries, while having a retrieval performance comparable to a standard term query. Our approach aims at retrieval tasks such as near duplicate detection and high similarity search. To demonstrate the performance of our technology we have compiled the search index "Wikipedia in the Pocket", which contains about 2 million English and German Wikipedia articles. This index - along with a search interface - fits on a conventional CD (0.7 gigabyte). The ingredients of our indexing technology are similarity hashing and minimal perfect hashing. 0 0
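To illustrate hash-based near-duplicate detection in general terms (this is not the paper's fuzzy-fingerprinting or minimal-perfect-hashing scheme), here is a small shingle-hashing sketch; all names and parameters are illustrative.

# Near-duplicate detection sketch: reduce each document to the smallest hashed
# word n-gram "shingles" and estimate overlap between those fingerprint sets.
import hashlib

def shingle_fingerprint(text, n=3, keep=64):
    words = text.lower().split()
    shingles = {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}
    hashes = sorted(int(hashlib.md5(s.encode()).hexdigest(), 16) for s in shingles)
    return set(hashes[:keep])  # keep the smallest hashes as the fingerprint

def resemblance(fp_a, fp_b):
    return len(fp_a & fp_b) / len(fp_a | fp_b)  # Jaccard estimate

a = "Wikipedia is a free online encyclopedia edited by volunteers around the world."
b = "Wikipedia is a free online encyclopedia that is edited by volunteers worldwide."
c = "A completely unrelated sentence about query segmentation and n-gram statistics."

print(resemblance(shingle_fingerprint(a), shingle_fingerprint(b)))  # higher: near duplicates
print(resemblance(shingle_fingerprint(a), shingle_fingerprint(c)))  # lower: unrelated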
Wikipedia in the pocket: indexing technology for near-duplicate detection and high similarity search Martin Potthast English 2007 0 0
Foucault@Wiki: first steps towards a conceptual framework for the analysis of Wiki discourses Christian Pentzold
Sebastian Seidenglanz
Wiki
Wikipedia
Computer-mediated communication
Online collaboration
Foucault
Discourse theory
WikiSym English 2006 In this paper, we examine the discursive situation of Wikipedia. The primary goal is to explore principal ways of analyzing and characterizing the various forms of communicative user interaction using Foucault's discourse theory. First, the communicative situation of Wikipedia is addressed and a list of possible forms of communication is compiled. Second, the current research on the linguistic features of Wikis, especially Wikipedia, is reviewed. Third, some key issues of Foucault's theory are explored: the notion of "discourse", the discursive formation, and the methods of archaeology and genealogy, respectively. Finally, first steps towards a qualitative discourse analysis of the English Wikipedia are elaborated. The paper argues that Wikipedia can be understood as a discursive formation that regulates and structures the production of statements. Most of the discursive regularities named by Foucault are established in the collaborative writing processes of Wikipedia, too. Moreover, the editing processes can be described in Foucault's terms as discursive knowledge production. 12 1
Semantic MediaWiki (ISWC 2006) Markus Krötzsch
Denny Vrandečić
Max Völkel
ISWC English 2006 Semantic MediaWiki is an extension of MediaWiki – a widely used wiki-engine that also powers Wikipedia. Its aim is to make semantic technologies available to a broad community by smoothly integrating them with the established usage of MediaWiki. The software is already used on a number of productive installations world-wide, but the main target remains to establish “Semantic Wikipedia” as an early adopter of semantic technologies on the web. Thus usability and scalability are as important as powerful semantic features. 0 0
Integration of communities into process-oriented structures Kohler A.
Frank Fuchs-Kittowski
Cooperative knowledge generation
Knowledge community
Knowledge-intensive processes
Process-oriented knowledge structures
Wiki
Journal of Universal Computer Science English 2005 This article aims at the integration of communities of practice into work processes. Linear structures are often inappropriate for the execution of knowledge-intensive tasks and work processes. The latter are characterized by non-linear sequences and dynamic, social interaction. But in communities of practice, the guiding path that is needed for structuring the work is often missing. Our article lays out the requirements for integrating the dynamic, social processes of knowledge generation in communities of practice with formally described knowledge-intensive processes. For the support of communities the Wiki concept is introduced. In order to integrate communities into process structures, a concept for an appropriate interface is presented. On the basis of this interface concept, an information retrieval algorithm is used to connect the process-oriented structures with community-oriented structures. The prototype realisation of this concept is shown by a short example. 0 0
Wiki Communities in the Context of Work Processes Frank Fuchs-Kittowski
Andre Köhler
Ontology
Wiki
Community
Cooperative knowledge generation
Knowledge work
Work processes
Knowledge process
Process-oriented knowledge structures
WikiSym English 2005 In this article we examine the integration of communities of practice supported by a wiki into work processes. Linear structures are often inappropriate for the execution of knowledge-intensive tasks and work processes. The latter are characterized by non-linear sequences and dynamic social interaction. Communities of practice, however, often lack the „guiding light” needed to structure their work. We discuss the primary requirements for the integration of formally described knowledge-intensive processes into the dynamic social processes of knowledge generation in communities of practice and use the wiki approach for their support. We present our approach for an appropriate interface to integrate wiki communities into process structures and an information retrieval algorithm based on it to connect the process-oriented structures with community-oriented wiki structures. We show the prototypical realization of the concept by a brief example. 0 1
Wiki Templates - Adding Structure Support to Wikis on Demand Anja Haake
Stephan Lukosch
Till Schümmer
Wiki
Template
Tailoring
Structural editing and viewing
WikiSym English 2005 This paper introduces the concept of wiki templates that allows end-users to determine the structure and appearance of a wiki page. In particular, this better supports editing of structured wiki pages. Wiki templates may be adapted (defined and redefined) by end-users. They may be applied if found helpful, but need not be used, thus maintaining the simple wiki way of editing. In addition, we introduce a methodology to reuse wiki templates among different wiki instances. We show how wiki templates have been successfully used in real-world applications in our CURE wiki engine. 1 0
Wiki-templates adding structure support to wikis on demand Anja Haake
Stephan Lukosch
Schummer T.
Structural editing and viewing
Tailoring
Template
Wiki
WikiSym 2005 - Conference Proceedings of the 2005 International Symposium on Wikis English 2005 This paper introduces the concept of wiki templates that allows end-users to determine the structure and appearance of a wiki page. In particular, this better supports editing of structured wiki pages. Wiki templates may be adapted (defined and redefined) by end-users. They may be applied if found helpful, but need not be used, thus maintaining the simple wiki way of editing. In addition, we introduce a methodology to reuse wiki templates among different wiki instances. We show how wiki templates have been successfully used in real-world applications in our CURE wiki engine. 0 0
Wikipedia and the Semantic Web - The Missing Links Markus Krötzsch
Denny Vrandečić
Max Völkel
Semantic web
Wikipedia
Wikimania'05 2005 Wikipedia is the biggest collaboratively created source of encyclopaedic knowledge. Growing beyond the borders of any traditional encyclopaedia, it is facing new problems of knowledge management: The current excessive usage of article lists and categories witnesses the fact that 19th century content organization technologies like inter-article references and indices are no longer sufficient for today's needs. Rather, it is necessary to allow knowledge processing in a computer assisted way, for example to intelligently query the knowledge base. To this end, we propose the introduction of typed links as an extremely simple and unintrusive way for rendering large parts of Wikipedia machine readable. We provide a detailed plan on how to achieve this goal in a way that hardly impacts usability and performance, propose an implementation plan, and discuss possible difficulties on Wikipedia's way to the semantic future of the World Wide Web. The possible gains of this endeavor are huge; we sketch them by considering some immediate applications that semantic technologies can provide to enhance browsing, searching, and editing Wikipedia. 0 1