Text differencing

From WikiPapers

Text differencing is included as a keyword or extra keyword in 0 datasets, 1 tool, and 3 publications.

Datasets

There are no datasets for this keyword.

Tools

Tool: Wikiwho
Language(s): English
Programming language(s): Python
License: MIT license
Description: Fast and accurate processing of revision differences for authorship detection. More information: http://f-squared.org/wikiwho
Image: Logo wikiwho transbg.png
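The tool's core operation is computing differences between successive revisions of a page. As a loose illustration of what such a word-level revision diff looks like (this is not Wikiwho's own algorithm, which is described in the WikiWho publication below), here is a minimal sketch using Python's standard difflib module; the two revision strings are made up:

import difflib

# Two hypothetical revisions of a wiki sentence.
old_revision = "Wikis have become a popular collaboration platform on the Web."
new_revision = "Wikis have become a very popular online collaboration platform."

# SequenceMatcher compares arbitrary sequences; diffing word tokens rather than
# whole lines is closer to how revision differences are tracked for authorship.
old_tokens = old_revision.split()
new_tokens = new_revision.split()

matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(f"{tag:7s} old={old_tokens[i1:i2]} new={new_tokens[j1:j2]}")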


Publications

Title: WikiWho: Precise and Efficient Attribution of Authorship of Revisioned Content
Author(s): Fabian Flöck, Maribel Acosta
Published in: World Wide Web Conference 2014
Language: English
Date: 2014
Abstract: Revisioned text content is present in numerous collaboration platforms on the Web, most notably wikis. Tracking authorship of text tokens in such systems has many potential applications, such as identifying main authors for licensing reasons or tracing collaborative writing patterns over time. In this context, two main challenges arise. First, it is critical for such an authorship tracking system to be precise in its attributions, to be reliable for further processing. Second, it has to run efficiently even on very large datasets, such as Wikipedia. As a solution, we propose a graph-based model to represent revisioned content and an algorithm over this model that tackles both issues effectively. We describe the optimal implementation and design choices when tuning it to a wiki environment. We further present a gold standard of 240 tokens from English Wikipedia articles annotated with their origin. This gold standard was created manually and confirmed by multiple independent users of a crowdsourcing platform. It is the first gold standard of this kind and quality, and our solution achieves an average of 95% precision on this data set. We also perform a first-ever precision evaluation of the state-of-the-art algorithm for the task, exceeding it by over 10% on average. Our approach outperforms the execution time of the state-of-the-art by one order of magnitude, as we demonstrate on a sample of over 240 English Wikipedia articles. We argue that the increased size of an optional materialization of our results by about 10% compared to the baseline is a favorable trade-off, given the large advantage in runtime performance.
R: 0, C: 0
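The abstract describes a graph-based model and a dedicated attribution algorithm; those details are in the paper itself. Purely as an illustration of the underlying idea of attributing each token to the revision that first introduced it, the following crude greedy baseline chains sequential word-level diffs with Python's difflib. The revision history and author names are invented:

import difflib

# Hypothetical revision history: (author, text) pairs in chronological order.
revisions = [
    ("alice", "the cat sat on the mat"),
    ("bob",   "the black cat sat on the mat"),
    ("carol", "the black cat sat quietly on the mat"),
]

# authorship[i] holds the author credited with token i of the current revision.
prev_tokens, authorship = [], []
for author, text in revisions:
    tokens = text.split()
    matcher = difflib.SequenceMatcher(a=prev_tokens, b=tokens)
    new_authorship = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Tokens carried over keep the author who originally introduced them.
            new_authorship.extend(authorship[i1:i2])
        else:
            # Inserted or replaced tokens are attributed to this revision's author.
            new_authorship.extend([author] * (j2 - j1))
    prev_tokens, authorship = tokens, new_authorship

for token, author in zip(prev_tokens, authorship):
    print(f"{token:10s} -> {author}")

Unlike this sketch, the paper's model also has to handle deletions, reintroductions, and moved text precisely, which is where the graph-based representation comes in.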
Title: What Did They Do? Deriving High-Level Edit Histories in Wikis
Author(s): Peter Kin-Fong Fong, Robert P. Biuk-Aghai
Published in: WikiSym
Language: English
Date: 2010
Abstract: Wikis have become a popular online collaboration platform. Their open nature can, and indeed does, lead to a large number of editors of their articles, who create a large number of revisions. These editors make various types of edits on an article, from minor ones such as spelling correction and text formatting to major revisions such as new content introduction, whole-article restructuring, etc. Given the enormous number of revisions, it is difficult to identify the type of contributions made in these revisions through human observation alone. Moreover, different types of edits imply different edit significance: a revision that introduces new content is arguably more significant than a revision making a few spelling corrections. By taking edit types into account, better measurements of edit significance can be produced. This paper proposes a method for categorizing and presenting edits in an intuitive way and with a flexible measure of significance of each individual editor's contributions.
R: 11, C: 2
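The central claim is that different edit types carry different significance, so weighting them gives a better picture of contributions than raw revision counts. A minimal sketch of that idea, using diff opcodes as a rough stand-in for the paper's edit categories and entirely hypothetical weights:

import difflib

# Hypothetical significance weights per diff opcode; the paper derives its own
# edit categories and measures, this is only an illustrative stand-in.
WEIGHTS = {"insert": 1.0, "delete": 0.6, "replace": 0.8}

def edit_significance(old_text: str, new_text: str) -> float:
    """Score an edit by the number and kind of changed word tokens."""
    old_tokens, new_tokens = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens)
    score = 0.0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed = max(i2 - i1, j2 - j1)
        score += WEIGHTS[tag] * changed
    return score

# A spelling fix scores far lower than a revision adding new content.
print(edit_significance("Wikis are a colaboration platform",
                        "Wikis are a collaboration platform"))
print(edit_significance("Wikis are a collaboration platform",
                        "Wikis are a collaboration platform. "
                        "They support versioned editing by many authors."))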
Title: What did they do? Deriving high-level edit histories in wikis
Author(s): Fong P.K.-F., Biuk-Aghai R.P.
Published in: WikiSym 2010
Language: English
Date: 2010
Abstract: (same as the publication above)
R: 0, C: 2