From WikiPapers

authorship is included as keyword or extra keyword in 0 datasets, 2 tools and 9 publications.


There are no datasets for this keyword.


Tool Operating System(s) Language(s) Programming language(s) License Description Image
Authorship Tracking Cross-platform None Python BSD License Authorship Tracking This code implements the algorithms for tracking the authorship of text in revisioned content that have been published in WWW 2013:

The idea is to attribute each portion of text to the earliest revision in which it appeared. For instance, if a revision contains the sentence "the cat ate the mouse", and the sentence is deleted and then reintroduced in a later revision (not necessarily as part of a revert), it is still attributed to its earliest author once reintroduced.

Precisely, the algorithm takes a parameter N. If a sequence of tokens of length equal to or greater than N has appeared before, it is attributed to its earliest occurrence. See the paper for details.
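The rule above can be illustrated with a minimal Python sketch. This is not the code of the tool itself: the class name `NgramAttribution`, the method `add_revision`, whitespace tokenization, and the use of a plain dictionary of N-grams (rather than the trie the tool builds) are all assumptions made for illustration. It also assumes revision ids are integers that increase over time.

```python
from typing import Dict, List, Tuple


class NgramAttribution:
    """Hypothetical, simplified tracker: not the tool's actual implementation."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        # Maps each N-gram to the earliest revision id in which it was seen.
        self.first_seen: Dict[Tuple[str, ...], int] = {}

    def add_revision(self, rev_id: int, text: str) -> List[int]:
        """Return, for each token of the revision, the revision id it is attributed to."""
        tokens = text.split()  # assumption: whitespace tokenization
        # By default every token is attributed to the current revision.
        origin = [rev_id] * len(tokens)
        for i in range(len(tokens) - self.n + 1):
            gram = tuple(tokens[i:i + self.n])
            # Record the N-gram if it is new; otherwise fetch its earliest revision.
            earliest = self.first_seen.setdefault(gram, rev_id)
            if earliest < rev_id:
                # The N-gram appeared before: attribute its tokens to that revision.
                for j in range(i, i + self.n):
                    origin[j] = min(origin[j], earliest)
        return origin


tracker = NgramAttribution(n=3)
tracker.add_revision(0, "the cat ate the mouse")
tracker.add_revision(1, "a dog barked loudly")
print(tracker.add_revision(2, "the cat ate the mouse today"))
# prints [0, 0, 0, 0, 0, 2]
```

In this sketch the reintroduced sentence is attributed to revision 0 even though it was absent from revision 1, while the new token "today" is attributed to revision 2.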

The code works by building a trie-based representation of the whole revision history in an object of the class AuthorshipAttribution. Each time a new revision is passed to the object, the object updates its internal state and computes the earliest attribution of the new revision, which can then be easily obtained. The object itself can be serialized (and de-serialized) using JSON-based methods.
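As a rough illustration of a trie over token sequences that can be round-tripped through JSON, consider the sketch below. It does not reproduce the AuthorshipAttribution class: the name `TokenTrie`, its methods, and the nested-dictionary node layout are hypothetical choices made only to show the general idea.

```python
import json
from typing import Dict, List, Optional


class TokenTrie:
    """Hypothetical trie keyed by tokens; each node stores an earliest revision id."""

    def __init__(self) -> None:
        self.root: Dict = {"rev": None, "children": {}}

    def insert(self, tokens: List[str], rev_id: int) -> None:
        """Record the token sequence, keeping the smallest revision id seen for it."""
        node = self.root
        for token in tokens:
            node = node["children"].setdefault(token, {"rev": None, "children": {}})
        if node["rev"] is None or rev_id < node["rev"]:
            node["rev"] = rev_id

    def earliest(self, tokens: List[str]) -> Optional[int]:
        """Return the earliest revision id recorded for the sequence, or None."""
        node = self.root
        for token in tokens:
            node = node["children"].get(token)
            if node is None:
                return None
        return node["rev"]

    def to_json(self) -> str:
        # Nodes are plain dicts with string keys, so json can serialize them directly.
        return json.dumps(self.root)

    @classmethod
    def from_json(cls, payload: str) -> "TokenTrie":
        trie = cls()
        trie.root = json.loads(payload)
        return trie


trie = TokenTrie()
trie.insert(["the", "cat", "ate"], 0)
trie.insert(["the", "cat", "ate"], 5)  # a later occurrence does not change the attribution
restored = TokenTrie.from_json(trie.to_json())
print(restored.earliest(["the", "cat", "ate"]))  # prints 0
```

Shared prefixes are stored once, which is one reason a trie is a natural fit for a history in which most revisions repeat most of the earlier text.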

To keep the representation of the past history from growing too large, we remove from the object the information about content that has been absent from revisions (a) for at least 90 days, and (b) for at least 100 revisions. These are configurable parameters. With these choices, for Wikipedia, the serialized object is typically between 10 and 20 times the size of a typical revision, even for pages with very long revision lists. See the paper for detailed experimental results.
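The pruning rule described above can be sketched as follows. The function name `prune`, its parameters, and the bookkeeping of a "last seen" timestamp and revision index per piece of content are assumptions for illustration, not the tool's actual data layout. The sketch reads conditions (a) and (b) as both being required before an entry is dropped.

```python
from datetime import datetime, timedelta
from typing import Dict, Hashable, Tuple

# (time of the last revision containing the content, index of that revision)
LastSeen = Tuple[datetime, int]


def prune(
    last_seen: Dict[Hashable, LastSeen],
    now: datetime,
    current_index: int,
    max_age: timedelta = timedelta(days=90),   # configurable, per the description
    max_revisions: int = 100,                  # configurable, per the description
) -> Dict[Hashable, LastSeen]:
    """Keep only entries that are not stale by BOTH age and revision count."""
    kept = {}
    for key, (seen_at, seen_index) in last_seen.items():
        too_old = now - seen_at >= max_age
        too_many_revisions = current_index - seen_index >= max_revisions
        if not (too_old and too_many_revisions):
            kept[key] = (seen_at, seen_index)
    return kept


entries = {
    "stale": (datetime(2013, 1, 1), 10),          # old in time and in revisions
    "recent_in_time": (datetime(2013, 5, 20), 10),
    "recent_in_revisions": (datetime(2013, 1, 1), 150),
}
print(sorted(prune(entries, now=datetime(2013, 6, 1), current_index=200)))
# prints ['recent_in_revisions', 'recent_in_time']
```

Requiring both conditions keeps content on slowly edited pages (few revisions in 90 days) and on rapidly edited pages (100 revisions within a few days) from being forgotten too early.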
Wikiwho English Python MIT license Wikiwho Fast and accurate processing of revision differences for authorship detection. More information:


Title Author(s) Published in Language Date Abstract R C
WikiWho: Precise and Efficient Attribution of Authorship of Revisioned Content Fabian Flöck
Maribel Acosta
World Wide Web Conference 2014 English 2014 Revisioned text content is present in numerous collaboration platforms on the Web, most notably Wikis. To track authorship of text tokens in such systems has many potential applications; the identification of main authors for licensing reasons or tracing collaborative writing patterns over time, to name some. In this context, two main challenges arise. First, it is critical for such an authorship tracking system to be precise in its attributions, to be reliable for further processing. Second, it has to run efficiently even on very large datasets, such as Wikipedia. As a solution, we propose a graph-based model to represent revisioned content and an algorithm over this model that tackles both issues effectively. We describe the optimal implementation and design choices when tuning it to a Wiki environment. We further present a gold standard of 240 tokens from English Wikipedia articles annotated with their origin. This gold standard was created manually and confirmed by multiple independent users of a crowdsourcing platform. It is the first gold standard of this kind and quality and our solution achieves an average of 95% precision on this data set. We also perform a first-ever precision evaluation of the state-of-the-art algorithm for the task, exceeding it by over 10% on average. Our approach outperforms the execution time of the state-of-the-art by one order of magnitude, as we demonstrate on a sample of over 240 English Wikipedia articles. We argue that the increased size of an optional materialization of our results by about 10% compared to the baseline is a favorable trade-off, given the large advantage in runtime performance. 0 0
Attributing authorship of revisioned content Luca de Alfaro
Shavlovsky M.
WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web English 2013 A considerable portion of web content, from wikis to collaboratively edited documents, to code posted online, is revisioned. We consider the problem of attributing authorship to such revisioned content, and we develop scalable attribution algorithms that can be applied to very large bodies of revisioned content, such as the English Wikipedia. Since content can be deleted, only to be later re-inserted, we introduce a notion of authorship that requires comparing each new revision with the entire set of past revisions. For each portion of content in the newest revision, we search the entire history for content matches that are statistically unlikely to occur spontaneously, thus denoting common origin. We use these matches to compute the earliest possible attribution of each word (or each token) of the new content. We show that this "earliest plausible attribution" can be computed efficiently via compact summaries of the past revision history. This leads to an algorithm that runs in time proportional to the sum of the size of the most recent revision, and the total amount of change (edit work) in the revision history. This amount of change is typically much smaller than the total size of all past revisions. The resulting algorithm can scale to very large repositories of revisioned content, as we show via experimental data over the English Wikipedia. Copyright is held by the International World Wide Web Conference Committee (IW3C2). 0 0
Bots Nicht-menschliche Mitglieder der Wikipedia-Gemeinschaft Robin D. Fink
Tobias Liboschik
German 2010 0 0
Textual curators and writing machines: authorial agency in encyclopedias, print to digital Krista A. Kennedy English July 2009 Wikipedia is often discussed as the first of its kind: the first massively collaborative, Web-based encyclopedia that belongs to the public domain. While it’s true that wiki technology enables large-scale, distributed collaborations in revolutionary ways, the concept of a collaborative encyclopedia is not new, and neither is the idea that private ownership might not apply to such documents. More than 275 years ago, in the preface to the 1728 edition of his Cyclopædia, Ephraim Chambers mused on the intensely collaborative nature of the volumes he was about to publish. His thoughts were remarkably similar to contemporary intellectual property arguments for Wikipedia, and while the composition processes involved in producing these texts are influenced by the available technologies, they are also unexpectedly similar. This dissertation examines issues of authorial agency in these two texts and shows that the “Author Construct” is not static across eras, genres, or textual technologies. In contrast to traditional considerations of the poetic author, the encyclopedic author demonstrates a different form of authorial agency that operates within strict genre conventions and does not place a premium on originality. This and related variations challenge contemporary ideas concerning the divide between print and digital authorship as well as the notion that new media intellectual property arguments are without historical precedent. 25 0
What's mine is mine: Territoriality in collaborative authoring Thom-Santelli J.
Dan Cosley
Geri Gay
Conference on Human Factors in Computing Systems - Proceedings English 2009 Territoriality, the expression of ownership towards an object, can emerge when social actors occupy a shared social space. In the case of Wikipedia, the prevailing cultural norm is one that warns against ownership of one's work. However, we observe the emergence of territoriality in online space with respect to a subset of articles that have been tagged with the Maintained template through a qualitative study of 15 editors who have self-designated as Maintainers. Our participants communicated ownership, demarcated boundaries and asserted their control over artifacts for the sake of quality by appropriating existing features of Wikipedia. We then suggest design strategies to support these behaviors in the proper context within collaborative authoring systems more generally. Copyright 2009 ACM. 0 0
Wiki Trust Metrics based on Phrasal Analysis Mark Kramer
Andy Gregorowicz
Bala Iyer
WikiSym English 2008 0 0
Who writes Wikipedia? Aaron Swartz English 4 September 2006 0 4
Internet encyclopaedias go head to head Jim Giles Nature English 14 December 2005 Jimmy Wales' Wikipedia comes close to Britannica in terms of the accuracy of its science entries, a Nature investigation finds. 0 50
Wikipedia and The Disappearing "Author" Nora Miller ETC: A Review of General Semantics English January 2005 0 2