Attributing authorship of revisioned content

From WikiPapers
Jump to: navigation, search

Attributing authorship of revisioned content is a 2013 conference paper written in English by De Alfaro L., Shavlovsky M. and published in WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web.

[edit] Abstract

A considerable portion of web content, from wikis to collaboratively edited documents, to code posted online, is revisioned. We consider the problem of attributing authorship to such revisioned content, and we develop scalable attribution algorithms that can be applied to very large bodies of revisioned content, such as the English Wikipedia. Since content can be deleted, only to be later re-inserted, we introduce a notion of authorship that requires comparing each new revision with the entire set of past revisions. For each portion of content in the newest revision, we search the entire history for content matches that are statistically unlikely to occur spontaneously, thus denoting common origin. We use these matches to compute the earliest possible attribution of each word (or each token) of the new content. We show that this \earliest plausible attribution" can be computed efficiently via compact summaries of the past revision history. This leads to an algorithm that runs in time proportional to the sum of the size of the most recent revision, and the total amount of change (edit work) in the revision history. This amount of change is typically much smaller than the total size of all past revisions. The resulting algorithm can scale to very large repositories of revisioned content, as we show via experimental data over the English Wikipedia Copyright is held by the International World Wide Web Conference Committee (IW3C2).

[edit] References

This section requires expansion. Please, help!

Cited by

Probably, this publication is cited by others, but there are no articles available for them in WikiPapers.