List of doctoral theses

From WikiPapers

This is a list of all the doctoral theses available in WikiPapers. Currently, there are 52 doctoral theses.



Doctoral theses

Title Author(s) Keyword(s) Published in Language Date Abstract R C
Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia Maik Anderka Information quality
Wikipedia
Quality Flaws
Quality Flaw Prediction
Bauhaus-Universität Weimar, Germany English 2013 Web applications that are based on user-generated content are often criticized for containing low-quality information; a popular example is the online encyclopedia Wikipedia. The major points of criticism pertain to the accuracy, neutrality, and reliability of information. The identification of low-quality information is an important task, since for a huge number of people around the world it has become a habit to visit Wikipedia first when they need information. Existing research on quality assessment in Wikipedia either investigates only small samples of articles, or else deals with the classification of content into high-quality or low-quality. This thesis goes further: it targets the investigation of quality flaws, thus providing specific indications of the respects in which low-quality content needs improvement. The original contributions of this thesis, which relate to the fields of user-generated content analysis, data mining, and machine learning, can be summarized as follows:

(1) We propose the investigation of quality flaws in Wikipedia based on user-defined cleanup tags. Cleanup tags are commonly used in the Wikipedia community to tag content that has some shortcomings. Our approach is based on the hypothesis that each cleanup tag defines a particular quality flaw.

(2) We provide the first comprehensive breakdown of Wikipedia's quality flaw structure. We present a flaw organization schema, and we conduct an extensive exploratory data analysis which reveals (a) the flaws that actually exist, (b) the distribution of flaws in Wikipedia, and (c) the extent of flawed content.

(3) We present the first breakdown of Wikipedia's quality flaw evolution. We consider the entire history of the English Wikipedia from 2001 to 2012, which comprises more than 508 million page revisions, summing to 7.9 TB. Our analysis reveals (a) how the incidence and the extent of flaws have evolved, and (b) how the handling and the perception of flaws have changed over time.

(4) We are the first to operationalize an algorithmic prediction of quality flaws in Wikipedia. We cast quality flaw prediction as a one-class classification problem, develop a tailored quality flaw model, and employ a dedicated one-class machine learning approach. A comprehensive evaluation based on human-labeled Wikipedia articles underlines the practical applicability of our approach.
0 0
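Contribution (4) above casts flaw detection as one-class classification: a model is trained only on examples of flawed articles and must decide whether an unseen article resembles that class. The sketch below illustrates the general shape of the problem with a toy per-feature z-score model; the feature names, numbers, and threshold are invented for illustration and are not the thesis's actual model.

```python
# Illustrative sketch (not the thesis's model): one-class classification
# learns a decision rule from flaw-tagged articles only, then flags unseen
# articles that resemble the flawed class. Features are invented.
from statistics import mean, stdev

def fit_one_class(samples):
    """Learn per-feature mean/stdev from flaw-tagged articles only."""
    cols = list(zip(*samples))
    return [(mean(c), stdev(c)) for c in cols]

def resembles_flawed(model, x, z_max=3.0):
    """True if every feature lies within z_max stdevs of the flawed class."""
    return all(abs(v - m) / s <= z_max for v, (m, s) in zip(x, model))

# Each row: [references per kB, cleanup-tag count, external links]
flawed_articles = [[0.1, 3, 1], [0.2, 4, 0], [0.15, 5, 2], [0.05, 3, 1]]
model = fit_one_class(flawed_articles)

print(resembles_flawed(model, [0.12, 4, 1]))   # True  (similar to flawed class)
print(resembles_flawed(model, [2.5, 0, 40]))   # False (well-referenced outlier)
```

A real system would use richer features and a dedicated one-class learner (e.g. a one-class SVM), but the setting is the same: only positive (flawed) examples are available at training time.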
Damage Detection and Mitigation in Open Collaboration Applications Andrew G. West University of Pennsylvania English 2013 Collaborative functionality is changing the way information is amassed, refined, and disseminated in online environments. A subclass of these systems characterized by "open collaboration" uniquely allows participants to *modify* content with low barriers to entry. A prominent example and our case study, English Wikipedia, exemplifies the vulnerabilities: 7%+ of its edits are blatantly unconstructive. Our measurement studies show this damage manifests in novel socio-technical forms, limiting the effectiveness of computational detection strategies from related domains. In turn, this has left much of the mitigation to a poorly organized and ill-routed human workforce. We aim to improve all facets of this incident response workflow.

Complementing language-based solutions, we first develop content-agnostic predictors of damage. We implicitly glean reputations for system entities and overcome sparse behavioral histories with a spatial reputation model that combines evidence from multiple granularities. We also identify simple yet indicative metadata features that capture participatory dynamics and content maturation. When brought to bear on damage corpora, our contributions: (1) advance benchmarks over a broad set of security issues ("vandalism"), (2) perform well in the first anti-spam-specific approach, and (3) demonstrate their portability across diverse open collaboration use cases.

Probabilities generated by our classifiers can also intelligently route human assets using prioritization schemes optimized for capture rate or impact minimization. Organizational primitives are introduced that improve workforce efficiency. These strategies are then implemented in a tool ("STiki") that has been used to revert 350,000+ damaging instances on Wikipedia. These uses are analyzed to learn about human aspects of the edit review process, including scalability, motivation, and latency. Finally, we conclude by measuring the practical impact of this work, discussing how to better integrate our solutions, and revealing outstanding vulnerabilities that speak to research challenges for open collaboration security.
0 0
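The abstract above describes two reusable ideas: scoring edits from content-agnostic metadata, and routing human reviewers to the highest-scoring edits first. A minimal sketch of that pipeline follows; the features and weights are invented for illustration and are not STiki's actual model.

```python
# Hypothetical sketch: score edits from metadata alone (no language
# analysis), then review the most suspicious edits first. Weights invented.
def damage_score(edit):
    """Crude linear score from metadata features; higher = more suspicious."""
    score = 0.0
    if edit["anonymous"]:
        score += 0.4                                        # no account history
    if edit["comment_length"] == 0:
        score += 0.2                                        # no edit summary
    score += max(0.0, 0.4 - edit["editor_reputation"])      # low reputation
    return min(score, 1.0)

edits = [
    {"id": 1, "anonymous": True,  "comment_length": 0,  "editor_reputation": 0.0},
    {"id": 2, "anonymous": False, "comment_length": 40, "editor_reputation": 0.9},
    {"id": 3, "anonymous": True,  "comment_length": 15, "editor_reputation": 0.1},
]
# Prioritization: route the most probable damage to reviewers first
# (a capture-rate-optimized queue, in the abstract's terms).
queue = sorted(edits, key=damage_score, reverse=True)
print([e["id"] for e in queue])  # [1, 3, 2]
```

Under this prioritization, reviewers see edit 1 first: an anonymous editor with no reputation and no edit summary scores highest.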
Erfolgsfaktoren von Social Media: Wie "funktionieren" Wikis? Florian L. Mayer Wiki
Organizational Communication
Success
Collaboration
Online collaboration
Otto-Friedrich-Universität Bamberg German 2013 When are wikis, or social media more generally, successful? When they are communicatively "alive"! This "communicative success" rests on structural principles that this thesis makes visible. It describes concrete structures of attention, motivation, and organization, and thereby makes comprehensible both the success of flagships such as Wikipedia or Facebook and the difficulties of deploying social media in organizations and groups. With the concepts of micro-communication and micro-collaboration, it also offers a description of new forms of societal communication. 0 0
Mass Collaboration or Mass Amateurism? A comparative study on the quality of scientific information produced using Wiki tools and concepts Fernando Rodrigues Mass Collaboration
Collective intelligence
Crowdsourcing
Information Systems
Data Quality
Wikipedia
Encyclopaedia Britannica
Universidade Évora Portuguese December 2012 With this PhD dissertation, we intend to contribute to a better understanding of the Wiki phenomenon as a knowledge management system which aggregates private knowledge. We also wish to check to what extent information generated through anonymous and freely bestowed mass collaboration is reliable as opposed to the traditional approach.

In order to achieve that goal, we develop a comparative study between Wikipedia and Encyclopaedia Britannica with regard to the accuracy, depth, and detail of information in both, in order to compare the quality of the knowledge repositories they produce. This allows us to reach a conclusion about the efficacy of the business models behind them.

We use a representative random sample composed of articles that appear in both encyclopedias. Each pair of articles was first reformatted and then graded by an expert in its subject area. At the same time, we collected a small convenience sample containing only Management articles. Each of these pairs was graded by several experts in order to determine the uncertainty associated with diverse gradings of the same article and to apply it to the evaluations carried out by a single expert. The conclusion was that the average quality of the Wikipedia articles analysed was superior to that of their peers and that this difference was statistically significant.

A survey conducted within academia showed that only a minority used traditional information sources as their first approach when seeking information. The survey also made clear that trust in these sources was considerably greater than trust in information obtained through Wikipedia. This perception of quality, diametrically opposed to the results of the blind-test evaluation, reinforces the impartiality of the evaluating panel.

However representative the chosen sample may be of the universe under study, the results depend on the evaluators' personal opinions and chosen criteria. This means that the reproducibility of this study's conclusions with a different grading panel cannot be guaranteed. Nevertheless, this is not enough of a reason to reject results obtained through more than five hundred evaluations.

This thesis is thus an attempt to help clarify this topic and to contribute to a better perception of the quality of a tool used daily by millions of people, of the mass collaboration that feeds it, and of the collaborative software that supports it.
0 0
Network of Knowledge: Wikipedia as a Sociotechnical System of Intelligence Randall M. Livingstone University of Oregon English September 2012 0 0
WikiTrust: Content-Driven Reputation for the Wikipedia B. Thomas Adler English June 2012 0 0
Sum of All Knowledge: Wikipedia and the Encyclopedic Urge Erinç Salor Amsterdam School for Cultural Analysis, University of Amsterdam English 2012 In March 2012, one year after the 10th anniversary of Wikipedia, the free online encyclopedia that anyone can edit, the editors of the Encyclopaedia Britannica announced that no more print editions of the venerable encyclopedia would be made, marking the end of a print run that spanned more than two hundred and forty years. Building on this period of disruption of the encyclopedic form, this study offers an understanding of Wikipedia distilled through the heritage of the Western encyclopedic tradition. 0 0
Processos editoriais auto-organizados na Wikipédia em português: a edição colaborativa de "Biografias de Pessoas Vivas" Carlos Frederico de Brito d’Andréa Wikipedia
Edition
Rewriting
Collaboration
Self-organization
Complexity
Portuguese September 2011 This dissertation maps and analyzes the dynamics of editions in a sample of articles of the Portuguese version of Wikipedia. We identify and discuss the self-organized and collaborative processes in its editorial network, as well as how the editors rewrite the articles over time. This research begins with conceptual considerations about the “encyclopedia that anyone can edit”, focusing on trends of the Portuguese version and specifically on the “Biographies of Living People”, which are characterized by the possibility of including, “in real time”, factual information about the life and work of influential people. The theoretical framework draws on authors from different areas. In Text Linguistics, we discuss the concepts of text (BEAUGRANDE, 1997; COSCARELLI, 2006), textuality (COSTA VAL, 2004), retextualization and rewriting (DELL’ISOLA, 2007; MARCUSCHI, 2000; MATENCIO, 2002). Besides that, we discuss the editorial processes and professional activities (like copy editing) in the “production networks” of books and encyclopedias, especially after the use of digital technologies. In chapter 3, we discuss networked editorial production based on the internet and inspired by “hacker culture” and “open source software”. In this context, the most important concepts are “commons-based peer production” (BENKLER, 2006), “The Wisdom of Crowds” (SUROWIECKI, 2007), “produsage” (BRUNS, 2008), “virtual community” and “crowdsourcing” (HAYTHORNTHWAITE, 2009). We also present the relationships between this new model and traditional editorial processes, like the “networked book” and “wiki-journalism”. After that, we relate networked editorial production to the complexity paradigm and discuss Wikipedia as a complex adaptive system (HOLLAND, 1995; LARSEN-FREEMAN and CAMERON, 2008) that, potentially, operates in a self-organized and emergent dynamic (DEBRUN, 1996a, 1996b; DE WOLF and HOLVOET, 2005). 
The empirical study of this thesis is based on 91 “Biographies of Living People” about the most influential Brazilian personalities of 2009 according to two national magazines (“Época” and “Isto É”). In the quantitative phase of this work, we extracted data from article history pages using software (WikipediAnalyserPT) developed for this research. After performing statistical analyses, we compared the editing processes of these articles using variables such as “total of editions”, “editions made by groups of editors” (registered, non-registered, administrators and bots), “protections”, “reversions” etc. At the qualitative stage, we detail the editing dynamics of five articles and analyze the rewritings of the texts and the interactions between the editors. Three articles were chosen because their “key variables” are very similar: the biographies of “Franklin Martins” (a journalist who worked in president Lula's government), “Kátia Abreu” (a senator known for defending owners of very large land areas) and “Ricardo Teixeira” (the president of the Brazilian Football Confederation). After that, we analyze the editing dynamics of two of the most edited articles of the sample: the biographies of the famous soccer players “Adriano Leite Ribeiro” (nicknamed “The Emperor”) and “Ronaldo Nazario de Lima” (also known as “The Phenomenon”). In the three intermediate articles, we identified relative stability (caused by a small number of monthly editions) interspersed with short periods of more editions and disputes. We also observed that a few editors made almost all the “important” editions. In the two most edited biographies, we noticed an uninterrupted movement of editors, hundreds of acts of vandalism and many edit wars. Although in these articles, too, only a few editions are preserved, we identify an “emergence” pattern characterized by disputes that encourage collaboration among agents. 
At the conclusion, we discuss the possibilities and challenges of a “wikification” of editorial processes. 60 0
Design Mechanisms for MediaWiki to Support Collaborative Writing in a Mandatory Context Sumonta Kasemvilas Design
Information technology
Educational technology
English August 2011 MediaWiki's decentralization, arbitrariness, and open sharing make it a poor fit for the classroom setting, and its flexible characteristics complicate practical design when MediaWiki is applied in a mandatory writing context. This dissertation identifies a need to add extensions that facilitate increased accountability, project management, discussion, and awareness based on a theoretical framework; proposes MediaWiki with some modifications as an innovative way to optimize the strengths associated with constructivist learning and social presence; and examines the results of those changes. Relevant theoretical perspectives are used to contextualize the potential significance of additional MediaWiki extensions. Three categories of mechanisms in MediaWiki—role, awareness, and project management—were newly developed in this research. They are designed to increase project control and accountability. Discussion, chat, text editor, and online notification extensions were also installed and customized to meet the needs of the students. Two case studies were conducted in two separate graduate classes to test the value of the extensions. Quantitative and qualitative data were collected and analyzed; the qualitative methods add texture to the quantitative findings. The findings illustrate some potential impact for classroom use. Delineation of the results in Case Study 1 and Case Study 2 provides a well-grounded rationale for why the proposed new MediaWiki mechanisms positively impact collaborative writing. By applying a set of extended features to MediaWiki, some problems were solved and others were mitigated, but other problems were not resolved and new problems emerged. Thus, this study articulates the benefits of and remaining problems with using MediaWiki and extensions, and suggests ways to improve the group writing process. Using MediaWiki in academia requires appropriate governance and proper technology. 
The results potentially offer new teaching mechanisms for graduate students involved with collaborative writing. The study holds promise in improving collaborative efforts in mandatory group writing projects and discusses a way to facilitate collaborative writing in this context. Implications of this study can assist researchers and developers in understanding what effects the extensions have on users. 26 0
Wiki Readers Wiki Writers Thomas W. Reynolds Jr Communication
Web studies
Rhetoric
English July 2011 In 1995, the first wiki website, Ward Cunningham's WikiWikiWeb, went public for the use of a community of computer programmers, and few outside that community, or those working in similar fields, would have taken note of wiki technology, a technology that allows visitors to a wiki-based web site to modify its structure and content. Fifteen years later, however, wiki comes to compositionists as an already-loaded term. The mainstream media depicts wiki as a challenge to the ways we think about who writes and disseminates information, the nature of information itself, and who reads and how they read and use that information. At the same time, scholarship in the field of composition studies claims wiki as a writing tool that evidences and provides the process-centered, collaborative, democratized space for which researchers and teachers of writing have been looking. In both cases, the literature constructs ideas about what it means to be a writer and a reader in relation to wiki, so that compositionists encounter wiki technology as always already described and defined. I analyze these oppositional perspectives on wiki technology and make it possible to move through, before, and beyond these constructions of readers and writers and the intellectual traditions through which they are made possible, to make space for other readings of wiki technology and answer the following questions: How are the traditional roles of reader and writer articulated or challenged in the discourse surrounding wiki technology? How are the roles of readers and writers made possible through applications of wiki technology? I analyze the discourse surrounding wiki technology and then the writer and reader functions made possible in three wiki applications: Wikipedia, Scholarpedia, and Citizendium. 
It is the argument of this project that wiki makes visible and explicit the ways in which readers and writers have always already interacted, or at least desired to interact, providing a deeper and different understanding of the roles assumed by and constructed for readers and writers, an understanding that is situated within, without, and in the margins of the traditions that have always already constructed them (and wiki technology) differently. 32 0
Hackers, Cyborgs, and Wikipedians: The Political Economy and Cultural History of Wikipedia Andrew A. Famiglietti Wikipedia
Peer Production
Cultural Studies
New Media
Political Economy
English May 2011 This dissertation explores the political economy and cultural history of Wikipedia, the free encyclopedia. It demonstrates how Wikipedia, an influential and popular site of knowledge production and distribution, was influenced by its heritage from the hacker communities of the late twentieth century. More specifically, Wikipedia was shaped by an ideal I call “the cyborg individual,” which held that the production of knowledge was best entrusted to a widely distributed network of individual human subjects and individually owned computers. I trace how this ideal emerged from hacker culture in response to anxieties hackers experienced due to their intimate relationships with machines. I go on to demonstrate how this ideal influenced how Wikipedia was understood, both by those involved in the early history of the site and by those writing about it. In particular, legal scholar Yochai Benkler seems to base his understanding of Wikipedia and its strengths on the cyborg individual ideal. Having established this, I then move on to show how the cyborg individual ideal misunderstands Wikipedia's actual method of production. Most importantly, it overlooks how the boundaries drawn around communities and shared technological resources shape Wikipedia's content. I then begin building what I believe is a better way of understanding Wikipedia, by tracing how communities and shared resources shape the production of recent Wikipedia articles. 70 0
Le copyleft appliqué à la création hors logiciel Antoine Moreau French May 2011 Copyleft is a legal notion stemming from the free software movement which, while observing the author's rights, allows copying, spreading and transforming works and forbids the exclusive enjoyment of them. It originated with the Free Software Foundation's GNU project, initiated by Richard Stallman, and the first free copyleft license for software: the General Public License. Our research deals with copyleft applied to non-software creation, as we initiated it in 2000 with the Free Art License. Through practicing it and observing its effects, we raise questions about the status of the author in the digital age. We discover a history, a history of art, which is no longer determined by an end but leads on to infinite creations made by an infinity of artists, both minor and consequent. We observe that copyleft is not an ordinary creation process, but a decreation process. It asserts, negatively and through the flaws, not negation or failure, but the beauty of a gesture graciously offering itself. This gesture combines ethics and aesthetics; it is « es-ethical ». We understand that with copyleft, technique serves a politics of « hyper-democratic » opening as seen in the Web's hypertext structure, which punches holes through pages and opens onto otherness. It is about articulating the singular and the plural in an ecosystem preserving the common good from the passion of power. A broadened economy exceeds, without negating it, the market alone. Copyleft works assert that political and cultural reality where art forms the freedom common to all and to each. 0 1
Sustainable multilingual communication: Managing multilingual content using free and open source content management systems Todd Kelsey English May 2011 It is often too complicated or expensive for most educators, non-profits and individuals to create and maintain a multilingual Web site, because of the technological hurdles and the logistics of working with content in different languages. But multilingual content management systems, combined with streamlined processes and inexpensive organizational tools, make it possible for educators, non-profit entities and individuals with limited resources to develop sustainable and accessible multilingual Web sites. The research included a review of what has been done in the theory and practice of designing Web sites for multilingual audiences. On the basis of that review, a series of sustainable multilingual Web sites were created, and a series of approaches and systems were tested, including MediaWiki, Plone, Drupal, Joomla, PHPMyFAQ, Blogger, Google Docs and Google Sites. There was also a case study on "Social CMS", which refers to emergent social networks such as Facebook. The case studies are reported, concluding with high-level recommendations that form a roadmap for sustainable multilingual Web site development. The basic conclusion is that Drupal is a recommended system for developing a multilingual Web site, based on a variety of factors. Google Sites is also a recommended system, based on the fact that it is free, easy to use, and very flexible. 9 0
Using wikis to experience history Vance Scott Martin Action research
History education
Technology in education
Community college
Wiki
English May 2011 This dissertation is an action research study examining the use of technology to encourage critical thinking and digital literacy in a community college history class. The students are responsible for researching course material and teaching the class. They then use a wiki to contribute to and edit an interactive, online textbook that has been created by students over several semesters. The goal is to link more interactive technologies with what the author terms socially democratic education, by empowering students to create knowledge and encouraging them to consider biases in historical writing.

Two main research questions are considered, each with related sub-questions. First, what do students experience using an educational wiki and an open classroom? Are the students able to think critically about history? The work of Giroux (1978) is used to discuss the critical thinking that emerged in the class.

Second, what are the relationships between the wiki and open classroom, and democratic education? How is that observable? What role does the teacher play? Is this a critical pedagogy? Evidence of socially democratic learning is examined, and Freire (2009) is used to analyze the presence of a critical pedagogy.

Several issues are raised as the result of the study, and their implications are discussed. These include the loss of teacher control with this type of pedagogy, the need for a balance between allowing freedom for discovery and organizational structure, and issues related to trust and identity.
3 0
Collaborative Wikipedia Hosting Wikipedia
Collaborative web hosting
P2P
English, Dutch 2011 0 0
La négociation des contributions dans les wikis publics : légitimation et politisation de la cognition collective Anne Goldenberg Department of Communication, UQAM, University of Quebec At Montreal Canada, Department of Sociology, Unice, University of Nice, France French 2011 Public wikis allow their readers to take part in writing their content. This research adopts the perspective of an anthropology of knowledge, in that it seeks to understand what wikis make possible from a cognitive point of view and what they call into question from an epistemic and political point of view. The use of these artifacts has spread notably through their deployment in large projects of knowledge construction and organization (encyclopedias, online documentation). We hypothesize that this appropriation raises two major anthropological issues: one concerning the form of cognition (socially and technically distributed) that wikis make possible, the other tied to the organizational problems of open communities whose project is epistemic. The primary aim of this research is to analyze contribution as the constitutive activity of projects carried out on public wikis. In doing so, it questions not only the epistemic but also the political conditions of collective, mediated knowledge construction. Our second aim is to understand the role of negotiation in contribution. Having defined epistemic communities as those that produce knowledge deliberately and deliberatively, we postulated that analyzing the course of negotiations surrounding contributions would allow us to study how actors manage the entanglement of social, political and epistemic dimensions. 
From there, we organized our work around the following question: what does the study of the negotiation of contributions reveal about knowledge construction and conventions of participation in epistemic communities? To study contributions and their negotiations, we proceeded in three stages. First, we defined the concepts of epistemic community, contribution and politicization. We also took care to define how the study of negotiations could help us understand participation in epistemic and social problems. Second, to reflect on what characterizes epistemic contribution, we drew on the results of an online questionnaire survey and on interviews with contributors from three communities (the Debian wiki, the Ubuntu-fr community wiki, and the Projet:Québec of Wikipédia.fr). These three case studies allowed us to compare accounts of the activity of contribution and to identify its characteristic features. Third, we carried out a detailed analysis of the exchanges organized around a dispute concerning either the content, the tools, or the internal rules of each of these communities. The conceptual analysis led us to propose four characteristics of contribution: an activity motivated by personal interest, oriented toward a goal of pooling, involving deliberation, and carrying a form of recognition tied to competence. The survey work led us to revise this characterization by treating these points as, above all, objects of characteristic tensions. 
We thus discovered that contributors' motivations and expectations are built in a tension between personal and collective interest, and that the principles for selecting contributions are constantly debated, as are attitudes toward recognition and the identification of participants in a world where anonymity is often the rule. The survey reveals, however, that far from being harmful, these debates appear to be structuring, which reinforces the idea that social and political negotiations play a major role in the life of wikis. Finally, the detailed analysis allowed us to distinguish which parts of the negotiations amounted to a social dispute and which to an epistemic one. It also showed that the most involved contributors know how to make this distinction themselves and to move from one domain to the other in order to resolve a dispute. This leads us to conclude that wikis do indeed support a new form of collective cognition. We see emerging a culture of contribution that rests on a community appropriation of the political and epistemic issues of a participatory form of knowledge production. Finally, we highlight the forms of exclusion specific to this phenomenon: inequalities of access to and participation in this form of public writing, the under-representation of women and of minority cultural communities, and the risks of bureaucratization, of manipulation of information, and of the formation of a technical or political elite. 0 0
Methods of Semantic Drift Reduction in Large Similarity Networks Ł. Bolikowski Systems Research Institute, Polish Academy of Sciences English 2011 We have investigated the problem of clustering documents according to their semantics, given incomplete and incoherent hints reflecting the documents’ affinities. The problem has been rigorously defined using graph theory in set-theoretic notation. We have proved the problem to be NP-hard, and proposed five heuristic algorithms which deal with the problem using five quite different approaches: a greedy algorithm, an iterated finding of maximum cliques, energy minimization inspired by molecular mechanics, a genetic algorithm, and an adaptation of the Girvan-Newman algorithm. As a side effect of the fourth heuristic, an efficient and aesthetically appealing method of visualization of the large graphs in question has been developed. The approaches have been tested empirically on the network of links between articles from over 250 language editions of Wikipedia. A thorough analysis of the network has been performed, showing surprisingly large semantic drift patterns and an uncommon topology: a scale-free skeleton linking tight clusters. It has been demonstrated that, using a blend of the proposed approaches, it is possible to automatically detect, and to a large extent eliminate, the semantic drift in the network of links between the language editions of Wikipedia. Last but not least, an open-source implementation of the proposed algorithms has been documented. 0 0
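The clustering setting of the Bolikowski thesis can be made concrete with a small sketch. Interlanguage links are hints that two articles cover the same concept; naively taking connected components of the link graph (via union-find) shows how a single wrong link produces the semantic drift the thesis targets. The article names and links below are invented, and this is only the naive baseline, not one of the five proposed heuristics.

```python
# Hedged sketch of the problem setting: cluster articles that interlanguage
# links claim are the same concept, and observe how one erroneous hint
# merges unrelated concepts ("semantic drift"). Union-find with path
# compression gives the naive connected-components clustering.
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def cluster(articles, links):
    parent = {a: a for a in articles}
    for a, b in links:
        parent[find(parent, a)] = find(parent, b)  # union the two clusters
    groups = {}
    for a in articles:
        groups.setdefault(find(parent, a), []).append(a)
    return sorted(sorted(g) for g in groups.values())

articles = ["en:Tree", "de:Baum", "fr:Arbre", "en:Wood", "de:Holz"]
good_links = [("en:Tree", "de:Baum"), ("en:Tree", "fr:Arbre"),
              ("en:Wood", "de:Holz")]
print(cluster(articles, good_links))
# One erroneous hint merges the two concepts into a single drifted cluster:
drifted = good_links + [("fr:Arbre", "en:Wood")]
print(len(cluster(articles, drifted)))  # 1
```

The thesis's heuristics exist precisely to detect and break apart such drifted clusters instead of accepting every hint at face value.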
A Cultural and Political Economy of Web 2.0 Robert W. Gehl English 2010 In this dissertation, I explore Web 2.0, an umbrella term for Web-based software and services such as blogs, wikis, social networking, and media sharing sites. This range of Web sites is complex, but is tied together by one key feature: the users of these sites and services are expected to produce the content included in them. That is, users write and comment upon blogs, produce the material in wikis, make connections with one another in social networks, and produce videos in media sharing sites. This has two implications. First, the increase of user-led media production has led to proclamations that mass media, hierarchy, and authority are dead, and that we are entering into a time of democratic media production. Second, this mode of media production relies on users to supply what was traditionally paid labor. To illuminate this, I explore the popular media discourses which have defined Web 2.0 as a progressive, democratic development in media production. I consider the pleasures that users derive from these sites. I then examine the technical structure of Web 2.0. Despite the arguments that present Web 2.0 as a mass appropriation of the means of media production, I have found that Web 2.0 site owners have been able to exploit users' desires to create content and control media production. Site owners do this by deploying a dichotomous structure. In a typical Web 2.0 site, there is a surface, where users are free to produce content and make affective connections, and there is a hidden depth, where new media capitalists convert user-generated content into exchange-values. Web 2.0 sites seek to hide exploitation of free user labor by limiting access to this depth. This dichotomous structure is made clearer if it is compared to the one Web 2.0 site where users have largely taken control of the products of their labor: Wikipedia. 
Unlike many other sites, Wikipedia allows users to see into and determine the legal, technical, and cultural depths of that site. I conclude by pointing to the different cultural formations made possible by eliminating the barrier between surface and depth in Web software architecture. 13 0
A hypersocial-interactive model of Wiki-mediated writing: Collaborative writing in a fan & gamer community Rik Hunter English 2010 In this dissertation I argue that writing is a technologically and socially inflected activity, and that the particular patterns of collaborative writing found on the World of Warcraft Wiki (WoWWiki) are the result of interactions between the MediaWiki platform's affordances and the social practices operating in this context. In other contexts, collaborative writing can more closely resemble the "conventional ethos" (Knobel and Lankshear, 2007) of more individualistic notions of authorship often tied to print. With writing projects such as WoWWiki, we can observe a dramatic shift in notions of textual ownership and production towards the communal and collaborative, and I suggest the patterns of collaboration found on WoWWiki are evidence of a larger technocultural shift signaling new conditions for literacy. In the midst of this shift, the meanings of "collaboration," "authorship," and "audience" are redefined.

Following my introductory chapter, I use textual analysis to examine the talk pages of several of WoWWiki's featured articles for particular patterns of language use, and I identify what WoWWikians focus their attention on in the process of writing articles. I argue that collaboration on WoWWiki poses a challenge to models of face-to-face writing groups and offers unique patterns of collaboration.

I then contend that WoWWiki's writing practices are entering a society where the idea of the single author has been strong. Nevertheless, I find evidence of a shared model of text production and collaborative notion of authorship; further, collaboration is disrupted by those who hold author-centric perspectives.

Next, I argue that the models of audience and writing developed around print and, later, hypertext are inadequate because they cannot account for the roles readers can take and for how writers and readers interact on a wiki. Given this new arrangement in collaborative writing evident on WoWWiki, I develop the hypersocial-interactive model of wiki-mediated writing.

I conclude by reviewing this dissertation's main arguments regarding wiki-mediated collaborative writing, after which I explore the implications of using wikis for writing instruction. Finally, I discuss the limitations of this study and consider directions for future research on voluntary collaborative wiki-mediated writing.
22 0
Cooperation and Cognition in Wikipedia Articles - A data-driven, philosophical and exploratory study R. Jesus Center for Philosophy of Science and Nature Studies, University of Copenhagen English 2010 Wikipedia has created and harnessed new social and work dynamics, which can provide insight into specific aspects of cognition as amplified by a multitude of editors, their ping-pong style of editing, their spatial and temporal flexibility, and the platform's unique community-fostering features. Wikipedia's motto, "The Free Encyclopedia That Anyone Can Edit", is analyzed to reveal human, technological and value actors within a theoretical context of distributed cognition, cooperation and technological agency. In the data-driven studies, drawing on data from wiki log pages, network visualization and biclique analysis are used and developed to look more closely at the process of collaboration in articles and meta-articles, in particular inside the article "Prisoner's dilemma" and the policy article "Neutral Point of View". The several tools used reveal clusters of interest, dense areas of coordination, and a blend between coordination and direct editing work, and they point to Wikipedia's dynamic stability in content and form. In the philosophical-cognitive studies, a distinction between Cognition for Planning and Cognition for Improvising is proposed to account for Wikipedia's success and its mode of editing, whereby many small edits are used for its improvement. In the exploratory part, an installation of a 'live-Wiki' piece, 'Our Coll/nn/ective Minds', reflects on several aspects of wikis, free culture, open source, and Do-It-Yourself by engaging in the debate in a more creative and participative form. These studies contribute to constructing an ecology of the article, a vision of humanities bottom-up, and a better understanding of cooperation and cognition within sociotechnological networks. 0 0
Dynamic link-based ranking over large-scale graph-structured data H. Hwang University of California, San Diego English 2010 Information Retrieval techniques have been the primary means of keyword search in document collections. However, as the amount and the diversity of available semantic connections between objects increase, link-based ranking methods including ObjectRank have been proposed to provide high-recall semantic keyword search over graph-structured data. Since a wide variety of data sources can be modeled as data graphs, supporting keyword search over graph-structured data greatly improves the usability of such data sources. However, it is challenging in terms of both online performance and result quality. We first address the performance issue of dynamic authority-based ranking methods such as personalized PageRank and ObjectRank. Since they dynamically rank nodes in a data graph using an expensive matrix-multiplication method, the online execution time rapidly increases as the size of the data graph grows. Over the English Wikipedia dataset of 2007, ObjectRank spends 20-40 seconds to compute query-specific relevance scores, which is unacceptable. We introduce a novel approach, BinRank, that approximates dynamic link-based ranking scores efficiently. BinRank partitions a dictionary into bins of relevant keywords and then constructs materialized subgraphs (MSGs) per bin in a preprocessing stage. At query time, to produce highly accurate top-K results efficiently, BinRank uses the MSG corresponding to the given keyword instead of the original data graph. PageRank and ObjectRank calculate the global importance score and the query-specific authority score of each node, respectively, by exploiting the link structure of a given data graph. However, both measures favor nodes with high in-degree that may contain popular yet generic content, and thus those nodes are frequently included in top-K lists regardless of the given query.
We propose a novel ranking measure, Inverse ObjectRank, which measures the content-specificity of each node by traversing the semantic links in the data graph in the reverse direction. We then allow users to adjust the importance of the three ranking measures (global importance, query relevance, and content-specificity) to improve the quality of search results. 0 0
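The dynamic, query-specific ranking whose online cost BinRank attacks can be sketched as a personalized PageRank power iteration: the teleport mass is restricted to the nodes matching the query keyword, which is what makes the scores query-specific. The toy graph, seed set, and parameter values below are illustrative assumptions, not the thesis's datasets or its BinRank approximation.

```python
def personalized_pagerank(adj, seeds, d=0.85, iters=50):
    """Power iteration for personalized PageRank on a directed graph.

    adj: node -> list of out-neighbours; seeds: nodes matching the query
    keyword, which receive all the teleport mass (query-specific ranking,
    in the spirit of ObjectRank)."""
    nodes = list(adj)
    r = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    base = {n: ((1 - d) / len(seeds) if n in seeds else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = dict(base)
        for u in nodes:
            out = adj[u]
            if not out:  # dangling node: return its mass to the seeds
                for s in seeds:
                    nxt[s] += d * r[u] / len(seeds)
            else:
                for v in out:
                    nxt[v] += d * r[u] / len(out)
        r = nxt
    return r

# Tiny data graph; node "a" is the only match for the query keyword.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = personalized_pagerank(graph, seeds={"a"})
```

Node d gets score zero because nothing links to it and it is not a seed, illustrating how personalization suppresses nodes irrelevant to the query; running this on a Wikipedia-scale graph per query is exactly the expense BinRank's materialized subgraphs avoid.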
Expressing territoriality in online collaborative environments J. Thom-Santelli Cornell University, New York English 2010 Territoriality, the expression of ownership towards an object, can emerge when social actors occupy a shared social space. In this research, I extend the study of territoriality beyond previous work in physical space in two key ways: (1) the object in question is non-physical, and (2) the social context is an online collaborative activity. To do this, I observe the emergence of characteristic territorial behaviors (e.g. marking, control, defense) in three studies of social software systems. Study 1 is a qualitative interview study observing the behaviors of 15 Maintainers, a small group of lead users on Wikipedia. Findings suggest that the Maintainers communicate their feelings of ownership to other editors by appropriating features of the system, such as user templates and activity monitoring, to preserve control over the articles they maintain and to communicate their knowledge of the article editing process to potential contributors. Study 2 is a qualitative interview study observing the behaviors of 33 users of social tagging systems deployed within a large enterprise organization. Findings suggest that self-designated experts express territoriality regarding their knowledge and their status within the organization through their tagging strategies. Study 3 is a field study of expert and novice users of a mobile social tagging system deployed within an art museum. Findings suggest that, compared to novices, experts feel more personal ownership towards the museum and their tags, express territoriality regarding their expertise through higher levels of participation, and are more likely to vote down novice-generated tags in a defensive manner.
My dissertation draws on observations from these three studies to construct a theoretical framework for online territoriality, providing researchers and designers of groupware with guidelines with which to encourage ownership expression when appropriate. Topics for discussion and future work include clarifying the characteristics of non-physical territories, closer study of possible reactions to territoriality, and describing the potential of territoriality as a design resource for motivating experts to contribute. 0 0
Governance of online creation communities: Provision of infrastructure for the building of digital commons Mayo Fuster Morell European University Institute, Florence English 2010 This doctoral research is framed by the notion of a transition in which distinct commons organizational forms are gaining in importance at a time when the institutional principles of the nation state are in a state of profound crisis, and those of the private market are undergoing dramatic change. Additionally, the transformation of industrial society into a knowledge-based one is raising the importance of knowledge management, regulation and creation. This doctoral research addresses collective action for knowledge-making in the digital era from a double perspective of organizational and political conflict through the case of global online creation communities. From the organizational perspective, it provides an empirically grounded description of the organizational characteristics of emerging collective action. The research challenges previous literature by questioning the neutrality of infrastructure for collective action and demonstrating that infrastructure governance shapes collective action. Importantly, the research provides an empirical explanation of the organizational strategies most likely to succeed in creating large-scale collective action in terms of the size of participation and complexity of collaboration. From the political conflict perspective, this research maps the diverse models of governance of knowledge-making processes, addresses how these are embedded in each model of governance, and suggests a set of dimensions of democratic quality adapted to these forms. Importantly, it provides an empirically grounded characterization of two conflicting logics present in the conditions for collective action in the digital era: a commons versus a corporate logic of collective action. 
Additionally, the research sheds light on the emerging free culture and access to knowledge movement as a sign of this conflict. In hypothesizing that the emerging forms of collective action are able to increase in terms of both participation and complexity while maintaining democratic principles, this research challenges Olson's assertion that formal organizations tend to overcome collective action dilemmas more easily, and challenges the classical statements of Weber and Michels that as organizations grow in size and complexity, they tend to create bureaucratic forms and oligarchies. This research concludes that online creation communities are able to increase in complexity while maintaining democratic principles. Additionally, in the light of this research, the emerging collective action forms are better characterized as hybrid ecosystems which succeed by networking and combining several components, each with different degrees of formalization and different organizational and democratic logics. 0 0
Modeling events in time using cascades of Poisson processes A. Simma University of California, Berkeley English 2010 For many applications, the data of interest can be best thought of as events--entities that occur at a particular moment in time, have features and may in turn trigger the occurrence of other events. This thesis presents techniques for modeling the temporal dynamics of events by making each event induce an inhomogeneous Poisson process of others following it. The collection of all events observed is taken to be a draw from the superposition of the induced Poisson processes, as well as a baseline process for some of the initial triggers. The magnitude and shape of the induced Poisson processes controls the number, timing and features of the triggered events. We provide techniques for parameterizing these processes and present efficient, scalable techniques for inference. The framework is then applied to three different domains that demonstrate the power of the approach. First, we consider the problem of identifying dependencies in a computer network through passive observation and provide a technique based on hypothesis testing for accurately discovering interactions between machines. Then, we look at the relationships between Twitter messages about stocks, using the application as a test-bed to experiment with different parameterizations of induced processes. Finally, we apply these tools to build a model of the revision history of Wikipedia, identifying how the community propagates edits from a page to its neighbors and demonstrating the scalability of our approach to very large datasets. 0 0
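The core idea above, that every event induces an inhomogeneous Poisson process of followers, can be sketched as a branching simulation: each event spawns a Poisson-distributed number of offspring at random delays. The exponential delay kernel, the branching ratio, and the parameter values are illustrative assumptions, not the thesis's fitted models.

```python
import math
import random

def poisson_sample(rng, lam):
    # Knuth's multiplication method; fine for small lambda
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_cascade(base_times, horizon, branch_mean=0.8, delay_rate=1.0, seed=42):
    """Branching view of a cascade of Poisson processes.

    Each event triggers Poisson(branch_mean) offspring at Exp(delay_rate)
    delays, i.e. every event induces its own Poisson process of followers.
    branch_mean < 1 keeps the cascade subcritical, so it dies out."""
    rng = random.Random(seed)
    events, frontier = [], [(t, 0) for t in base_times]  # (time, generation)
    while frontier:
        t, gen = frontier.pop()
        if t > horizon:
            continue  # censor events beyond the observation window
        events.append((t, gen))
        for _ in range(poisson_sample(rng, branch_mean)):
            frontier.append((t + rng.expovariate(delay_rate), gen + 1))
    return sorted(events)

# Three baseline triggers; everything else is induced by earlier events.
events = simulate_cascade(base_times=[0.0, 2.0, 5.0], horizon=10.0)
```

The observed history is the superposition of the baseline process (generation 0) and all induced processes, matching the generative story in the abstract; inference in the thesis works backwards from such a merged history to the induced intensities.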
Social operational information, competence, and participation in online collective action J. Antin University of California, Berkeley English 2010 Recent advances in interactive web technologies, combined with widespread broadband and mobile device adoption, have made online collective action commonplace. Millions of individuals work together to aggregate, annotate, and share digital text, audio, images, and video. Given the prevalence and importance of online collective action systems, researchers have increasingly devoted attention to questions about how individuals interact with and participate in them. I investigate these questions with the understanding that an individual's behaviors and attitudes depend in part on what they know and believe about how the online collaborative system operates--the nuts and bolts, so to speak. In this dissertation I examine how social operational information--information and beliefs about the other people who act in online collective action systems--can influence individuals' attitudes, assumptions, behaviors, and motivations with respect to those systems. I examine the role of social operational information from two distinct but related perspectives. First, I employed a social psychological laboratory study to examine the influence of a specific type of social operational information: relative competence feedback. Experimental findings demonstrate that individuals who received information that they were of low relative competence compared to others contributed less to a collective good than those who received either average or high relative competence feedback. Two key attitudes about abilities and responsibilities in interdependent situations--self-efficacy and social responsibility--mediated the competence-contribution relationship.
Furthermore, individual participants' stable preferences about the distribution of rewards for themselves and other people (social value orientation) moderated the observed changes in contribution rates across experimental conditions. Second, I conducted a qualitative interview study of Wikipedia's infrequent editors and readers. The study focused on documenting and understanding participants' attitudes, beliefs, and assumptions about Wikipedia's social system and the other individuals who contribute to it. Interviews focused on questions about the nature of Wikipedia and its user-generated system, the characteristics of the people who write Wikipedia, and the motivations that encourage their participation. Qualitative analysis revealed a variety of tensions around the nature of Wikipedia as an open, user-generated system, as well as between widespread negative stereotypes of contributors as geeks, nerds, and hackers and equally prevalent positive assumptions about their pro-social motivations for contributing to Wikipedia. I argue that these tensions reveal a transition towards a view of online collaborative work as open, creative, and focused on collaboration, dominated by intrinsic motivations such as passion, interest, and a desire to contribute something to the world. This emerging view of work on Wikipedia is captured by Himanen's notion of the Hacker Ethic. Finally, I explore how qualitative and experimental findings can speak to each other, and discuss some methodological challenges and best practices for combining experimental and qualitative methods. I argue that triangulating qualitative and experimental results in the context of this study facilitates: (1) lending detail and nuance to our understanding of complex attitudes such as social responsibility, and (2) improving the ecological validity of experimental findings by vetting assumptions about competence and social roles/responsibilities in a real-world context. 0 1
The value of geographic wikis R. Priedhorsky University of Minnesota English 2010 This thesis responds to the dual rising trends of geographic content and open content, where the core value of an information system is derived from the work of users. We define the essential properties of an emerging technology, the geographic wiki or geowiki, as well as two variations we invented: the computational geowiki, where user wiki input feeds an algorithm, and the personalized geowiki, where the system provides a personalized interpretation. We focus on two systems to develop these ideas. First, Cyclopath, a research geowiki we founded, serves the bicycle navigation needs of cyclists. We also present analysis in the context of Wikipedia, the well-known and highly successful wiki encyclopedia, using its size and maturity to draw lessons for smaller, younger systems which are far more numerous but hope to grow. We ask three questions with respect to this new technology. First, can it be built? Yes. This thesis describes the design and implementation of Cyclopath, which has grown to be a production system with thousands of users. Second, is it useful? Yes. We identified a representative geographic community, bicyclists, and they both tell us that the information in the Cyclopath geowiki is useful and show us by using the system in great numbers. We also present new ways to measure value in wikis, introducing new techniques for doing so from the perspective of information consumers. In particular, user work in Cyclopath has shortened the average route by 1 km. Also, we present techniques for obtaining more contributions (familiarity matters - sometimes - and users do work beyond what they are asked to) and algorithms for increasing the value of geowiki content by personalizing it, showing that traditional rating prediction algorithms (collaborative filtering) are not effective but simple algorithms based on clustering are. Finally, who cares? Many people. 
There are numerous communities with great interest in geographic information but limited, incomplete, or awkward access because the relevant knowledge is distributed among members of the community and otherwise unavailable. As our results demonstrate, geowikis are an effective way of gathering and disseminating geographic information, more so than previous techniques. Thus, this research has broad value. 0 0
Wikitology: A novel hybrid knowledge base derived from wikipedia Z. Syed University of Maryland, Baltimore County English 2010 World knowledge may be available in different forms such as relational databases, triple stores, link graphs, meta-data and free text. Human minds are capable of understanding and reasoning over knowledge represented in different ways and are influenced by different social, contextual and environmental factors. By following a similar model, we have integrated a variety of knowledge sources in a novel way to produce a single hybrid knowledge base, i.e., Wikitology, enabling applications to better access and exploit knowledge hidden in different forms. Wikipedia proves to be an invaluable resource for generating a hybrid knowledge base due to the availability and interlinking of structured, semi-structured and unstructured encyclopedic information. However, Wikipedia is designed in a way that facilitates human understanding and contribution by providing interlinking of articles and categories for better browsing and search of information, making the content easily understandable to humans but requiring intelligent approaches for being exploited by applications directly. Research projects like Cyc [61] have resulted in the development of a complex broad-coverage knowledge base; however, relatively few applications have been built that really exploit it. In contrast, the design and development of the Wikitology KB has been incremental and has been driven and guided by a variety of applications and approaches that exploit the knowledge available in Wikipedia in different ways. This evolution has resulted in the development of a hybrid knowledge base that not only incorporates and integrates a variety of knowledge resources but also a variety of data structures, and exposes the knowledge hidden in different forms to applications through a single integrated query interface.
We demonstrate the value of the derived knowledge base by developing problem-specific intelligent approaches that exploit Wikitology for a diverse set of use cases, namely, document concept prediction, cross-document co-reference resolution defined as a task in Automatic Content Extraction (ACE) [1], Entity Linking to KB entities defined as a part of the Text Analysis Conference - Knowledge Base Population Track 2009 [65], and interpreting tables [94]. These use cases directly serve to evaluate the utility of the knowledge base for different applications and also demonstrate how the knowledge base could be exploited in different ways. Based on our work we have also developed a Wikitology API that applications can use to exploit this unique hybrid knowledge resource for solving real-world problems. The different use cases that exploit Wikitology for solving real-world problems also contribute to enriching the knowledge base automatically. The document concept prediction approach can predict inter-article and category links for new Wikipedia articles. Cross-document co-reference resolution and entity linking provide a way of linking entity mentions in Wikipedia articles or external articles to the entity articles in Wikipedia and also help in suggesting redirects. In addition, we have also developed specific approaches aimed at automatically enriching the Wikitology KB by unsupervised discovery of ontology elements using the inter-article links, generating disambiguation trees for entities, and estimating the PageRank of Wikipedia concepts to serve as a measure of popularity. The set of approaches combined together can contribute to a number of steps in a broader unified framework for automatically adding new concepts to the Wikitology knowledge base. 0 0
Textual curators and writing machines: authorial agency in encyclopedias, print to digital Krista A. Kennedy Agency
Authorship
Cyclopaedia
Encyclopedia
Intellectual Property
Wikipedia
Rhetoric and Scientific and Technical Communication
English July 2009 Wikipedia is often discussed as the first of its kind: the first massively collaborative, Web-based encyclopedia that belongs to the public domain. While it’s true that wiki technology enables large-scale, distributed collaborations in revolutionary ways, the concept of a collaborative encyclopedia is not new, and neither is the idea that private ownership might not apply to such documents. More than 275 years ago, in the preface to the 1728 edition of his Cyclopædia, Ephraim Chambers mused on the intensely collaborative nature of the volumes he was about to publish. His thoughts were remarkably similar to contemporary intellectual property arguments for Wikipedia, and while the composition processes involved in producing these texts are influenced by the available technologies, they are also unexpectedly similar. This dissertation examines issues of authorial agency in these two texts and shows that the “Author Construct” is not static across eras, genres, or textual technologies. In contrast to traditional considerations of the poetic author, the encyclopedic author demonstrates a different form of authorial agency that operates within strict genre conventions and does not place a premium on originality. This and related variations challenge contemporary ideas concerning the divide between print and digital authorship as well as the notion that new media intellectual property arguments are without historical precedent. 25 0
Nos bastidores da wikipédia lusófona: percalços e conquistas de um projeto de escrita coletiva online Telma Sueli
Pinto Johnson
Wikipedia
Online collective writing
Online social networks
Social interactions
Wikipédia Lusófona
UFMG Portuguese 29 June 2009 Este estudo aborda os fatores que explicam o surgimento e determinam a sustentação do fenômeno da escrita coletiva voluntária na rede social que opera nos bastidores da enciclopédia online Wikipédia Lusófona, com vistas a contribuir com o campo de pesquisa das redes sociais mediadas por computador compreendidas aqui como redes de comunicação. O trabalho, de natureza qualitativa, resgatou e atualizou contribuições de conceitos clássicos da sociologia, da psicologia social e da antropologia para a compreensão dos processos de interação social. A perspectiva pragmática de que os indivíduos são socialmente constituídos e que o contexto social dentro do qual isso ocorre é um complexo de práticas situadas nos levou a fazer certas opções teóricas e metodológicas. Uma das opções foi adotar uma via alternativa aos paradigmas do individualismo metodológico e do holismo social, combinando conceitos teóricos do interacionismo simbólico e do sistema de dádiva moderno; outra opção foi criar um modelo de etnografia virtual condizente com os princípios metodológicos dapesquisa naturalística, para traçar uma tipologia dos processos de interação social na Wikipédia Lusófona. A pesquisa empírica, realizada no período de 2006 a 2008, resultou em dois corpus de dados coletados via a triangulação de métodos. O primeiro corpus resultou daobservação sistemática, coleta de dados e análise da forma de organização social e dos processos de interação social nas páginas da Wikipédia Lusófona. Esse primeiro corpus possibilitou a fase posterior da pesquisa, que envolveu a realização de entrevistas individuais por e-mail com a participação de 26 wikipedistas brasileiros e portugueses registrados no projeto. A análise dos resultados, à luz dos referenciais teóricos, gerou uma tipologia da complexidade das formas sociais que essa rede dinâmica, mutável e em constante devir manifesta. 
Os processos de interação social mostram que a cooperação assume uma forma sociológica de primeira ordem, ocorrendo no registro da incondicionalidade do vínculo associativo, e caracterizada como uma dádiva contemporânea altruística a estranhos, mediada por computador. A colaboração se apresenta como uma forma sociológica de segunda ordem, subseqüente e mais complexa, que transcende as características da cooperação e revela a presença da dádiva agonística em formas de interação como o conflito, a competição e a disputa. As interações sob a forma de colaboração ocorrem no registro da condicionalidade, dos embates, das negociações, que tanto resultam na coesão da rede social como também em afastamentos temporários ou definitivos do projeto.
This study broaches the factors that explain the appearance and determine the sustentation of the phenomenon of the voluntary collective writing on the social network that operates behind the scenes of the encyclopedia online Wikipedia Lusofona, with a vision to contribute to the field of research of the social networks mediated by computer understood here as communication networks. The work, from a qualitative perspective, recovered and updated contributions of classic concepts of sociology, social psychology and anthropology for the understanding of the processes of social interaction. The pragmatic perspective that individuals are socially constituted and that the social context within that of which it occurs is a complex of situated practices led us to choose certain theoretical and methodological options. One of the options was to adopt an alternative way to the paradigms of methodological individualism and social wholism, combining theoretical concepts of symbolic interactionism and the system of modern gift; another option was to create a model of suitable virtual ethnography with the methodological principles of naturalistic research, to trace a typology of the processes of social interaction in Wikipedia Lusofona. The empirical research, carried out during the period of 2006 to 2008, resulted in two bodies of data collected by way of a triangulation of methods. The first body resulted from the systematic observation, collection of data and analysis of the form of social organization and theprocesses of social interaction on the pages of Wikipedia Lusofona. This first body made possible the posterior phase of the research, which involved the execution of individual interviews by email with the participation of 26 registered Brazilian and Portuguese wikipedistas on the project. 
The analysis of the results, in light of the theoretical references, generated a typology of the complex social forms manifested by this dynamic, constantly transforming, and mutable network. The processes of social interaction show that cooperation assumes a sociological form of the first order, occurring in the register of the unconditionality of the associative bond, and characterized as a contemporary altruistic gift to strangers, mediated by computer. Collaboration presents itself as a sociological form of the second order, subsequent and more complex, which transcends the characteristics of cooperation and reveals the presence of the agonistic gift in forms of interaction such as conflict, competition, and dispute. Interactions in the form of collaboration occur in the register of conditionality, of clashes, of negotiations, which result both in the cohesion of the social network and in temporary or definitive withdrawals from the project.
0 0
Contextual retrieval of single Wikipedia articles to support the reading of academic abstracts Christopher Jordan Dalhousie University (Canada) English 2009 Google-style search engines are currently some of the most popular tools that people use when they are looking for information. There are a variety of reasons that people can have for conducting a search, although these reasons can generally be distilled down to users being engaged in a task and developing an information need that impedes them from completing that task at a level which is satisfactory to them. The Google-style search engine, however, is not always the most appropriate tool for every user task. In this thesis, our approach to search differs from the traditional search engine as we focus on providing support to users who are reading academic abstracts. When people do not understand a passage in the abstract they are reading, they often look for more detailed information or a definition. Presenting them with a list of possibly relevant search results, as a Google-style search would, may not immediately meet this information need. In the case of reading, it is logical to hypothesize that users would prefer to receive a single document containing the information that they need. Developed in this thesis are retrieval algorithms that use the abstract being read, along with the passage that the user is interested in, to retrieve a single highly related article from Wikipedia. The top-performing algorithm from the experiments conducted in this thesis is able to retrieve an appropriate article 77% of the time. This algorithm was deployed in a prototype reading support tool, LiteraryMark, in order to investigate the usefulness of such a tool. The results from the user experiment conducted in this thesis indicate that LiteraryMark is able to significantly improve the understanding and confidence levels of people reading abstracts. 0 0
Exploiting external/domain knowledge to enhance traditional text mining using graph-based methods X. Zhang Drexel University, Pennsylvania English 2009 Finding the best way to utilize external/domain knowledge to enhance traditional text mining has been a challenging task. The difficulty centers on the lack of means of representing a document with external/domain knowledge integrated. Graphs are powerful and versatile tools, useful in various subfields of science and engineering for their simple illustration of complicated problems. However, the graph-based approach to knowledge representation and discovery remains relatively unexplored. In this thesis, I propose a graph-based text mining system to incorporate semantic knowledge, document section knowledge, document linkage knowledge, and document category knowledge into the tasks of text clustering and topic analysis. I design a novel term-level graph knowledge representation and a graph-based clustering algorithm to incorporate semantic and document section knowledge for biomedical literature clustering and topic analysis. I present a Markov Random Field (MRF) with a Relaxation Labeling (RL) algorithm to incorporate document linkage knowledge. I evaluate different types of linkage among documents, including explicit linkage such as hyperlinks and citation links, implicit linkage such as coauthor links and co-citation links, and pseudo linkage such as similarity links. I develop a novel semantic-based method to integrate Wikipedia concepts and categories as external knowledge into traditional document clustering. In order to support these new approaches, I develop two automated algorithms to extract multiword phrases and ontological concepts, respectively. Evaluations on a news collection, a web dataset, and biomedical literature demonstrate the effectiveness of the proposed methods.
In the document clustering experiments, the proposed term-level graph-based method not only outperforms the baseline k-means algorithm in all configurations but is also superior in terms of efficiency. The MRF-based algorithm significantly improves spherical k-means and model-based k-means clustering on the datasets containing explicit or implicit linkage; the Wikipedia knowledge-based clustering also improves on document-content-only clustering. On the task of topic analysis, the proposed graph representation, subgraph detection, and graph ranking algorithms can effectively identify corpus-level and cluster-level topic terms. 0 0
Learning in public: Information literacy and participatory media A. Forte Georgia Institute of Technology, Georgia English 2009 0 0
Scaffolding critical thinking in Wikibook creation as a learning task Nari Kim Education
Critical thinking
Scaffolding
Web 2.0
Wikibooks
Learning
English 2009 The purpose of this study was to investigate how to use wikibooks, which emerged through informal learning contexts of Web 2.0 technologies, as a scaffolding tool to improve critical thinking skills in formal learning contexts. Two research questions examined the degrees of participation and critical thinking in wikibook creation under instructional guidance. This study was executed as a mixed-method study incorporating multiple-case study and computer-mediated discourse analysis. Two cases of creating a wikibook as part of a scaffolded learning task were selected: (a) POLT, an enhanced-scaffolding wikibook project, and (b) WELT, a minimal-scaffolding wikibook project. Results showed that the use of enhanced scaffolds to promote critical thinking was an important factor in wikibook creation. In terms of online participation, the enhanced-scaffolding POLT case displayed more expert-like writing patterns, which reduced the time and effort related to the technical difficulties of wikibooks. The minimal-scaffolding WELT case showed novice-like writing patterns, indicating a need for more trial and error on the part of the students when figuring out how to create a wikibook. In addition, the enhanced-scaffolding case presented higher levels of critical thinking skills in terms of the ratios of the analysis units. More interaction with the instructor in the POLT case may have enriched the reflections of individuals, particularly novices in academic writing, and helped them adopt new ways of thinking after receiving the instructor's scaffolding and perspective as a mentor. Interestingly, however, more active peer editing (an optional task of the course) was observed in the minimal-scaffolding WELT case.
During participant interviews, it was revealed that the quantity and quality of peer wikibook chapter editing behaviors, after removing the scaffoldings, were related to several other factors, including (a) motivation for taking the course, (b) understanding the culture of wiki-based communities, and (c) prior knowledge about and experience with the topics of the chapters as well as general academic knowledge. 12 0
Textual curators and writing machines: Authorial agency in encyclopedias, print to digital K. Kennedy University of Minnesota English 2009 Wikipedia is often discussed as the first of its kind: the first massively collaborative, Web-based encyclopedia that belongs to the public domain. While it's true that wiki technology enables large-scale, distributed collaborations in revolutionary ways, the concept of a collaborative encyclopedia is not new, and neither is the idea that private ownership might not apply to such documents. More than 275 years ago, in the preface to the 1728 edition of his Cyclopædia, Ephraim Chambers mused on the intensely collaborative nature of the volumes he was about to publish. His thoughts were remarkably similar to contemporary intellectual property arguments for Wikipedia, and while the composition processes involved in producing these texts are influenced by the available technologies, they are also unexpectedly similar. This dissertation examines issues of authorial agency in these two texts and shows that the "Author Construct" is not static across eras, genres, or textual technologies. In contrast to traditional considerations of the poetic author, the encyclopedic author demonstrates a different form of authorial agency that operates within strict genre conventions and does not place a premium on originality. This and related variations challenge contemporary ideas concerning the divide between print and digital authorship, as well as the notion that new media intellectual property arguments are without historical precedent. 0 0
The value of everything: Ranking and association with encyclopedic knowledge K. Coursey University of North Texas English 2009 This dissertation describes WikiRank, an unsupervised method of assigning relative values to elements of a broad-coverage encyclopedic information source in order to identify those entries that may be relevant to a given piece of text. The valuation given to an entry is based not on textual similarity but instead on the links that associate entries, and an estimation of the expected frequency of visitation that would be given to each entry based on those associations in context. This estimation of relative frequency of visitation is embodied in modifications to the random-walk interpretation of the PageRank algorithm. WikiRank is an effective algorithm to support natural language processing applications. It is shown to exceed the performance of previous machine learning algorithms for the task of automatic topic identification, providing results comparable to those of human annotators. Second, WikiRank is found useful for the task of recognizing text-based paraphrases on a semantic level, by comparing the distribution of attention generated by two pieces of text using the encyclopedic resource as a common reference. Finally, WikiRank is shown to have the ability to use its base of encyclopedic knowledge to recognize terms from different ontologies as describing the same thing, thus allowing for the automatic generation of mapping links between ontologies. The conclusion of this thesis is that the "knowledge access heuristic" is valuable and that a ranking process based on a large encyclopedic resource can form the basis for an extendable general-purpose mechanism capable of identifying relevant concepts by association, which in turn can be effectively utilized for enumeration and comparison at a semantic level. 0 0
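The random-walk estimation the abstract refers to can be illustrated with a plain PageRank power iteration; the miniature link graph below is hypothetical, and the code is a generic sketch rather than the thesis's WikiRank implementation:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of out-links."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleportation mass.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outs in links.items():
            if outs:
                share = damping * rank[node] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:
                # Dangling node: redistribute its mass uniformly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical miniature article graph (invented for illustration).
graph = {
    "Philosophy": ["Logic", "Ethics"],
    "Logic": ["Philosophy"],
    "Ethics": ["Philosophy", "Logic"],
}
ranks = pagerank(graph)
```

Since "Philosophy" receives links from both other articles, the walk visits it most often, which is the intuition behind valuing entries by expected visitation frequency rather than textual similarity.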
Toward another rhetoric: Web 2.0, Levinas, and taking responsibility for response ability M. Santos Purdue University, Indiana English 2009 This dissertation explores the relationship between public considerations of the impact of contemporary dynamic technologies and the metaphysical ethics of Emmanuel Levinas. Both share an interest in interactivity, plurality, transience, and risk. This shared interest rejects the fundamental values of literacy and print identified by media theorists such as Walter J. Ong, Eric Havelock, and Marshall McLuhan: autonomy, singularity, permanence, and security. The values of these mediums deeply impacted the development of Platonic Idealism and the Modern Enlightenment. My concluding argument suggests that, in the wake of these new mediums, the discipline of rhetoric and composition, in addition to the entire research university that houses it, should pay attention to how digital communities such as Wikipedia balance the Modern desire for ontological knowledge alongside the postmodern and digital emphasis on ethics. Such a balancing suggests that the primary values of literacy and print, and the institutions they helped to engender, are not ideally suited for a digital world. 0 0
Wikipedia: A Quantitative Analysis Felipe Ortega Universidad Rey Juan Carlos, Spain English 2009 In this doctoral thesis, we undertake a quantitative analysis of the top-ten language editions of Wikipedia, from different perspectives. Our main goal has been to trace the evolution in time of key descriptive and organizational parameters of Wikipedia and its community of authors. The analysis has focused on logged authors (those editors who created a personal account to participate in the project). Among the distinct metrics included, we can find the monthly evolution of general metrics (number of revisions, active editors, active pages); the distribution of pages and their lengths; and the evolution of participation in discussion pages. We also present a detailed analysis of the inner social structure and stratification of the Wikipedia community of logged authors, fitting appropriate distributions to the most relevant metrics. We also examine the inequality level of contributions from logged authors, showing that there exists a core of very active authors who undertake most of the editorial work. Regarding articles, the inequality analysis also shows that there exists a reduced group of popular articles, though the distribution of revisions is not as skewed as in the previous case. The analysis continues with an in-depth demographic study of the community of authors, focusing on the evolution of the core of very active contributors (applying a statistical technique known as survival analysis). We also explore some basic metrics to analyze the quality of Wikipedia articles and the trustworthiness level of individual authors. This work concludes with an extended analysis of the evolution of the most influential parameters and metrics previously presented. Based on these metrics, we infer important conclusions about the future sustainability of Wikipedia.
According to these results, the Wikipedia community of authors has ceased to grow, remaining stable from Summer 2006 until the end of 2007. As a result, the monthly number of revisions has remained stable over the same period, restricting the number of articles that can be reviewed by the community. On the other side, whilst the number of revisions in talk pages has likewise stabilized over the same period, the number of active talk pages follows a steady growth rate for all versions. This suggests that the community of authors is shifting its focus to broaden the coverage of discussion pages, which has a direct impact on the final quality of content, as previous research has shown. Regarding the inner social structure of the Wikipedia community of logged authors, we find Pareto-like distributions that fit all relevant metrics pertaining to authors (number of revisions per author, number of different articles edited per author), while measurements on articles (number of revisions per article, number of different authors per article) follow lognormal shapes. The analysis of the inequality level of revisions performed by authors, and of revisions received by articles, shows highly unequal distributions. The results of our survival analysis of Wikipedia authors reveal very high mortality percentages among young authors, an endemic problem of the Wikipedias in keeping young editors collaborating with the project for a long period of time. In the same way, from our survival analysis we obtain that the mean lifetime of Wikipedia authors in the core (until they abandon the group of top editors) is situated between 200 and 400 days for all versions, while the median value is lower than 120 days in all cases.
Moreover, the analysis of the monthly number of births and deaths in the community of logged authors reveals that the shift in the monthly trend of active authors is produced by a higher number of deaths from Summer 2006 onwards in all versions, surpassing the monthly number of births from then on. The analysis of the inequality level of contributions over time, and the evolution of additional key features identified in this thesis, reveals a worrying trend towards a progressive increase in the effort spent by core authors as time elapses. This trend may eventually cause these authors to reach their upper limit in the number of revisions they can perform each month, thus starting a decreasing trend in the number of monthly revisions and an overall recession of the content creation and reviewing process in Wikipedia. To prevent this probable future scenario, the number of new editors per month should be raised again, perhaps through the adoption of specific policies and campaigns for attracting new editors to Wikipedia and recovering former top contributors. Finally, another important contribution for the research community is WikiXRay, the software tool we have developed to perform the statistical analyses included in this thesis. This tool completely automates the process of retrieving the database dumps from the Wikimedia public repositories, processing them to obtain key metrics and descriptive parameters, and loading them into a local database, ready to be used in empirical analyses. As far as we know, this is the first research work implementing a comparative analysis, from a quantitative point of view, of the top-ten language editions of Wikipedia, presenting results from many different scientific perspectives.
Therefore, we expect that this contribution will help the scientific community to enhance its understanding of the rich, complex and fascinating working mechanisms and behavioral patterns of the Wikipedia project and its community of authors. Likewise, we hope that WikiXRay will facilitate the hard task of developing empirical analyses of any language version of the encyclopedia, boosting in this way the number of comparative studies like this one in many other scientific disciplines. 0 8
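The inequality of contributions that the thesis measures is commonly summarized with a Gini coefficient; the sketch below uses invented revision counts, not data from the study:

```python
def gini(values):
    """Gini coefficient in [0, 1]; 0 means perfect equality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula based on the rank-weighted sum of the sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

# A small core of very active editors plus a long tail of occasional ones
# (illustrative numbers only).
revisions = [1500, 900, 40, 12, 5, 3, 2, 1, 1, 1]
g = gini(revisions)
```

With a distribution this skewed the coefficient lands above 0.8, the kind of "highly unequal" result the abstract describes for revisions per author.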
Dynamics of platform-based markets F. Zhu Harvard University, Massachusetts English 2008 Platform-based markets are prevalent in today's economy. Understanding the drivers of platform success is of critical importance for platform providers. In this dissertation, I first develop a dynamic model to characterize conditions under which different factors drive the success of a platform, and then use the theoretical framework to analyze market-level data from the video game industry. I find that game players' marginal utility decreases rapidly with additional games after the number of games reaches a certain point, and that quality is more influential than indirect network effects in driving the success of video game consoles. I also use individual-level data from Chinese Wikipedia to examine contributors' incentives to contribute. I take advantage of China's block of Chinese Wikipedia in mainland China in 2005 as a natural experiment to establish the causal relationship between contributors' incentives to contribute and the number of beneficiaries of their contributions. I find that while on average contributors' incentives to contribute drop significantly after the block, the contribution levels of those contributors with small collaboration networks do not decrease after the block. In addition, these contributors joined Wikipedia significantly earlier than the average contributor. The results suggest that other market factors such as altruism could be more influential than indirect network effects in encouraging user participation in the early stage of Chinese Wikipedia. The overall research casts doubt on the popular belief that indirect network effects are the primary force driving platform success and suggests that in many cases, other market forces could be dominant. Late movers could therefore take over market leadership by exploiting these market forces. 0 0
In good faith: Wikipedia collaboration and the pursuit of the universal encyclopedia J. Reagle New York University English 2008 0 0
Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing A. Csomai University of North Texas English 2008 This research addresses the problem of automatic keyphrase extraction from large documents and back of the book indexing. The potential benefits of automating this process are far-reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back of the book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system and the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back of the book indexing system based on the keyphrase extraction system was shown to produce back of the book indexes closely resembling those created by human experts. 0 0
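The chi-based informativeness measure can be illustrated with a simplified, one-cell chi-square score that compares a phrase's observed frequency in a document against what a background corpus would predict; the helper and all counts below are invented for illustration, not the thesis's implementation:

```python
def chi_square(obs_doc, doc_total, obs_bg, bg_total):
    """Simplified one-cell chi-square: how far does a phrase's document
    frequency deviate from its background-corpus expectation?"""
    expected = doc_total * (obs_bg / bg_total)
    if expected == 0:
        return 0.0
    return (obs_doc - expected) ** 2 / expected

# Hypothetical counts: one 2,000-token document against a 10M-token corpus.
doc_len, corpus_len = 2_000, 10_000_000

# A topical phrase: rare in the corpus but frequent in this document.
score_topical = chi_square(obs_doc=15, doc_total=doc_len,
                           obs_bg=500, bg_total=corpus_len)

# A common word: frequent everywhere, so its deviation is small.
score_common = chi_square(obs_doc=15, doc_total=doc_len,
                          obs_bg=600_000, bg_total=corpus_len)
```

The topical phrase scores orders of magnitude higher, which is why a chi-based feature helps separate index-worthy keyphrases from ordinary vocabulary.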
Sparse relational data sets: Issues and an application E. Chu The University of Wisconsin - Madison English 2008 0 0
The effect of applying wikis in an English as a foreign language (EFL) class in Taiwan Yu-ching Chen Education
Wiki
Diffusion of innovations
English as a foreign language
Cooperative learning
Online learning
Taiwan
China
Innovation diffusion
English 2008 Incorporating technology into learning has brought major benefits to learners and has greatly changed higher education. Since only a limited number of experimental studies have investigated the effectiveness of applying wikis, this study collected experimental data to do so. The purpose of the study was to examine the effectiveness of applying wikis in terms of students' learning outcomes, to investigate changes in students' attitudes towards language learning, and to explore the communication channels in wikis that facilitate students' interaction in the e-learning environment, as well as students' experience of using wikis.

Results showed a statistically significant difference between the groups with and without wikis: the group applying wikis performed better in listening and reading abilities. Compared with the non-wiki group, the wiki group had a more favorable attitude towards the class, the improvement of their English ability, and cooperative learning. Moreover, the students agreed that wikis helped them complete their assignments, that they felt comfortable in the wiki environment, and that wikis were easy for them to use.

In sharing their experiences of using wikis, the students provided recommendations about the interface and the editing functions of the wiki environment. Their interaction with other team members and with the course material increased, but they reported that the main interaction took place face-to-face and through instant-messaging software. Finally, the wiki environment allowed students to fulfill their role duties, cooperate, negotiate, manage their contributions, and learn from one another through modeling.
20 0
The geographical analog engine: Hybrid numeric and semantic similarity measures for U.S. cities T. Banchuen The Pennsylvania State University English 2008 This dissertation began with the goal to develop a methodology for locating climate change analogs, and quickly turned into a quest for computational means of locating geographical analogs in general. Previous work in geographical analogs either only computed on numeric information, or manually considered qualitative information. Current and emerging technologies, such as electronic document collections, the Internet, and the Semantic Web, make it possible for people and organizations to store millions of books and articles, share them with the world, or even author some themselves. The amount of electronic and online content is expanding at an exponential speed, such that analysts are increasingly overwhelmed by the sheer volumes of accessible information. The dissertation explores techniques from knowledge engineering, artificial intelligence, information sciences, linguistics and cognitive science, and proposes a novel, automatic methodology that computes similarity within online/offline textual information, and graphically and statistically combines the results with those of numeric methods. U.S. cities with populations larger than 25,000 people are selected as a test case. Places are evaluated based on their numeric characteristics in the County and City Data Book and qualitative characteristics from Wikipedia entries. The dissertation recommends a way to convert Wikipedia entries into Web Ontology Language (OWL) ontologies, which computer algorithms can read, understand and compute. The dissertation initially experiments with Mitra and Wiederhold's semantic measure to quantify similarity between places in the qualitative space. Many shortfalls are identified, and a series of experimental enhancements are explored.
The experiments demonstrate that good semantic measures should employ a comprehensive stop-words list and a complete, but succinct vocabulary. A semantic measure that can recognize synonyms must understand the intended senses of words in a place description. Furthermore, analysts need to be careful with two styles of descriptions: descriptions of places that are (1) created by following a template, or (2) laden with statistical statements can result in falsely high similarity between the places. It is illustrated that scatter plots of numeric similarity scores versus semantic similarity scores can effectively help analysts consider similarity between places in two-space. Analysts can visually observe whether the numeric ranks of places agree with the semantic ranks. The dissertation also shows that the Spearman's rank correlation test and the Kruskal-Wallis test of means can provide statistical confirmation for visual observations. The proposed hybrid methodology enables analysts to automatically discover geographical analogs in ways that strictly numeric methods or manual semantic analysis cannot offer. 0 0
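The Spearman rank correlation test mentioned above checks how closely two similarity measures order the same set of places; a minimal sketch with invented scores (no tie handling, unlike a full implementation):

```python
def ranks(values):
    """Assign ranks 1..n by descending score (ties not handled, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rho via the classic formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical similarity scores of five candidate cities to one target city.
numeric_scores = [0.91, 0.72, 0.55, 0.43, 0.20]
semantic_scores = [0.88, 0.75, 0.40, 0.52, 0.18]
rho = spearman(numeric_scores, semantic_scores)
```

A rho near 1 means the numeric and semantic measures rank the candidate analogs almost identically, which is the statistical confirmation the dissertation pairs with the scatter-plot inspection.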
Wiki & TGfU: A collaborative approach to understanding games education Helena Baert Physical education
Wiki
Teacher education
Technology
Collaboration
Scaffolding
English 2008 Technology is becoming an integral part of teaching and learning in schools. In recognition of the potential contributions of technology toward learning, this thesis explored the use of a wiki, a collaborative webpage where students are free to add, edit, erase or create content (Leuf & Cunningham, 2001), within physical education teacher education. Using interpretive inquiry (Ellis, 1998) as a methodological framework, this qualitative study investigated the perceptions of a cohort of 28 final-year physical education teacher candidates regarding the usefulness of wikis as an instructional tool to enhance learning through an online five-week collaborative group project. The objective of the assignment was for teacher candidates to develop deeper understanding of the Teaching Games for Understanding (TGfU) approach, which creates student-centred games education that links tactics and skills in game settings. The study employed several qualitative research activities, including observing the daily entries on the wiki, document analyses of reflective journals, pre- and post-writing samples, and focus group interviews. The information collected identified both the enabling and constraining factors that this wiki brought to a collaborative undergraduate online project. Data analyses confirmed that the wiki facilitated collaboration among group members, improved writing skills and enhanced deeper understanding through scaffolding of one's own ideas as well as those of others. Findings also showed how the teacher candidates interacted with the content to gain a deeper understanding of the TGfU approach through an emergent design of scaffolds. In their efforts to work collaboratively, the students realized that establishing roles and responsibilities and creating more opportunity for communication were necessary ingredients for learning.
To encourage knowledge acquisition, the instructional guidance provided by the teacher was a crucial component of the scaffolding design. In sum, this thesis elaborates on how wikis contributed to the development of an understanding of teaching games. 20 0
Learning for information extraction: From named entity recognition and disambiguation to relation extraction R. Bunescu The University of Texas at Austin English 2007 Information Extraction, the task of locating textual mentions of specific types of entities and their relationships, aims at representing the information contained in text documents in a structured format that is more amenable to applications in data mining, question answering, or the semantic web. The goal of our research is to design information extraction models that obtain improved performance by exploiting types of evidence that have not been explored in previous approaches. Since designing an extraction system through introspection by a domain expert is a laborious and time consuming process, the focus of this thesis will be on methods that automatically induce an extraction model by training on a dataset of manually labeled examples. Named Entity Recognition is an information extraction task that is concerned with finding textual mentions of entities that belong to a predefined set of categories. We approach this task as a phrase classification problem, in which candidate phrases from the same document are collectively classified. Global correlations between candidate entities are captured in a model built using the expressive framework of Relational Markov Networks. Additionally, we propose a novel tractable approach to phrase classification for named entity recognition based on a special Junction Tree representation. Classifying entity mentions into a predefined set of categories achieves only a partial disambiguation of the names. This is further refined in the task of Named Entity Disambiguation, where names need to be linked to their actual denotations. In our research, we use Wikipedia as a repository of named entities and propose a ranking approach to disambiguation that exploits learned correlations between words from the name context and categories from the Wikipedia taxonomy. 
Relation Extraction refers to finding relevant relationships between entities mentioned in text documents. Our approaches to this information extraction task differ in the type and the amount of supervision required. We first propose two relation extraction methods that are trained on documents in which sentences are manually annotated for the required relationships. In the first method, the extraction patterns correspond to sequences of words and word classes anchored at two entity names occurring in the same sentence. These are used as implicit features in a generalized subsequence kernel, with weights computed through training of Support Vector Machines. In the second approach, the implicit extraction features are focused on the shortest path between the two entities in the word-word dependency graph of the sentence. Finally, in a significant departure from previous learning approaches to relation extraction, we propose reducing the amount of required supervision to only a handful of pairs of entities known to exhibit or not exhibit the desired relationship. Each pair is associated with a bag of sentences extracted automatically from a very large corpus. We extend the subsequence kernel to handle this weaker form of supervision, and describe a method for weighting features in order to focus on those correlated with the target relation rather than with the individual entities. The resulting Multiple Instance Learning approach offers a competitive alternative to previous relation extraction methods, at a significantly reduced cost in human supervision. 0 0
Digital archives and the turn to design J. Purdy University of Illinois at Urbana-Champaign English 2006 Much existing archival work productively examines the contents of archives and their role in historical research; this dissertation offers a fresh perspective on archives by adding to studies of archival texts research on archival technologies. This dissertation argues that digital archives are technologies that shape writing and research practices through their design. Rather than being neutral spaces, they are built on claims about what constitutes appropriate writing and research behaviors in the new media age. In their designs, these technologies situate print as the standard by which to evaluate their effectiveness, illustrating anxiety about the reliability and integrity of the digital. They, moreover, consistently privilege linguistic text, a challenge to embracing multimodality as a frame for composing. While the idea that archives are dynamic spaces is not new, much of the anxiety regarding digital archives continues to be that they do not fix texts---and that singular, stable processes for engaging with them are not knowable. Yet rather than distrust digital archives, I argue for viewing them as spaces that can help us understand composing and researching as dynamic, multimodal processes. The argument proceeds through case studies of three different digital archive technologies: digital document repositories (web sites that store and provide access to archival collections online), wikis (dynamic, collaboratively authored web sites that anyone can add to or change), and plagiarism detection services (web sites that test uploaded papers to determine if they include language copied directly from other sources). Specifically, my primary objects of analysis are JSTOR (Journal Storage, the Scholarly Journal Archive), Wikipedia, and Turnitin, respectively. 
Because technologies are both discursive and material constructions, I study the discourse surrounding and the functionality of each technology using a design approach that builds on Gunther Kress' notion of design but extends it beyond the visual to the structural. As increasing numbers of texts take digital form, the problems and promise of digital archives will demand thoughtful responses. The ways in which these spaces are designed will determine the kinds of texts that will be produced and valued in the future. 0 0
Essays analyzing blogs and Wikipedia Mohammad M. Rahman The University of Kansas English 2006 0 0
Feature Generation for Textual Information Retrieval Using World Knowledge Evgeniy Gabrilovich Technion – Israel Institute of Technology English 2006 Imagine an automatic news filtering system that tracks company news. Given the news item "FDA approves ciprofloxacin for victims of anthrax inhalation", how can the system know that the drug mentioned is an antibiotic produced by Bayer? Or consider an information professional searching for data on RFID technology - how can a computer understand that the item "Wal-Mart supply chain goes real time" is relevant for the search? Algorithms we present can do just that. 0 0
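The abstract above hints at enriching text representations with features drawn from world knowledge such as Wikipedia concepts. A minimal sketch of that idea, assuming a hypothetical prebuilt term-to-concept weight index (the index contents and weights below are invented for illustration, not taken from the thesis):

```python
from collections import Counter, defaultdict

def concept_vector(text, term_concept_weights):
    """Map a text onto weighted knowledge-base concepts.

    For each term in the text, add its (hypothetical) precomputed
    term-concept weights, scaled by term frequency, into one vector.
    """
    vec = defaultdict(float)
    for term, tf in Counter(text.lower().split()).items():
        for concept, weight in term_concept_weights.get(term, {}).items():
            vec[concept] += tf * weight
    return dict(vec)

# Toy index: which concepts each term evokes, and how strongly.
toy_index = {
    "ciprofloxacin": {"Antibiotic": 0.9, "Bayer": 0.5},
    "anthrax": {"Antibiotic": 0.2},
}
features = concept_vector("FDA approves ciprofloxacin for anthrax", toy_index)
```

With such features, a filtering system can match the news item to "antibiotic" even though the word never occurs in the text; the resulting concept weights would be fed to a classifier alongside or instead of plain word features.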
Helping hands: Design for member-maintained online communities Daniel Regis Cosley University of Minnesota English 2006 Online communities provide millions of people every day with information, companionship, support, and fun. These communities need regular maintenance to function. Tasks such as welcoming new members, reviewing contributions, and building community-specific databases typically fall to a few dedicated members. Concentrating responsibility in the hands of a few valuable leaders makes communities vulnerable to leaders' leaving and limits communities' ability to grow and provide value. We study the design of member-maintained online communities, systems where many members help perform upkeep. A key design challenge is motivating members to contribute toward maintenance. Social science theories help to explain why people contribute to groups. We use these theories to design two general mechanisms for increasing people's motivation to contribute. The collective effort model from social psychology suggests people are more likely to contribute to a group if they believe their contributions matter. Editorial review can foster this belief by promoting good content and suppressing bad content. We build review systems that involve the whole community, where review is performed by peers, experts, or no one. Peer review performs about as well as expert review in both motivating contributions and providing effective review, but no review does very poorly. We also explore whether contributions must be reviewed before being made available to the community. Mathematical models suggest that making contributions available right away increases value more quickly, and does just as well in the long run as requiring prior review. These models can inform the design of review systems. Public goods theory from economics suggests people will contribute more to group resources if the cost of contributing drops. 
We use intelligent task routing---matching people with tasks they are likely to do---to reduce contribution costs. We develop a number of generally useful task routing algorithms. Experiments in a movie database and in Wikipedia show these algorithms are very effective at increasing people's motivation to contribute. By using theory to support our designs, testing them in multiple domains, and distilling our results into usable artifacts such as guidelines, models, and algorithms, we hope to help designers build better systems and better communities. 0 0
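Intelligent task routing, as described above, matches people with tasks they are likely to do. A minimal sketch, assuming a hypothetical tag-overlap scoring scheme and task schema; the thesis develops its own routing algorithms, which this toy version stands in for rather than reproduces:

```python
def route_tasks(user_tags, open_tasks, k):
    """Rank open maintenance tasks for one member.

    Score each task by the overlap between the member's interest tags
    and the task's tags (hypothetical schema), and return the ids of
    the k best-matching tasks.
    """
    ranked = sorted(open_tasks,
                    key=lambda task: len(user_tags & task["tags"]),
                    reverse=True)
    return [task["id"] for task in ranked[:k]]

tasks = [
    {"id": 1, "tags": {"film"}},
    {"id": 2, "tags": {"chemistry"}},
    {"id": 3, "tags": {"film", "history"}},
]
best = route_tasks({"film", "history"}, tasks, 2)
```

The design intuition from the abstract carries through: routing a member to tasks matching their history lowers the cost of contributing, which public goods theory predicts should raise contribution rates.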
Improve text retrieval effectiveness and robustness Shuang Liu University of Illinois at Chicago English 2006 Retrieval effectiveness and robustness are two of the most important criteria of text retrieval. Over the past decades, numerous techniques have been introduced to enhance text retrieval performance, including those using phrases, passages, general dictionaries such as WordNet, word sense disambiguation, automatic query expansion, pseudo-relevance feedback, and external sources assisted feedback. This Ph.D. dissertation study focuses on improving text retrieval effectiveness and robustness by extending the existing retrieval model and providing new techniques, which include: (1) designing and implementing a new retrieval model; (2) utilizing concepts in text retrieval; (3) designing and implementing a highly accurate word sense disambiguation algorithm and incorporating it into our information retrieval system; (4) expanding queries by using multiple dictionaries such as WordNet and Wikipedia; (5) employing different pseudo-relevance feedback in the retrieval system, including local, web-assisted, and Wikipedia-assisted feedback, and adopting semantic information for pseudo-relevance feedback. In this Ph.D. study, our design decisions are verified through experiments in the retrieval system. Results are evaluated by standard evaluation metrics: precision, recall, mean average precision (MAP), and geometric mean average precision (GMAP). 0 0
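Pseudo-relevance feedback, one of the techniques listed above, can be sketched as follows. This toy version expands the query with the most frequent terms from the top-ranked documents; it stands in for, rather than reproduces, the thesis's local, web-assisted, and Wikipedia-assisted variants, and omits stopword filtering and term weighting for brevity.

```python
from collections import Counter

def expand_query(query_terms, top_docs, n_terms):
    """Expand a query via pseudo-relevance feedback.

    Assume top_docs are the highest-ranked documents from an initial
    retrieval pass; add the n_terms most frequent terms in them that
    are not already part of the query.
    """
    counts = Counter(
        term
        for doc in top_docs
        for term in doc.lower().split()
        if term not in query_terms
    )
    return list(query_terms) + [term for term, _ in counts.most_common(n_terms)]

top_docs = ["relevance feedback improves retrieval",
            "feedback from top documents improves ranking"]
expanded = expand_query(["retrieval"], top_docs, 2)
```

The expanded query is then rerun against the collection; the feedback source (local corpus, web results, or Wikipedia articles) is what distinguishes the variants the thesis compares.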
Sharing knowledge and building communities: A narrative of the formation, development and sustainability of OOPS M. Lin University of Houston, Texas English 2006 0 0

See also