User generated content

From WikiPapers

User generated content is included as keyword or extra keyword in 0 datasets, 0 tools and 58 publications.


There are no datasets for this keyword.


There are no tools for this keyword.


Title Author(s) Published in Language Date Abstract R C
Organización del conocimiento en entornos wiki: una experiencia de organización de información sobre lecturas académicas Jesús Tramullas
Ana I. Sánchez
Piedad Garrido-Picazo
Organización del conocimiento: sistemas de información abiertos. Actas del XII Congreso ISKO España y II Congreso ISKO España y Portugal Spanish 2015 This paper reviews the informational behavior of a community of university students during the development of a learning activity with a wiki. Through a case study, analyzes the data available on the wiki, and identifies patterns of creating and organizing content. The wiki study is also done within the information management framework proposed by Rowley. The findings support the conclusion that students apply the principle of economy of effort in their informational behavior, guided by the assessment requirements of that activity, and Rowley's proposal is not suitable for analyzing and evaluating educational processes technologically mediated. 0 0
Accessible online content creation by end users Kuksenok K.
Brooks M.
Mankoff J.
Conference on Human Factors in Computing Systems - Proceedings English 2013 Like most online content, user-generated content (UGC) poses accessibility barriers to users with disabilities. However, the accessibility difficulties pervasive in UGC warrant discussion and analysis distinct from other kinds of online content. Content authors, community culture, and the authoring tool itself all affect UGC accessibility. The choices, resources available, and strategies in use to ensure accessibility are different than for other types of online content. We contribute case studies of two UGC communities with accessible content: Wikipedia, where authors focus on access to visual materials and navigation, and an online health support forum where users moderate the cognitive accessibility of posts. Our data demonstrate real world moderation strategies and illuminate factors affecting success, such as community culture. We conclude with recommended strategies for creating a culture of accessibility around UGC. 0 0
Boot-strapping language identifiers for short colloquial postings Goldszmidt M.
Najork M.
Paparizos S.
Lecture Notes in Computer Science English 2013 There is tremendous interest in mining the abundant user generated content on the web. Many analysis techniques are language dependent and rely on accurate language identification as a building block. Even though there is already research on language identification, it focused on very 'clean' editorially managed corpora, on a limited number of languages, and on relatively large-sized documents. These are not the characteristics of the content to be found in say, Twitter or Facebook postings, which are short and riddled with vernacular. In this paper, we propose an automated, unsupervised, scalable solution based on publicly available data. To this end we thoroughly evaluate the use of Wikipedia to build language identifiers for a large number of languages (52) and a large corpus and conduct a large scale study of the best-known algorithms for automated language identification, quantifying how accuracy varies in correlation to document size, language (model) profile size and number of languages tested. Then, we show the value in using Wikipedia to train a language identifier directly applicable to Twitter. Finally, we augment the language models and customize them to Twitter by combining our Wikipedia models with location information from tweets. This method provides massive amount of automatically labeled data that act as a bootstrapping mechanism which we empirically show boosts the accuracy of the models. With this work we provide a guide and a publicly available tool [1] to the mining community for language identification on web and social data. 0 0
Characterizing and curating conversation threads: Expansion, focus, volume, re-entry Backstrom L.
Kleinberg J.
Lena Lee
Cristian Danescu-Niculescu-Mizil
WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining English 2013 Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia. To help users manage the challenge of allocating their attention among the discussions that are relevant to them, there has been a growing need for the algorithmic curation of on-line conversations: the development of automated methods to select a subset of discussions to present to a user. Here we consider two key sub-problems inherent in conversational curation: length prediction (predicting the number of comments a discussion thread will receive) and the novel task of re-entry prediction (predicting whether a user who has participated in a thread will later contribute another comment to it). The first of these sub-problems arises in estimating how interesting a thread is, in the sense of generating a lot of conversation; the second can help determine whether users should be kept notified of the progress of a thread to which they have already contributed. We develop and evaluate a range of approaches for these tasks, based on an analysis of the network structure and arrival pattern among the participants, as well as a novel dichotomy in the structure of long threads. We find that for both tasks, learning-based approaches using these sources of information. 0 0
Modeling and simulation on collective intelligence in future internet-A study of wikipedia Du S.
Qi J.
Information Technology Journal English 2013 Under the background of Web 2.0, network's socialization generates collective intelligence which can enrich human beings wisdom. However, what is the main factor that influences the performance of this behavior is still in research. In this study, the effect of number of Internet users that is represented by quantity, quality and variety of User-generated Content (UGC) is brought forward. Regarding Wikipedia as a study case, this study uses Agent-based modeling methodology and real data of Wikipedia for about 10 years to establish and simulate the model. The results verify that the size of group is indeed a necessary condition to generate collective intelligence. When the number of participants in Wikipedia reaches about 400000, the quantity of UGC increases exponentially, the quality of UGC reaches a satisfactory level and the variety of UGC can be guaranteed. This insight gives significance to show when mass collaboration will lead to collective intelligence which is an innovation than before. 0 0
Similarities, challenges and opportunities of Wikipedia content and open source projects Capiluppi A. Journal of Software: Evolution and Process English 2013 Several years of research and evidence have demonstrated that open source software portals often contain a large amount of software projects that simply do not evolve, developed by relatively small communities, struggling to attract a sustained number of contributors. These portals have started to increasingly act as a storage for abandoned projects, and researchers and practitioners should try and point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the requested expertise, many projects reliant on content and contributions by users undergo a similar evolution, and follow similar patterns: when a project fails to attract contributors, it appears to be not evolving, or abandoned. Far from a negative finding, even those projects could provide valuable content that should be harvested and identified based on common characteristics: by using the attributes of 'usefulness' and 'modularity' we isolate valuable content in both Wikipedia pages and open source software projects. 0 0
Topic familiarity and information skills in online credibility evaluation Teun Lucassen
Muilwijk R.
Noordzij M.L.
Schraagen J.M.
Journal of the American Society for Information Science and Technology English 2013 With the rise of user-generated content, evaluating the credibility of information has become increasingly important. It is already known that various user characteristics influence the way credibility evaluation is performed. Domain experts on the topic at hand primarily focus on semantic features of information (e.g., factual accuracy), whereas novices focus more on surface features (e.g., length of a text). In this study, we further explore two key influences on credibility evaluation: topic familiarity and information skills. Participants with varying expected levels of information skills (i.e., high school students, undergraduates, and postgraduates) evaluated Wikipedia articles of varying quality on familiar and unfamiliar topics while thinking aloud. When familiar with the topic, participants indeed focused primarily on semantic features of the information, whereas participants unfamiliar with the topic paid more attention to surface features. The utilization of surface features increased with information skills. Moreover, participants with better information skills calibrated their trust against the quality of the information, whereas trust of participants with poorer information skills did not. This study confirms the enabling character of domain expertise and information skills in credibility evaluation as predicted by the updated 3S-model of credibility evaluation. 0 0
Value Production in a Collaborative Environment: Sociophysical Studies of Wikipedia Taha Yasseri
Kertesz J.
Journal of Statistical Physics English 2013 We review some recent endeavors and add some new results to characterize and understand underlying mechanisms in Wikipedia (WP), the paradigmatic example of collaborative value production. We analyzed the statistics of editorial activity in different languages and observed typical circadian and weekly patterns, which enabled us to estimate the geographical origins of contributions to WPs in languages spoken in several time zones. Using a recently introduced measure we showed that the editorial activities have intrinsic dependencies in the burstiness of events. A comparison of the English and Simple English WPs revealed important aspects of language complexity and showed how peer cooperation solved the task of enhancing readability. One of our focus issues was characterizing the conflicts or edit wars in WPs, which helped us to automatically filter out controversial pages. When studying the temporal evolution of the controversiality of such pages we identified typical patterns and classified conflicts accordingly. Our quantitative analysis provides the basis of modeling conflicts and their resolution in collaborative environments and contribute to the understanding of this issue, which becomes increasingly important with the development of information communication technology. 0 0
Which came first? Contribution dynamics in online production communities Kane G.C.
Ransbotham S.
International Conference on Information Systems (ICIS 2013): Reshaping Society Through Information Systems Design English 2013 While considerable research investigates collaboration in online production communities, particularly how and why people join these communities, little research considers the dynamics of the collaborative behavior. This paper explores one such dynamic, the relationship between viewing and contributing. Building on established theories of community involvement, this paper argues that a recursive relationship exists, resulting in a mutually reinforcing cycle where more contributors lead to more viewers and, in turn, more viewers lead to more contributors. We also analyze the effect of time and anonymity within this dynamic relationship. This paper offers guidance for research into online production communities that builds on the large behavioral data these communities generate. © (2013) by the AIS/ICIS Administrative Office. All rights reserved. 0 0
A study of social behavior in collaborative user generated services Yao P.
Hu Z.
Zhao Z.
Crespi N.
Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication, ICUIMC'12 English 2012 User-generated content has become more and more popular. The success of collaborative content creation such as Wikipedia shows the level of user's accomplishments in knowledge sharing and socialization. In this paper we extend this research in the service domain, to explore users' social behavior in Collaborative User-Generated Services (Co-UGS). We create a model which is derived from a real social network with its behavior being similar to that of Co-UGS. The centrality approach of social network analysis is used to analyze Co-UGS simulation on this model. Three Co-UGS network actors are identified to distinguish users according to their reactions to a service, i.e. ignoring users, sharing users and co-creating users. Moreover, six hypotheses are proposed to keep the Co-UGS simulation. The results show that the Co-UGS network constructed by the sharing and co-creating users is a connected group superimposed on the basis of the social network of users. In addition, the feasibility of this simulation method is demonstrated along with the validity of applying social network analysis to the study of users' social behavior in Co-UGS. 0 0
CoSyne: Synchronizing multilingual wiki content Bronner A.
Matteo Negri
Yashar Mehdad
Angela Fahrni
Christof Monz
WikiSym 2012 English 2012 CoSyne is a content synchronization system for assisting users and organizations involved in the maintenance of multilingual wikis. The system allows users to explore the diversity of multilingual content using a monolingual view. It provides suggestions for content modification based on additional or more specific information found in other language versions, and enables seamless integration of automatically translated sentences while giving users the flexibility to edit, correct and control eventual changes to the wiki page. To support these tasks, CoSyne employs state-of-the-art machine translation and natural language processing techniques. 0 0
Governance of open content creation: A conceptualization and analysis of control and guiding mechanisms in the open content domain Schroeder A.
Christian Wagner
Journal of the American Society for Information Science and Technology English 2012 The open content creation process has proven itself to be a powerful and influential way of developing text-based content, as demonstrated by the success of Wikipedia and related sites. Distributed individuals independently edit, revise, or refine content, thereby creating knowledge artifacts of considerable breadth and quality. Our study explores the mechanisms that control and guide the content creation process and develops an understanding of open content governance. The repertory grid method is employed to systematically capture the experiences of individuals involved in the open content creation process and to determine the relative importance of the diverse control and guiding mechanisms. Our findings illustrate the important control and guiding mechanisms and highlight the multifaceted nature of open content governance. A range of governance mechanisms is discussed with regard to the varied levels of formality, the different loci of authority, and the diverse interaction environments involved. Limitations and opportunities for future research are provided. 0 0
In search of the ur-Wikipedia: Universality, similarity, and translation in the Wikipedia inter-language link network Morten Warncke-Wang
Anuradha Uduwage
Zhenhua Dong
John Riedl
WikiSym 2012 English 2012 Wikipedia has become one of the primary encyclopaedic information repositories on the World Wide Web. It started in 2001 with a single edition in the English language and has since expanded to more than 20 million articles in 283 languages. Criss-crossing between the Wikipedias is an inter-language link network, connecting the articles of one edition of Wikipedia to another. We describe characteristics of articles covered by nearly all Wikipedias and those covered by only a single language edition, we use the network to understand how we can judge the similarity between Wikipedias based on concept coverage, and we investigate the flow of translation between a selection of the larger Wikipedias. Our findings indicate that the relationships between Wikipedia editions follow Tobler's first law of geography: similarity decreases with increasing distance. The number of articles in a Wikipedia edition is found to be the strongest predictor of similarity, while language similarity also appears to have an influence. The English Wikipedia edition is by far the primary source of translations. We discuss the impact of these results for Wikipedia as well as user-generated content communities in general. 0 0
Network Analysis of User Generated Content Quality in Wikipedia Myshkin Ingawale
Amitava Dutta
Rahul Roy
Priya Seetharaman
Online Information Review 2012 Social media platforms allow near-unfettered creation and exchange of User Generated Content (UGC). We use Wikipedia, which consists of interconnected user generated articles. Drawing from network science, we examine whether high and low quality UGC in Wikipedia differ in their connectivity structures. Using featured articles as a proxy for high quality, we undertake a network analysis of the revision history of six different language Wikipedias to offer a network-centric explanation for the emergence of quality in UGC. The network structure of interactions between articles and contributors plays an important role in the emergence of quality. Specifically, the analysis reveals that high quality articles cluster in hubs that span structural holes. The analysis does not capture the strength of interactions between articles and contributors. The implication of this limitation is that quality is viewed as a binary variable. Extensions to this research will relate strength of interactions to different levels of quality in user generated content. Practical implications: Our findings help harness the ‘wisdom of the crowds’ effectively. Organizations should nurture users and articles at the structural hubs, from an early stage. This can be done through appropriate design of collaborative knowledge systems and development of organizational policies to empower hubs. Originality: The network centric perspective on quality in UGC and the use of a dynamic modeling tool are novel. The paper is of value to researchers in the area of social computing and to practitioners implementing and maintaining such platforms in organizations. 0 0
Network characteristics and the value of collaborative user-generated content Ransbotham S.
Kane G.C.
Lurie N.H.
Marketing Science English 2012 User-generated content is increasingly created through the collaborative efforts of multiple individuals. In this paper, we argue that the value of collaborative user-generated content is a function both of the direct efforts of its contributors and of its embeddedness in the content-contributor network that creates it. An analysis of Wikipedia's WikiProject Medicine reveals a curvilinear relationship between the number of distinct contributors to user-generated content and viewership. A two-mode social network analysis demonstrates that the embeddedness of the content in the content-contributor network is positively related to viewership. Specifically, locally central content (characterized by greater intensity of work by contributors to multiple content sources) is associated with increased viewership. Globally central content (characterized by shorter paths to the other collaborative content in the overall network) also generates greater viewership. However, within these overall effects, there is considerable heterogeneity in how network characteristics relate to viewership. In addition, network effects are stronger for newer collaborative user-generated content. These findings have implications for fostering collaborative user-generated content. 0 1
Omnipedia: Bridging the Wikipedia Language Gap Patti Bao
Brent Hecht
Samuel Carton
Mahmood Quaderi
Michael Horn
Darren Gergle
International Conference on Human Factors in Computing Systems English 2012 We present Omnipedia, a system that allows Wikipedia readers to gain insight from up to 25 language editions of Wikipedia simultaneously. Omnipedia highlights the similarities and differences that exist among Wikipedia language editions, and makes salient information that is unique to each language as well as that which is shared more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with a multilingual Wikipedia experience. These include visualizing content in a language-neutral way and aligning data in the face of diverse information organization strategies. We present a study of Omnipedia that characterizes how people interact with information using a multilingual lens. We found that users actively sought information exclusive to unfamiliar language editions and strategically compared how language editions defined concepts. Finally, we briefly discuss how Omnipedia generalizes to other domains facing language barriers. 0 0
Omnipedia: Bridging the Wikipedia language gap Patti Bao
Brent Hecht
Samuel Carton
Mahmood Quaderi
Michael Horn
Darren Gergle
Conference on Human Factors in Computing Systems - Proceedings English 2012 We present Omnipedia, a system that allows Wikipedia readers to gain insight from up to 25 language editions of Wikipedia simultaneously. Omnipedia highlights the similarities and differences that exist among Wikipedia language editions, and makes salient information that is unique to each language as well as that which is shared more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with a multilingual Wikipedia experience. These include visualizing content in a language-neutral way and aligning data in the face of diverse information organization strategies. We present a study of Omnipedia that characterizes how people interact with information using a multilingual lens. We found that users actively sought information exclusive to unfamiliar language editions and strategically compared how language editions defined concepts. Finally, we briefly discuss how Omnipedia generalizes to other domains facing language barriers. Copyright 2012 ACM. 0 0
Patterns of creation and usage of Wikipedia content Capiluppi A.
Duarte Pimentel A.C.
Boldyreff C.
Proceedings of IEEE International Symposium on Web Systems Evolution, WSE English 2012 Wikipedia is the largest online service storing user-generated content. Its pages are open to anyone for addition, deletion and modifications, and the effort of contributors is recorded and can be tracked in time. Although potentially the Wikipedia web content could exhibit unbounded growth, it is still not clear whether the effort of developers and the output generated are actually following patterns of continuous growth. It is also not clear how the users access such content, and if recurring patterns of usage are detectable showing how the Wikipedia content typically is viewed by interested readers. Using the category of Wikipedia as macro-agglomerates, this study reveals that Wikipedia categories face a decreasing growth trend over time, after an initial, exponential phase of development. On the other hand the study demonstrates that the number of views to the pages within the categories follows a linear, unbounded growth. The link between software usefulness and the need for software maintenance over time has been established by Lehman and others; the link between Wikipedia usage and changes to the content, unlike software, appears to follow a two-phase evolution of production followed by consumption. 0 0
Predicting quality flaws in user-generated content: The case of wikipedia Maik Anderka
Benno Stein
Nedim Lipka
SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval English 2012 The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since in the Wikipedia setting the acquisition of significant test data is intricate, we analyze the effects of a biased sample selection. In this regard we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided test data with little noise, four flaws can be detected with a precision close to 1. 0 0
Probabilistically ranking web article quality based on evolution patterns Jangwhan Han
Chen K.
Jiang D.
Lecture Notes in Computer Science English 2012 User-generated content (UGC) is created, updated, and maintained by various web users, and its data quality is a major concern to all users. We observe that each Wikipedia page usually goes through a series of revision stages, gradually approaching a relatively steady quality state and that articles of different quality classes exhibit specific evolution patterns. We propose to assess the quality of a number of web articles using Learning Evolution Patterns (LEP). First, each article's revision history is mapped into a state sequence using the Hidden Markov Model (HMM). Second, evolution patterns are mined for each quality class, and each quality class is characterized by a set of quality corpora. Finally, an article's quality is determined probabilistically by comparing the article with the quality corpora. Our experimental results demonstrate that the LEP approach can capture a web article's quality precisely. 0 0
Surveying Wikipedia activity: Collaboration, commercialism, and culture Karkulahti O.
Kangasharju J.
International Conference on Information Networking English 2012 User generated content has grown drastically on the Internet over the last years. In this paper, we take a look at Wikipedia and compare Wikipedia editing activity both against commercial content producers as well as across different cultures. Our results show that commercial news sites have a clear diurnal and weekday-weekend patterns, whereas Wikipedia editing has a clear diurnal pattern, but no discernible weekday-weekend pattern. We studied 4 different Wikipedias from 4 different languages and cultures and found out that the editing behavior is very similar across all of them. 0 0
What Wikipedia Deletes: Characterizing Dangerous Collaborative Content Andrew G. West
Insup Lee
WikiSym English October 2011 Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply "undone" but deleted from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information). Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied. 0 1
Automatic reputation assessment in Wikipedia Wohner T.
Kohler S.
Ralf Peters
International Conference on Information Systems 2011, ICIS 2011 English 2011 The online encyclopedia Wikipedia is predominantly created by anonymous or pseudonymous authors whose knowledge and motivations are unknown. For that reason there is an uncertainty in terms of their contribution quality. An approach to this problem is provided by automatic reputation systems, which have been becoming a new research branch in the recent years. In previous research, different metrics for automatic reputation assessment have been suggested. Nevertheless, the metrics are evaluated insufficiently and considered isolated only. As a result, the significance of these metrics is quite unclear. In this paper, we compare and assess seven metrics, both originated from the literature and new suggestions. Additionally, we combine these metrics via a discriminant analysis to deduce a significant reputation function. The analysis reveals that our newly suggested metric editing efficiency is particularly effective. We validate our reputation function by means of an analysis of Wikipedia user groups. 0 0
Citizens as database: Conscious ubiquity in data collection Richter K.-F.
Winter S.
Lecture Notes in Computer Science English 2011 Crowd sourcing [1], citizens as sensors [2], user-generated content [3,4], or volunteered geographic information [5] describe a relatively recent phenomenon that points to dramatic changes in our information economy. Users of a system, who often are not trained in the matter at hand, contribute data that they collected without a central authority managing or supervising the data collection process. The individual approaches vary and cover a spectrum from conscious user actions ('volunteered') to passive modes ('citizens as sensors'). Volunteered user-generated content is often used to replace existing commercial or authoritative datasets, for example, Wikipedia as an open encyclopaedia, or OpenStreetMap as an open topographic dataset of the world. Other volunteered content exploits the rapid update cycles of such mechanisms to provide improved services. For example, reports damages related to streets; Google, TomTom and other dataset providers encourage their users to report updates of their spatial data. In some cases, the database itself is the service; for example, Flickr allows users to upload and share photos. At the passive end of the spectrum, data mining methods can be used to further elicit hidden information out of the data. Researchers identified, for example, landmarks defining a town from Flickr photo collections [6], and commercial services track anonymized mobile phone locations to estimate traffic flow and enable real-time route planning. 0 0
Cultural configuration of Wikipedia: Measuring autoreferentiality in different languages Ribe M.M.
Rodriguez H.
International Conference Recent Advances in Natural Language Processing, RANLP English 2011 Among the motivations to write in Wikipedia given by the current literature there is often coincidence, but none of the studies presents the hypothesis of contributing for the visibility of the own national or language related content. Similar to topical coverage studies, we outline a method which allows collecting the articles of this content, to later analyse them in several dimensions. To prove its universality, the tests are repeated for up to twenty language editions of Wikipedia. Finally, through the best indicators from each dimension we obtain an index which represents the degree of autoreferentiality of the encyclopedia. Last, we point out the impact of this fact and the risk of not considering its existence in the design of applications based on user generated content. 0 0
Enhancing automatic blog classification using concept-category vectorization Ayyasamy R.K.
Alhashmi S.M.
Eu-Gene S.
Tahayna B.
Advances in Intelligent and Soft Computing English 2011 Blogging has gained popularity in recent years. Blog, a user generated content is a rich source of information and many research are conducted in finding ways to classify blogs. In this paper, we present the solution for automatic blog classification through our new framework using Wikipedia's category system. Our framework consists of two stages: The first stage is to find the meaningful terms from blogposts to a unique concept as well as disambiguate the terms belonging to more than one concept. The second stage is to determine the categories to which these found concepts appertain. Our Wikipedia based blog classification framework categorizes blog into topic based content for blog directories to perform future browsing and retrieval. Experimental results confirm that proposed framework categorizes blogposts effectively and efficiently. 0 0
Enhancing concept based modeling approach for blog classification Ayyasamy R.K.
Alhashmi S.M.
Eu-Gene S.
Tahayna B.
Advances in Intelligent and Soft Computing English 2011 Blogs are user generated content discusses on various topics. For the past 10 years, the social web content is growing in a fast pace and research projects are finding ways to channelize these information using text classification techniques. Existing classification technique follows only boolean (or crisp) logic. This paper extends our previous work with a framework where fuzzy clustering is optimized with fuzzy similarity to perform blog classification. The knowledge base-Wikipedia, a widely accepted by the research community was used for our feature selection and classification. Our experimental result proves that proposed framework significantly improves the precision and recall in classifying blogs. 0 0
Linking online news and social media Tsagkias M.
Maarten de Rijke
Weerkamp W.
Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011 English 2011 Much of what is discussed in social media is inspired by events in the news and, vice versa, social media provide us with a handle on the impact of news events. We address the following linking task: given a news article, find social media utterances that implicitly reference it. We follow a three-step approach: we derive multiple query models from a given source news article, which are then used to retrieve utterances from a target social media index, resulting in multiple ranked lists that we then merge using data fusion techniques. Query models are created by exploiting the structure of the source article and by using explicitly linked social media utterances that discuss the source article. To combat query drift resulting from the large volume of text, either in the source news article itself or in social media utterances explicitly linked to it, we introduce a graph-based method for selecting discriminative terms. For our experimental evaluation, we use data from Twitter, Digg, Delicious, the New York Times Community, Wikipedia, and the blogosphere to generate query models. We show that different query models, based on different data sources, provide complementary information and manage to retrieve different social media utterances from our target index. As a consequence, data fusion methods manage to significantly boost retrieval performance over individual approaches. Our graph-based term selection method is shown to help improve both effectiveness and efficiency. Copyright 2011 ACM. 0 0
Probabilistic quality assessment based on article's revision history Jangwhan Han
Chao Wang
Jiang D.
Lecture Notes in Computer Science English 2011 The collaborative efforts of users in social media services such as Wikipedia have led to an explosion in user-generated content and how to automatically tag the quality of the content is an eminent concern now. Actually each article is usually undergoing a series of revision phases and the articles of different quality classes exhibit specific revision cycle patterns. We propose to Assess Quality based on Revision History (AQRH) for a specific domain as follows. First, we borrow Hidden Markov Model (HMM) to turn each article's revision history into a revision state sequence. Then, for each quality class its revision cycle patterns are extracted and are clustered into quality corpora. Finally, article's quality is thereby gauged by comparing the article's state sequence with the patterns of pre-classified documents in probabilistic sense. We conduct experiments on a set of Wikipedia articles and the results demonstrate that our method can accurately and objectively capture web article's quality. 0 0
Quantifying the trustworthiness of social media content Moturu S.T.
Hongyan Liu
Distributed and Parallel Databases English 2011 The growing popularity of social media in recent years has resulted in the creation of an enormous amount of user-generated content. A significant portion of this information is useful and has proven to be a great source of knowledge. However, since much of this information has been contributed by strangers with little or no apparent reputation to speak of, there is no easy way to detect whether the content is trustworthy. Search engines are the gateways to knowledge but search relevance cannot guarantee that the content in the search results is trustworthy. A casual observer might not be able to differentiate between trustworthy and untrustworthy content. This work is focused on the problem of quantifying the value of such shared content with respect to its trustworthiness. In particular, the focus is on shared health content as the negative impact of acting on untrustworthy content is high in this domain. Health content from two social media applications, Wikipedia and Daily Strength, is used for this study. Sociological notions of trust are used to motivate the search for a solution. A two-step unsupervised, feature-driven approach is proposed for this purpose: a feature identification step in which relevant information categories are specified and suitable features are identified, and a quantification step for which various unsupervised scoring models are proposed. Results indicate that this approach is effective and can be adapted to disparate social media applications with ease. 0 0
Retratos da colaboração e da segmentação na Wikimedia: Especificidades dos wikilivros Judaísmo e Civilização Egípcia Luana Teixeira de Souza Cruz Portuguese 2011 Collaboration essentially constitutes the structure of Web 2.0, also known as the collaborative web. An important concept of the Web 2.0 generation is implicit in wiki platforms and modes of production. These are tools for building knowledge online that allow anyone to participate at various stages of the collaborative process. The Wikipedia and Wikibooks projects of the Wikimedia Foundation, for example, form spaces of interaction, places of speech, where agents express individuality, often with the cooperative aim of improving the quality of the books. What makes the operating logic of these platforms possible is the formation of a network of users who are interconnected, in various ways, within a tangle of ties. In some cases, as discussed here, these users take part in collaborative developments in the Wikibooks and Wikipedia projects. Seeking to understand how collaboration takes shape in the Wikimedia Foundation environment, this article discusses how the editorial profiles of the Wikibooks project segment social networks and how these social networks move across Wikimedia, given that wiki projects are densely interconnected. The question is examined in the wikibook under development, "Judaísmo" (Judaism), in comparison with the already completed wikibook "Civilização Egípcia" (Egyptian Civilization). The comparative analysis emphasizes the importance of interdependence among users for the configuration of networks composed of ties whose strength is based on the collaborative production of the wikibooks. 0 0
Social media driven image retrieval Adrian Popescu
Gregory Grefenstette
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR'11 English 2011 People often try to find an image using a short query and images are usually indexed using short annotations. Matching the query vocabulary with the indexing vocabulary is a difficult problem when little text is available. Textual user generated content in Web 2.0 platforms contains a wealth of data that can help solve this problem. Here we describe how to use Wikipedia and Flickr content to improve this match. The initial query is launched in Flickr and we create a query model based on co-occurring terms. We also calculate nearby concepts using Wikipedia and use these to expand the query. The final results are obtained by ranking the results for the expanded query using the similarity between their annotation and the Flickr model. Evaluation of these expansion and ranking techniques, over the Image CLEF 2010 Wikipedia Collection containing 237,434 images and their multilingual textual annotations, shows a consistent improvement compared to state of the art methods. 0 0
Technology-mediated social participation: The next 25 years of HCI challenges Shneiderman B. Lecture Notes in Computer Science English 2011 The dramatic success of social media such as Facebook, Twitter, YouTube, blogs, and traditional discussion groups empowers individuals to become active in local and global communities. Some enthusiasts believe that with modest redesign, these technologies can be harnessed to support national priorities such as healthcare/wellness, disaster response, community safety, energy sustainability, etc. However, accomplishing these ambitious goals will require long-term research to develop validated scientific theories and reliable, secure, and scalable technology strategies. The enduring questions of how to motivate participation, increase social trust, and promote collaboration remain grand challenges even as the technology rapidly evolves. This talk invites researchers across multiple disciplines to participate in redefining our discipline of Human-Computer Interaction (HCI) along more social lines to answer vital research questions while creating inspirational prototypes, conducting innovative evaluations, and developing robust technologies. By placing greater emphasis on social media, the HCI community could constructively influence these historic changes. 0 0
User generated (web) content: Trash or treasure Alluvatti G.M.
Capiluppi A.
De Ruvo G.
Molfetta M.
IWPSE-EVOL'11 - Proceedings of the 12th International Workshop on Principles on Software Evolution English 2011 It has been claimed that the advent of user-generated content has reshaped the way people approached all sorts of content realization projects, being multimedia (YouTube, DeviantArt, etc.), knowledge (Wikipedia, blogs), to software in general, when based on a more general Open Source model. After many years of research and evidence, several studies have demonstrated that Open Source Software (OSS) portals often contain a large amount of software projects that simply do not evolve, often developed by relatively small communities, and that still struggle to attract a sustained number of contributors. In terms of such content, the "tragedy" appears to be that the user demand for content and the offer of experts contributing content are on curves with different slopes, with the demand growing more quickly. In this paper we argue that, even given the differences in the requested expertise, many projects reliant on user-contributed content and expertise undergo a similar evolution, along a logistic growth: a first slow growth rate is followed by a much faster evolution growth. When a project fails to attract more developers i.e. contributors, the evolution of project's content does not present the "explosive growth" phase, and it will eventually "burnout", and the project appears to be abandoned. Far from being a negative finding, even abandoned project's content provides a valuable resource that could be reused in the future within other projects. 0 1
Vandalism detection in Wikipedia: A high-performing, feature-rich model and its reduction through Lasso Sara Javanmardi
David W. McDonald
Lopes C.V.
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 User generated content (UGC) constitutes a significant fraction of the Web. However, some wiki-based sites, such as Wikipedia, are so popular that they have become a favorite target of spammers and other vandals. In such popular sites, human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity. The application of machine learning techniques holds promise for developing efficient online algorithms for better tools to assist users in vandalism detection. We describe an efficient and accurate classifier that performs vandalism detection in UGC sites. We show the results of our classifier in the PAN Wikipedia dataset. We explore the effectiveness of a combination of 66 individual features that produce an AUC of 0.9553 on a test dataset - the best result to our knowledge. Using Lasso optimization we then reduce our feature-rich model to a much smaller and more efficient model of 28 features that performs almost as well - the drop in AUC being only 0.005. We describe how this approach can be generalized to other user generated content systems and describe several applications of this classifier to help users identify potential vandalism. 0 0
Web 2.0 revisited: User-generated content as a social innovation Kaletka C.
Pelka B.
International Journal of Innovation and Sustainable Development English 2011 This paper raises the question whether Web 2.0 can be seen as a technological or a social innovation and which interdependencies exist between these two innovative aspects of the phenomenon. For that purpose, the definition of Web 2.0 as a tag cloud (for example given in Wikipedia) or as a difference in comparison to a 'Web 1.0' is revisited, challenged and discarded. In following steps, the paper argues that the core innovation of Web 2.0 is the communication of 'user-generated content' as a new social routine. The main enabling factors for Web 2.0 utilisation as a social routine are identified as easy-to-use software and broadly spread internet access. So while technology is seen as a 'catalyst' of the phenomenon, the innovation itself (user-generated content) is considered a social one. 0 0
Web article quality assessment in multi-dimensional space Jangwhan Han
Fu X.
Chen K.
Chao Wang
Lecture Notes in Computer Science English 2011 Nowadays user-generated content (UGC), such as Wikipedia, is emerging on the web at an explosive rate, but its data quality varies dramatically. How to effectively rate the article's quality is the focus of research and industry communities. Considering that each quality class demonstrates its specific characteristics on different quality dimensions, we propose to learn the web quality corpus by taking different quality dimensions into consideration. Each article is regarded as an aggregation of sections and each section's quality is modelled using Dynamic Bayesian Network (DBN) with reference to accuracy, completeness and consistency. Each quality class is represented by three dimension corpora, namely accuracy corpus, completeness corpus and consistency corpus. Finally we propose two schemes to compute quality ranking. Experiments show our approach performs well. 0 0
What Wikipedia deletes: Characterizing dangerous collaborative content West A.G.
Insup Lee
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply "undone" - but deleted from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information). Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied. 0 1
What Wikipedia deletes: characterizing dangerous collaborative content Andrew G. West
Insup Lee
WikiSym English 2011 Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply "undone" - but deleted from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information). Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied. 0 1
From Encyclopædia Britannica to Wikipedia: Generational differences in the perceived credibility of online encyclopedia information Andrew J. Flanagin
Miriam J. Metzger
Information, Communication & Society English 18 November 2010 This study examined the perceived credibility of user-generated (i.e. Wikipedia) versus more expertly provided online encyclopedic information (i.e. Citizendium, and the online version of the Encyclopædia Britannica) across generations. Two large-scale surveys with embedded quasi-experiments were conducted: among 11–18-year-olds living at home and among adults 18 years and older. Results showed that although use of Wikipedia is common, many people (particularly adults) do not truly comprehend how Wikipedia operates in terms of information provision, and that while people trust Wikipedia as an information source, they express doubt about the appropriateness of doing so. A companion quasi-experiment found that both children and adults assess information to be more credible when it originates or appears to originate from Encyclopædia Britannica. In addition, children rated information from Wikipedia to be less believable when they viewed it on Wikipedia’s site than when that same information appeared on either Citizendium’s site or on Encyclopædia Britannica’s site. Indeed, content originating from Wikipedia was perceived by children as least credible when it was shown on a Wikipedia page, yet the most credible when it was shown on the page of Encyclopædia Britannica. The practical and theoretical implications of these results are discussed. 7 1
Associating semantics to multilingual tags in folksonomies (poster) Garcia-Silva A.
Gracia J.
Corcho O.
CEUR Workshop Proceedings English 2010 Tagging systems are nowadays a common feature in web sites where user-generated content plays an important role. However, the lack of semantics and multilinguality hamper information retrieval process based on folksonomies. In this paper we propose an approach to bring semantics to multilingual folksonomies. This approach includes a sense disambiguation activity and takes advantage from knowledge generated by the masses in the form of articles, redirection and disambiguation links, and translations in Wikipedia. We use DBpedia[2] as semantic resource to define the tag meanings. 0 0
Information uniqueness in Wikipedia articles Kirtsis N.
Stamou S.
Tzekou P.
Zotos N.
WEBIST 2010 - Proceedings of the 6th International Conference on Web Information Systems and Technology English 2010 Wikipedia is one of the most successful worldwide collaborative efforts to put together user generated content in a meaningfully organized and intuitive manner. Currently, Wikipedia hosts millions of articles on a variety of topics, supplied by thousands of contributors. A critical factor in Wikipedia's success is its open nature, which enables everyone to edit, revise and/or question (via talk pages) the article contents. Considering the phenomenal growth of Wikipedia and the lack of a peer review process for its contents, it becomes evident that both editors and administrators have difficulty in validating its quality on a systematic and coordinated basis. This difficulty has motivated several research works on how to assess the quality of Wikipedia articles. In this paper, we propose the exploitation of a novel indicator for the Wikipedia articles' quality, namely information uniqueness. In this respect, we describe a method that captures the information duplication across the article contents in an attempt to infer the amount of distinct information every article communicates. Our approach relies on the intuition that an article offering unique information about its subject is of better quality compared to an article that discusses issues already addressed in several other Wikipedia articles. 0 0
On the "localness" of user-generated content Hecht B.J.
Darren Gergle
English 2010 The "localness" of participation in repositories of user-generated content (UGC) with geospatial components has been cited as one of UGC's greatest benefits. However, the degree of localness in major UGC repositories such as Flickr and Wikipedia has never been examined. We show that over 50 percent of Flickr users contribute local information on average, and over 45 percent of Flickr photos are local to the photographer. Across four language editions of Wikipedia, however, we find that participation is less local. We introduce the spatial content production model (SCPM) as a possible factor in the localness of UGC, and discuss other theoretical and applied implications. Copyright 2010 ACM. 0 0
The perceived credibility of online encyclopedias among children Flanagin A.J.
Metzger M.J.
ICWSM 2010 - Proceedings of the 4th International AAAI Conference on Weblogs and Social Media English 2010 This study examined young people's trust of Wikipedia as an information resource. A large-scale probability-based survey with embedded quasi-experiments was conducted with 2,747 children in the U.S. ranging from 11 to 18 years old. Results show that young people find Wikipedia to be fairly credible, but also exhibit an awareness of potential problems with non-expert, user-generated content in anonymous environments. Children tend to evaluate the credibility of online encyclopedia information with this in mind, at times with what appears to be an unwarranted devaluation of this information. Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
The tower of Babel meets web 2.0: User-generated content and its applications in a multilingual context Brent Hecht
Darren Gergle
Conference on Human Factors in Computing Systems - Proceedings English 2010 This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create "culturally-aware applications" and "hyperlingual applications". 0 2
Using MediaWiki as an efficient data repository and ubiquitous learning tool: An Australian example Warren I. Ubiquitous Learning English 2010 The functionality of MediaWiki ensures it is a valuable learning repository for sharing and storing information. Constructivist learning can be promoted alongside a wiki repository and various wireless u-learning tools such as mobile phones and digital cameras, to encourage students to gather and share a range of primary and secondary information in a variety of subject areas. This paper outlines one initiative adopted at an Australian University specialising in distance education, which uses a MediaWiki as the primary method for content delivery. Over a period of three-years, the Drugs, Crime and Society wiki has evolved into an organic information repository for storing and accessing current research, press and drug agency material that supplements core themes examined in each topic of the curriculum. A constructivist approach has been employed to encourage students to engage in a range of assessable and non-assessable information sharing activities. The paper also demonstrates how the Drugs, Crime and Society wiki can be accessed through various wireless u-learning technologies, which enables students undertaking field placements to add and share primary information with other students and practitioners working in the drugs field. 0 0
Keynote talk: Mining the web 2.0 for improved image search Baeza-Yates R. Lecture Notes in Computer Science English 2009 There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show how to use these sources of evidence in Flickr, such as tags, visual annotations or clicks, which represent the wisdom of crowds behind UGC, to improve image search. These results are the work of the multimedia retrieval team at Yahoo! Research Barcelona and they are already being used in Yahoo! image search. This work is part of a larger effort to produce a virtuous data feedback circuit based on the right combination of many different technologies to leverage the Web itself. 0 0
Towards semantic tagging in collaborative environments Chandramouli K.
Kliegr T.
Svatek V.
Izquierdo E.
DSP 2009: 16th International Conference on Digital Signal Processing, Proceedings English 2009 Tags pose an efficient and effective way of organization of resources, but they are not always available. A technique called SCM/THD investigated in this paper extracts entities from free-text annotations, and using the Lin similarity measure over the WordNet thesaurus classifies them into a controlled vocabulary of tags. Hypernyms extracted from Wikipedia are used to map uncommon entities to WordNet synsets. In collaborative environments, users can assign multiple annotations to the same object, hence increasing the amount of information available. Assuming that the semantics of the annotations overlap, this redundancy can be exploited to generate higher quality tags. A preliminary experiment presented in the paper evaluates the consistency and quality of tags generated from multiple annotations of the same image. The results obtained on an experimental dataset comprising 62 annotations from four annotators show that the accuracy of a simple majority vote surpasses the average accuracy obtained through assessing the annotations individually by 18%. A moderate-strength correlation has been found between the quality of generated tags and the consistency of annotations. 0 0
Visualizing intellectual connections among philosophers using the hyperlink & semantic data from Wikipedia Athenikos S.J.
Xia Lin
WikiSym English 2009 Wikipedia, with its unique structural features and rich user-generated content, is being increasingly recognized as a valuable knowledge source that can be exploited for various applications. The objective of the ongoing project reported in this paper is to create a Web-based knowledge portal for digital humanities based on the data extracted from Wikipedia (and other data sources). In this paper we present the interesting results we have obtained by extracting and visualizing various connections among 300 major philosophers using the structured data available in Wikipedia. 0 0
Web 2.0 tools for engineers Freschet L. Designcon 2009 English 2009 New web-enabled tools and collaboration models (sometimes called Web2.0) centered on user-generated content seem to be everywhere and revolutionizing many consumer businesses. Examples such as Wikipedia, YouTube, Facebook, eBay, Amazon, Craigslist, and others have changed the way the world operates. But how are we Engineers using these tools? This paper will present a brief overview of some of the applications that are used, what seems to be working, and what we can look forward to in the future. 0 0
Wikinomics and its discontents: A critical analysis of Web 2.0 business manifestos Van Dijck J.
Nieborg D.
New Media and Society English 2009 'Collaborative culture', 'mass creativity' and 'co-creation' appear to be contagious buzzwords that are rapidly infecting economic and cultural discourse on Web 2.0. Allegedly, peer production models will replace opaque, top-down business models, yielding to transparent, democratic structures where power is in the shared hands of responsible companies and skilled, qualified users. Manifestos such as Wikinomics (Tapscott and Williams, 2006) and 'We-Think' (Leadbeater, 2007) argue collective culture to be the basis for digital commerce. This article analyzes the assumptions behind this Web 2.0 newspeak and unravels how business gurus try to argue the universal benefits of a democratized and collectivist digital space. They implicitly endorse a notion of public collectivism that functions entirely inside commodity culture. The logic of Wikinomics and 'We-Think' urgently begs for deconstruction, especially since it is increasingly steering mainstream cultural theory on digital culture. 0 0
Wikipedia’s Labor Squeeze and its Consequences Eric Goldman Journal of Telecommunications and High Technology Law English 2009 This essay explains why Wikipedia will not be able to maintain a credible website while simultaneously letting anyone freely edit it. To date, Wikipedia editors have successfully defended against malicious attacks from spammers and vandals, but as editors turn over, Wikipedia will need to recruit replacements. However, Wikipedia will have difficulty with this recruiting task due to its limited incentives for participation. Faced with a potential labor squeeze, Wikipedia will choose to restrict users’ ability to contribute to the site as a way of preserving site credibility. Wikipedia’s specific configuration choices make it an interesting test case to evaluate the tension between free editability and site credibility, and this Essay touches on how this tension affects user-generated content (UGC) generally. 0 2
Disconnected in a connected world Karpinski J.L. Medical Reference Services Quarterly English 2008 This article outlines five Web 2.0 resources and looks at the use of these tools among medical and nursing professionals and students at the Hospital, Medical School, and Nursing School of the University of Pennsylvania. Questionnaires showed that a majority of the individuals surveyed were unfamiliar with Web 2.0 resources. Additional respondents recognized the tools but did not use them in a medical or nursing context, with a minimal number using any tools to expand their medical or nursing knowledge. A lack of time to set up and use the resources, difficulty of set-up and use, skepticism about the quality of user-generated medical content, and a lack of perceived need for Web 2.0 resources contributed substantially to non-use. The University of Pennsylvania Biomedical Library is responding by increasing the availability of basic, quick, and easy-to-use instructional materials for selected Web 2.0 resources. 0 0
Named entity normalization in user generated content Jijkoun V.
Khalid M.A.
Marx M.
Maarten de Rijke
Proceedings of SIGIR 2008 Workshop on Analytics for Noisy Unstructured Text Data, AND'08 English 2008 Named entity recognition is important for semantically oriented retrieval tasks, such as question answering, entity retrieval, biomedical retrieval, trend detection, and event and entity tracking. In many of these tasks it is important to be able to accurately normalize the recognized entities, i.e., to map surface forms to unambiguous references to real world entities. Within the context of structured databases, this task (known as record linkage and data de-duplication) has been a topic of active research for more than five decades. For edited content, such as news articles, the named entity normalization (NEN) task is one that has recently attracted considerable attention. We consider the task in the challenging context of user generated content (UGC), where it forms a key ingredient of tracking and media-analysis systems. A baseline NEN system from the literature (that normalizes surface forms to Wikipedia pages) performs considerably worse on UGC than on edited news: accuracy drops from 80% to 65% for a Dutch language data set and from 94% to 77% for English. We identify several sources of errors: entity recognition errors, multiple ways of referring to the same entity and ambiguous references. To address these issues we propose five improvements to the baseline NEN algorithm, to arrive at a language-independent NEN system that achieves overall accuracy scores of 90% on the English data set and 89% on the Dutch data set. We show that each of the improvements contributes to the overall score of our improved NEN algorithm, and conclude with an error analysis on both Dutch and English language UGC. The NEN system is computationally efficient and runs with very modest computational requirements. Copyright 2008 ACM. 0 0
Robust content-driven reputation Krishnendu Chatterjee
Luca de Alfaro
Ian Pye
Proceedings of the ACM Conference on Computer and Communications Security English 2008 In content-driven reputation systems for collaborative content, users gain or lose reputation according to how their contributions fare: authors of long-lived contributions gain reputation, while authors of reverted contributions lose reputation. Existing content-driven systems are prone to Sybil attacks, in which multiple identities, controlled by the same person, perform coordinated actions to increase their reputation. We show that content-driven reputation systems can be made resistant to such attacks by taking advantage of the fact that the reputation increments and decrements depend on content modifications, which are visible to all. We present an algorithm for content-driven reputation that prevents a set of identities from increasing their maximum reputation without doing any useful work. Here, work is considered useful if it causes content to evolve in a direction that is consistent with the actions of high-reputation users. We argue that the content modifications that require no effort, such as the insertion or deletion of arbitrary text, are invariably non-useful. We prove a truthfulness result for the resulting system, stating that users who wish to perform a contribution do not gain by employing complex contribution schemes, compared to simply performing the contribution at once. In particular, splitting the contribution into multiple portions, or employing the coordinated actions of multiple identities, does not yield additional reputation. Taken together, these results indicate that content-driven systems can be made robust with respect to Sybil attacks. Copyright 2008 ACM. 0 0
Library 2.0: An overview Connor E. Medical Reference Services Quarterly English 2007 Web 2.0 technologies focus on peer production, posting, subscribing, and tagging content; building and inhabiting social networks; and combining existing and emerging applications in new and creative ways to impart meaning, improve search precision, and better represent data. The Web 2.0 extended to libraries has been called Library 2.0, which has ramifications for how librarians, including those who work in medical settings, will interact and relate to persons who were born digital, especially related to teaching and learning, and planning future library services and facilities. 0 0
The quality and trust of wiki content in a learning community Peacock T.
Fellows G.
Eustace K.
ASCILITE 2007 - The Australasian Society for Computers in Learning in Tertiary Education English 2007 User generated content is having an ever-increasing influence and presence on the Internet. Wiki communities, in particular Wikipedia, have gained widespread attention and criticism. This research explores criticisms and strengths of wiki communities, and methods to reconcile the two. This research tests wiki software in an educational setting to determine indicators of article quality. The results give insight into the use of wiki systems in educational settings, suggest possible methods of improving the validity of content created within wiki communities, and provide groundwork for further research in the area. 0 0