Language

From WikiPapers

language is included as keyword or extra keyword in 0 datasets, 4 tools and 19 publications.

Datasets

There are no datasets for this keyword.

Tools

Tool: JWordNet-Similarity
Operating System(s): Cross-platform
Language(s): English
Programming language(s): Java

Tool: Manypedia.com
Language(s): English
Programming language(s): Python, PHP
License: Affero GPL (code), Creative Commons (content)
Description: Manypedia.com is a web tool in which you can compare Linguistic Points Of View (LPOV) of different language Wikipedias. For example (but this is just one of the many possible comparisons), are you wondering if the community of editors in the English, Arabic and Hebrew Wikipedias are crystallizing different histories of the Gaza War?
Image: Manypedia palestine en ar.png

Tool: Wikokit
Operating System(s): Cross-platform
Programming language(s): Java
License: EPLv1.0, LGPLv2.1, GPLv2, ALv2.0, New BSD License
Description: wikokit (wiki tool kit) - several projects related to wiki. wiwordik - machine-readable Wiktionary. A visual interface to the parsed English Wiktionary and Russian Wiktionary databases. Java WebStart application + JavaFX, English interface. 742 languages extracted from the English Wiktionary. 423 languages extracted from the Russian Wiktionary.
Image: Wiwordik-en.0.09.1094 scrollbox.jpg

Tool: Zawilinski
Operating System(s): Cross-platform
Programming language(s): Java
Description: Zawilinski is a Java library that supports the extraction and analysis of grammatical data in Wiktionary.


Publications

Title Author(s) Published in Language Date Abstract R C
Automatically building templates for entity summary construction Li P.
Yafang Wang
Jian Jiang
Information Processing and Management English 2013 In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Finally, we use the generated templates to construct summaries for new entities. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. Also, we implement a new sentence compression algorithm which uses a dependency tree instead of a parser tree. We apply our method on five Wikipedia entity categories and compare our method with three baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method. © 2012 Elsevier Ltd. All rights reserved. 0 0
Invasion biology and the success of social collaboration networks, with application to wikipedia Mangel M.
Satterthwaite W.H.
Peter Pirolli
Bongwon Suh
YanChun Zhang
Israel Journal of Ecology and Evolution English 2013 We adapt methods from the stochastic theory of invasions - for which a key question is whether a propagule will grow to an established population or fail - to show how monitoring early participation in a social collaboration network allows prediction of success. Social collaboration networks have become ubiquitous and can now be found in widely diverse situations. However, there are currently no methods to predict whether a social collaboration network will succeed or not, where success is defined as growing to a specified number of active participants before falling to zero active participants. We illustrate a suitable methodology with Wikipedia. In general, wikis are web-based software that allows collaborative efforts in which all viewers of a page can edit its contents online, thus encouraging cooperative efforts on text and hypertext. The English language Wikipedia is one of the most spectacular successes, but not all wikis succeed and there have been some major failures. Using these new methods, we derive detailed predictions for the English language Wikipedia and in summary for more than 250 other language Wikipedias. We thus show how ideas from population biology can inform aspects of technology in new and insightful ways. 0 0
Managing information disparity in multilingual document collections Kevin Duh
Yeung C.-M.A.
Iwata T.
Masaaki Nagata
ACM Transactions on Speech and Language Processing English 2013 Information disparity is a major challenge with multilingual document collections. When documents are dynamically updated in a distributed fashion, information content among different language editions may gradually diverge. We propose a framework for assisting human editors to manage this information disparity, using tools from machine translation and machine learning. Given source and target documents in two different languages, our system automatically identifies information nuggets that are new with respect to the target and suggests positions to place their translations. We perform both real-world experiments and large-scale simulations on Wikipedia documents and conclude our system is effective in a variety of scenarios. 0 0
Temporal, cultural and thematic aspects of web credibility Radoslaw Nielek
Wawer A.
Jankowski-Lorek M.
Adam Wierzbicki
Lecture Notes in Computer Science English 2013 Is trust to web pages related to nation-level factors? Do trust levels change in time and how? What categories (topics) of pages tend to be evaluated as not trustworthy, and what categories of pages tend to be trustworthy? What could be the reasons of such evaluations? The goal of this paper is to answer these questions using large scale data of trustworthiness of web pages, two sets of websites, Wikipedia and an international survey. 0 0
Manypedia: Comparing Language Points of View of Wikipedia Communities Paolo Massa
Federico Scrinzi
WikiSym English August 2012 The 4 million articles of the English Wikipedia have been written in a collaborative fashion by more than 16 million volunteer editors. On each article, the community of editors strive to reach a neutral point of view, representing all significant views fairly, proportionately, and without biases. However, beside the English one, there are more than 280 editions of Wikipedia in different languages and their relatively isolated communities of editors are not forced by the platform to discuss and negotiate their points of view. So the empirical question is: do communities on different language Wikipedias develop their own diverse Linguistic Points of View (LPOV)? To answer this question we created and released as open source Manypedia, a web tool whose aim is to facilitate cross-cultural analysis of Wikipedia language communities by providing an easy way to compare automatically translated versions of their different representations of the same topic. 0 0
Enhancing the undergraduate experience through a collaborative wiki exercise to teach nursing students discipline specific terminology Doherty I.
Honey M.
Stewart L.
Proceedings of the European Conference on e-Government, ECEG English 2012 We present a randomized control trial research project that involved undergraduate nursing students working in small groups using a wiki to develop a collaborative glossary of health specific terminology. The background to the project is explained with reference to the relevant literature and the research aims and research method are both discussed in detail. We also present and discuss some preliminary results. 0 0
Manypedia: Comparing language points of view of Wikipedia communities Paolo Massa
Federico Scrinzi
WikiSym 2012 English 2012 The 4 million articles of the English Wikipedia have been written in a collaborative fashion by more than 16 million volunteer editors. On each article, the community of editors strive to reach a neutral point of view, representing all significant views fairly, proportionately, and without biases. However, beside the English one, there are more than 280 editions of Wikipedia in different languages and their relatively isolated communities of editors are not forced by the platform to discuss and negotiate their points of view. So the empirical question is: do communities on different language Wikipedias develop their own diverse Linguistic Points of View (LPOV)? To answer this question we created and released as open source Manypedia, a web tool whose aim is to facilitate cross-cultural analysis of Wikipedia language communities by providing an easy way to compare automatically translated versions of their different representations of the same topic. 0 0
The World Library of Toxicology, Chemical Safety, and Environmental Health (WLT) Wexler P.
Gilbert S.G.
Thorp N.
Faustman E.
Breskin D.D.
Human and Experimental Toxicology English 2012 The World Library of Toxicology, Chemical Safety, and Environmental Health, commonly referred to as the World Library of Toxicology (WLT), is a multilingual online portal of links to key global resources, representing a host of individual countries and multilateral organizations. The Site is designed as a network of, and gateway to, toxicological information and activities from around the world. It is built on a Wiki platform by a roster of Country Correspondents, with the aim of efficiently exchanging information and stimulating collaboration among colleagues, and building capacity, with the ultimate objective of serving as a tool to help improve global public health. The WLT was publicly launched on September 7, 2009, at the Seventh Congress of Toxicology in Developing Countries (CTDC-VII) in Sun City, South Africa. 0 0
Sustainable multilingual communication: Managing multilingual content using free and open source content management systems Todd Kelsey English May 2011 It is often too complicated or expensive for most educators, non-profits and individuals to create and maintain a multilingual Web site, because of the technological hurdles, and the logistics of working with content in different languages. But multilingual content management systems, combined with streamlined processes and inexpensive organizational tools, make it possible for educators, non-profit entities and individuals with limited resources to develop sustainable and accessible multilingual Web sites. The research included a review of what's been done in the theory and practice of designing Web sites for multilingual audiences. On the basis of that review, a series of sustainable multilingual Web sites were created, and a series of approaches and systems were tested, including MediaWiki, Plone, Drupal, Joomla, PHPMyFAQ, Blogger, Google Docs and Google Sites. There was also a case study on "Social CMS", which refers to emergent social networks such as Facebook. The case studies are reported on, and conclude with high-level recommendations that form a roadmap for sustainable multilingual Web site development. The basic conclusion is that Drupal is a recommended system for developing a multilingual Web site, based on a variety of factors. Google Sites is also a recommended system, based on the fact that it is free, easy to use, and very flexible. 9 0
Exploring linguistic points of view of Wikipedia Paolo Massa
Federico Scrinzi
WikiSym English 2011 The 3 million articles of the English Wikipedia have been written since 2001 by more than 14 million volunteers. On each article, the community of editors strive to reach a neutral point of view, representing all significant views fairly, proportionately, and without bias. However, beside the English one, there are more than 270 Wikipedias in different languages and their relatively isolated communities of editors are not forced by the platform to discuss and negotiate their points of view. So the empirical question is: do communities on different language editions of Wikipedia develop their own diverse Linguistic Points of View (LPOV)? To answer this question we created Manypedia, a web tool whose goal is to ease cross-cultural comparisons of Wikipedia language communities by analyzing their different representations of the same topic. 0 1
Encouraging language students to contribute inflection data to Wiktionary Zachary Kurmas WikiSym English 2010 We propose building a computer program to simplify access to the inflection (i.e., “word ending”) data in Wiktionary. This program will make it easier to both (1) look up a word’s inflections and, more importantly, (2) edit incorrect inflections. We expect that such a program will encourage foreign language students to both use Wiktionary as a resource and contribute inflection and other grammar data to Wiktionary. We believe that the resulting additional activity will make Wiktionary a better resource for students, especially students of those languages for which there are no cheap, comprehensive inflection resources, and provide data that will be beneficial to the wiki research community. 1 0
Meta-metadata: A metadata semantics language for collection representation applications Kerne A.
Qu Y.
Webb A.M.
Damaraju S.
Lupfer N.
Mathur A.
International Conference on Information and Knowledge Management, Proceedings English 2010 Collecting, organizing, and thinking about diverse information resources is the keystone of meaningful digital information experiences, from research to education to leisure. Metadata semantics are crucial for organizing collections, yet their structural diversity exacerbates problems of obtaining and manipulating them, strewing end users and application developers amidst the shadows of a proverbial tower of Babel. We introduce meta-metadata, a language and software architecture addressing a metadata semantics lifecycle: (1) data structures for representation of metadata in programs; (2) metadata extraction from information resources; (3) semantic actions that connect metadata to collection representation applications; and (4) rules for presentation to users. The language enables power users to author metadata semantics wrappers that generalize template-based information sources. The architecture supports development of independent collection representation applications that reuse wrappers. The initial meta-metadata repository of information source wrappers includes Google, Flickr, Yahoo, IMDb, Wikipedia, and the ACM Portal. Case studies validate the approach. 0 0
Struggles online over the meaning of 'down's syndrome': A 'dialogic' interpretation Nicholas Cimini Health English 2010 Bakhtin's suggestion that a unified truth demands a 'multiplicity of consciousnesses' seems particularly relevant in the 'globally connected age'. At a time when the DIY/'punk ethic' seems to prevail online, and Wikipedia and blogging means that anyone with access to the Internet can enter into public deliberation, it is worth considering the potential for mass communication systems to create meaningful changes in the way that 'disability' is theorized. Based on the findings of qualitative research, this study explores competing interpretations of disability, specifically dialogue online over the meaning of Down's syndrome, from the vantage point of an approach towards language analysis that emanates from the work of the Bakhtin Circle. It will be shown that, suitably revised and supplemented, elements of Bakhtinian theory provide powerful tools for understanding online relations and changes in the notion of disability. It will also be shown that, while activists in the disabled people's movement have managed to effect modest changes to the way that disability is theorized, both online and in the 'real world', there remains a great deal still to be achieved. This study allows us to understand better the social struggles faced by disabled people and the opportunities open to them. 0 0
The tower of Babel meets web 2.0: User-generated content and its applications in a multilingual context Brent Hecht
Darren Gergle
Conference on Human Factors in Computing Systems - Proceedings English 2010 This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create "culturally-aware applications" and "hyperlingual applications". 0 2
Zawilinski: A library for studying grammar in wiktionary Zachary Kurmas WikiSym 2010 English 2010 We present Zawilinski, a Java library that supports the extraction and analysis of grammatical data in Wiktionary. Zawilinski can efficiently (1) filter Wiktionary for content pertaining to a specified language, and (2) extract a word's inflections from its Wiktionary entry. We have thus far used Zawilinski to (1) measure the correctness of the inflections for a subset of the Polish words in the English Wiktionary and to (2) show that this grammatical data is very stable. (Only 131 out of 4748 Polish words have had their inflection data corrected.) We also explain Zawilinski's key features and discuss how it can be used to simplify the development of additional grammar-based analyses. 0 2
Zawilinski: a library for studying grammar in Wiktionary Zachary Kurmas WikiSym English 2010 We present Zawilinski, a Java library that supports the extraction and analysis of grammatical data in Wiktionary. Zawilinski can efficiently (1) filter Wiktionary for content pertaining to a specified language, and (2) extract a word’s inflections from its Wiktionary entry. We have thus far used Zawilinski to (1) measure the correctness of the inflections for a subset of the Polish words in the English Wiktionary and to (2) show that this grammatical data is very stable. (Only 131 out of 4748 Polish words have had their inflection data corrected.) We also explain Zawilinski’s key features and discuss how it can be used to simplify the development of additional grammar-based analyses. 3 2
Language-model-based ranking for queries on RDF-graphs Elbassuoni S.
Maya Ramanath
Ralf Schenkel
Sydow M.
Gerhard Weikum
International Conference on Information and Knowledge Management, Proceedings English 2009 The success of knowledge-sharing communities like Wikipedia and the advances in automatic information extraction from textual and Web sources have made it possible to build large "knowledge repositories" such as DBpedia, Freebase, and YAGO. These collections can be viewed as graphs of entities and relationships (ER graphs) and can be represented as a set of subject-property-object (SPO) triples in the Semantic-Web data model RDF. Queries can be expressed in the W3C-endorsed SPARQL language or by similarly designed graph-pattern search. However, exact-match query semantics often fall short of satisfying the users' needs by returning too many or too few results. Therefore, IR-style ranking models are crucially needed. In this paper, we propose a language-model-based approach to ranking the results of exact, relaxed and keyword-augmented graph pattern queries over RDF graphs such as ER graphs. Our method estimates a query model and a set of result-graph models and ranks results based on their Kullback-Leibler divergence with respect to the query model. We demonstrate the effectiveness of our ranking model by a comprehensive user study. Copyright 2009 ACM. 0 0
On the evolution of computer terminology and the SPOT on-line dictionary project Hynek J.
Brada P.
Openness in Digital Publishing: Awareness, Discovery and Access - Proceedings of the 11th International Conference on Electronic Publishing, ELPUB 2007 English 2007 In this paper we discuss the issue of ICT terminology and translations of specific technical terms. We also present SPOT - a new on-line dictionary of computer terminology. SPOT's web platform is adaptable to any language and/or field. We hope that SPOT will become an open platform for discussing controversial computer terms (and their translations into Czech) among professionals. The resulting on-line computer dictionary is freely available to the general public, university teachers, students, editors and professional translators. The dictionary includes some novel features, such as presenting translated terms used in several different contexts - a feature highly appreciated namely by users lacking technical knowledge for deciding which of the dictionary terms being offered should be used. 0 0
The Richness and Reach of Wikinomics: Is the Free Web-Based Encyclopedia Wikipedia Only for the Rich Countries? Morten Rask Proceedings of the Joint Conference of The International Society of Marketing Development and the Macromarketing Society, June 2-5, 2007 2007 In this paper, a model of the patterns of correlation in Wikipedia, reach and richness, lays the foundation for studying whether or not the free web-based encyclopedia Wikipedia is only for developed countries. Wikipedia is used in this paper, as an illustrative case study for the enormous rise of the so-called Web 2.0 applications, a subject which has become associated with many golden promises: Instead of being at the outskirts of the global economy, the development of free or low-cost internet-based content and applications, makes it possible for poor, emerging, and transition countries to compete and collaborate on the same level as developed countries. Based upon data from 12 different Wikipedia language editions, we find that the central structural effect is on the level of human development in the current country. In other words, Wikipedia is in general, more for rich countries than for less developed countries. It is suggested that policy makers make investments in increasing the general level of literacy, education, and standard of living in their country. The main managerial implication for businesses, that will expand their social network applications to other countries, is to use the model of the patterns of correlation in Wikipedia, reach and richness, as a market screening and monitoring model. 0 1