User interfaces

From WikiPapers

User interfaces is included as keyword or extra keyword in 0 datasets, 0 tools and 182 publications.


There are no datasets for this keyword.


There are no tools for this keyword.


Title Author(s) Published in Language Date Abstract R C
Self-sorting map: An efficient algorithm for presenting multimedia data in structured layouts Strong G.
Gong M.
IEEE Transactions on Multimedia English 2014 This paper presents the Self-Sorting Map (SSM), a novel algorithm for organizing and presenting multimedia data. Given a set of data items and a dissimilarity measure between each pair of them, the SSM places each item into a unique cell of a structured layout, where the most related items are placed together and the unrelated ones are spread apart. The algorithm integrates ideas from dimension reduction, sorting, and data clustering algorithms. Instead of solving the continuous optimization problem that other dimension reduction approaches do, the SSM transforms it into a discrete labeling problem. As a result, it can organize a set of data into a structured layout without overlap, providing a simple and intuitive presentation. The algorithm is designed for sorting all data items in parallel, making it possible to arrange millions of items in seconds. Experiments on different types of data demonstrate the SSM's versatility in a variety of applications, ranging from positioning city names by proximities to presenting images according to visual similarities, to visualizing semantic relatedness between Wikipedia articles. 0 0
An initial analysis of semantic wikis Gil Y.
Knight A.
Zhang K.
Lei Zhang
Sethi R.
International Conference on Intelligent User Interfaces, Proceedings IUI English 2013 Semantic wikis augment wikis with semantic properties that can be used to aggregate and query data through reasoning. Semantic wikis are used by many communities, for widely varying purposes such as organizing genomic knowledge, coding software, and tracking environmental data. Although wikis have been analyzed extensively, there has been no published analysis of the use of semantic wikis. We carried out an initial analysis of twenty semantic wikis selected for their diverse characteristics and content. Based on the number of property edits per contributor, we identified several patterns to characterize community behaviors that are common to groups of wikis. 0 0
C Arsan T.
Sen R.
Ersoy B.
Devri K.K.
Lecture Notes in Electrical Engineering English 2013 In this paper, we design and implement a novel all-in-one Media Center that can be directly connected to a high-definition television (HDTV). C# programming is used to develop a modular media center for home entertainment, so it is easy to add any number of new modules and software components. Most importantly, the user interface is designed around two factors: simplicity and tidiness. The proposed media center lets users listen to music and radio, watch TV and online videos, browse the Internet, edit videos, look up the pharmacy on duty, check weather conditions, find song lyrics, burn CDs/DVDs, and connect to Wikipedia. All the modules and design steps are explained in detail for a user-friendly, cost-effective all-in-one media center. 0 0
ISICIL: Semantics and social networks for business intelligence Michel Buffa
Delaforge N.
Ereteo G.
Fabien Gandon
Giboin A.
Limpens F.
Lecture Notes in Computer Science English 2013 The ISICIL initiative (Information Semantic Integration through Communities of Intelligence onLine) mixes viral new web applications with formal semantic web representations and processes to integrate them into corporate practices for technological watch, business intelligence and scientific monitoring. The resulting open source platform proposes three functionalities: (1) a semantic social bookmarking platform monitored by semantic social network analysis tools, (2) a system for semantically enriching folksonomies and linking them to corporate terminologies and (3) semantically augmented user interfaces, activity monitoring and reporting tools for business intelligence. 0 0
Keeping wiki content current via news sources Adams R.
Kuntz A.
Marks M.
Martin W.
Musicant D.R.
International Conference on Intelligent User Interfaces, Proceedings IUI English 2013 Online resources known as wikis are commonly used for collection and distribution of information. We present a software implementation that assists wiki contributors with the task of keeping a wiki current. Our demonstration, built using English Wikipedia, enables wiki contributors to subscribe to sources of news, based on which it makes intelligent recommendations for pages within Wikipedia where the new content should be added. This tool is also potentially useful for helping new Wikipedia editors find material to contribute. 0 0
Making sense of open data statistics with information from Wikipedia Hienert D.
Wegener D.
Schomisch S.
Lecture Notes in Computer Science English 2013 Today, more and more open data statistics are published by governments, statistical offices and organizations like the United Nations, The World Bank or Eurostat. This data is freely available and can be consumed by end users in interactive visualizations. However, additional information is needed to enable laymen to interpret these statistics in order to make sense of the raw data. In this paper, we present an approach to combine open data statistics with historical events. In a user interface we have integrated interactive visualizations of open data statistics with a timeline of thematically appropriate historical events from Wikipedia. This can help users to explore statistical data in several views and to get related events for certain trends in the timeline. Events include links to Wikipedia articles, where details can be found and the search process can be continued. We have conducted a user study to evaluate if users can use the interface intuitively, if relations between trends in statistics and historical events can be found and if users like this approach for their exploration process. 0 0
Methodology to apply semantic wikis as lean knowledge management systems on the shop floor Zapp M.
Hoffmeister M.
Verl A.
Procedia CIRP English 2013 In manufacturing facilities, redundant work and poor product quality can be prevented by the effective use of the workers knowledge. The use of information systems can significantly improve the required knowledge management process, but need to be adapted to the requirements of manufacturing facilities. This paper presents a methodology to develop lean knowledge management systems based on semantic technology, which are designed for the needs of small and medium sized manufacturing companies. The underlying system architecture foresees a semantic wiki-system as the interface to workers. This user interface enables them to access heterogeneous data like equipment specifications, best practices and pictures as well as to rapidly record their observations and actions in the system. Furthermore, the system is equipped with a semantic inference engine, which performs content analysis and thereby automatically generates new facts in the knowledge base. Last, a semantic data interface interconnects the system with external information systems on the shop floor. The interface allows importing, interlinking and storing recipes and reports in the common knowledge base. Based on these three system components, the system facilitates structured and integrated access, storage and re-use of expert knowledge and production data. 0 0
Models of human navigation in information networks based on decentralized search Denis Helic
Strohmaier M.
Michael Granitzer
Scherer R.
HT 2013 - Proceedings of the 24th ACM Conference on Hypertext and Social Media English 2013 Models of human navigation play an important role for understanding and facilitating user behavior in hypertext systems. In this paper, we conduct a series of principled experiments with decentralized search - an established model of human navigation in social networks - and study its applicability to information networks. We apply several variations of decentralized search to model human navigation in information networks and we evaluate the outcome in a series of experiments. In these experiments, we study the validity of decentralized search by comparing it with human navigational paths from an actual information network - Wikipedia. We find that (i) navigation in social networks appears to differ from human navigation in information networks in interesting ways and (ii) in order to apply decentralized search to information networks, stochastic adaptations are required. Our work illuminates a way towards using decentralized search as a valid model for human navigation in information networks in future work. Our results are relevant for scientists who are interested in modeling human behavior in information networks and for engineers who are interested in using models and simulations of human behavior to improve on structural or user interface aspects of hypertextual systems. Copyright 2013 ACM. 0 0
Impact of platform design on cross-language information exchange Hale S. Conference on Human Factors in Computing Systems - Proceedings English 2012 This paper describes two case studies examining the impact of platform design on cross-language communications. The sharing of off-site hyperlinks between language editions of Wikipedia and between users on Twitter with different languages in their user descriptions are analyzed and compared in the context of the 2011 Tohoku earthquake and tsunami in Japan. The paper finds that a greater number of links are shared across languages on Twitter, while a higher percentage of links are shared between Wikipedia articles. The higher percentage of links being shared on Wikipedia is attributed to the persistence of links and the ability for users to link articles on the same topic together across languages. 0 0
Planteome Annotation Wiki: A semantic application for the community curation of Plant genotypes and phenotypes Justin Preece
Justin Elser
Pankaj Jaiswal
ACM International Conference Proceeding Series English 2012 Two notable trends currently impacting biology curation are 1) the use of wikis to input, store, and disseminate research data and 2) the development of semantic technologies to facilitate higher-order data description and exploration. These separate developments, when brought together, have the potential to deliver on one promise of the "semantic web": structured, self-described data used to further scientific research and analysis. The Semantic MediaWiki [5] extension, when used in conjunction with Semantic Forms [4], provides an avenue to create a semantically-driven, community-powered research platform on the web. The Planteome Annotation Wiki implements these technology platforms to provide a user interface for annotation, personal user accounts, a set of previously-curated annotations (i.e. from TAIR [6], Gramene [7], and the Plant Ontology Consortium [3]) and a rigorous semantic data structure. The wiki also dynamically integrates data from other sites via web services. For example, Gene Ontology [2] and Plant Ontology terms, PubMed references and taxonomic data are all available. An import utility accepting large-scale GO Annotation File Format (GAF [1]) data has also been developed, and the wiki provides multi-format import, export, and semantic browsing and search capabilities. Future enhancements include an exploration of semantic inferencing capabilities using ontologies, a curatorial approval mechanism, and further data integration with other biowikis. Copyright 0 0
Self organizing maps for visualization of categories Szymanski J.
Duch W.
Lecture Notes in Computer Science English 2012 Visualization of Wikipedia categories using Self Organizing Maps shows an overview of categories and their relations, helping to narrow down search domains. Selecting particular neurons this approach enables retrieval of conceptually similar categories. Evaluation of neural activations indicates that they form coherent patterns that may be useful for building user interfaces for navigation over category structures. 0 0
Tasteweights: A visual interactive hybrid recommender system Svetlin Bostandjiev
John O'Donovan
Tobias Hollerer
RecSys'12 - Proceedings of the 6th ACM Conference on Recommender Systems English 2012 This paper presents an interactive hybrid recommendation system that generates item predictions from multiple social and semantic web resources, such as Wikipedia, Facebook, and Twitter. The system employs hybrid techniques from traditional recommender system literature, in addition to a novel interactive interface which serves to explain the recommendation process and elicit preferences from the end user. We present an evaluation that compares different interactive and non-interactive hybrid strategies for computing recommendations across diverse social and semantic web APIs. Results of the study indicate that explanation and interaction with a visual representation of the hybrid system increase user satisfaction and relevance of predicted content. Copyright © 2012 by the Association for Computing Machinery, Inc. (ACM). 0 0
"How should I go from-to-without getting killed?" Motivation and benefits in open collaboration Katherine Panciera
Masli M.
Loren Terveen
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 Many people rely on open collaboration projects to run their computer (Linux), browse the web (Mozilla Firefox), and get information (Wikipedia). While these projects are successful, many such efforts suffer from lack of participation. Understanding what motivates users to participate and the benefits they perceive from their participation can help address this problem. We examined these issues through a survey of contributors and information consumers in the Cyclopath geographic wiki. We analyzed subject responses to identify a number of key motives and perceived benefits. Based on these results, we articulate several general techniques to encourage more and new forms of participation in open collaboration communities. Some of these techniques have the potential to engage information consumers more deeply and productively in the life of open collaboration communities. 0 0
2nd international workshop on intelligent user interfaces for developing regions: IUI4DR Agarwal S.K.
Rajput N.
Thies B.
Paek T.
International Conference on Intelligent User Interfaces, Proceedings IUI English 2011 Information Technology (IT) has had a significant impact on society and has touched all aspects of our lives. Up until now, computers and expensive devices have fueled this growth, which has brought several benefits to society. The challenge now is to take this success of IT to the next level, where IT services can be accessed by users in developing regions. The focus of the workshop in 2011 is to identify alternative sources of intelligence and use them to ease interaction with information technology. We would like to explore the different modalities, their usage by the community, the intelligence that can be derived from that usage, and finally the design implications for the user interface. We would also like to explore ways in which people in developing regions would react to collaborative technologies and/or use collaborative interfaces that require community support to build knowledge bases (for example, Wikipedia) or to enable effective navigation of content and access to services. 0 0
A comparative assessment of answer quality on four question answering sites Fichman P. Journal of Information Science English 2011 Question answering (Q&A) sites, where communities of volunteers answer questions, may provide faster, cheaper, and better services than traditional institutions. However, like other Web 2.0 platforms, user-created content raises concerns about information quality. At the same time, Q&A sites may provide answers of different quality because they have different communities and technological platforms. This paper compares answer quality on four Q&A sites: Askville, WikiAnswers, Wikipedia Reference Desk, and Yahoo! Answers. Findings indicate that: (1) similar collaborative processes on these sites result in a wide range of outcomes, and significant differences in answer accuracy, completeness, and verifiability were evident; (2) answer multiplication does not always result in better information; it yields more complete and verifiable answers but does not result in higher accuracy levels; and (3) a Q&A site's popularity does not correlate with its answer quality, on all three measures. 0 0
A survey on web archiving initiatives Gomes D.
Miranda J.
Costa M.
Lecture Notes in Computer Science English 2011 Web archiving has been gaining interest and recognized importance for modern societies around the world. However, for web archivists it is frequently difficult to demonstrate this fact, for instance, to funders. This study provides an updated and global overview of web archiving. The obtained results showed that the number of web archiving initiatives significantly grew after 2003 and they are concentrated on developed countries. We statistically analyzed metrics, such as, the volume of archived data, archive file formats or number of people engaged. Web archives all together must process more data than any web search engine. Considering the complexity and large amounts of data involved in web archiving, the results showed that the assigned resources are scarce. A Wikipedia page was created to complement the presented work and be collaboratively kept up-to-date by the community. 3 0
A wikipedia-based framework for collaborative semantic annotation Fernandez N.
Fisteus J.A.
Fuentes D.
Sanchez L.
Luque V.
International Journal on Artificial Intelligence Tools English 2011 The semantic web aims at automating web data processing tasks that nowadays only humans are able to do. To make this vision a reality, the information on web resources should be described in a computer-meaningful way, in a process known as semantic annotation. In this paper, a manual, collaborative semantic annotation framework is described. It is designed to take advantage of the benefits of manual annotation systems (like the possibility of annotating formats difficult to annotate in an automatic manner) addressing at the same time some of their limitations (reduce the burden for non-expert annotators). The framework is inspired by two principles: use Wikipedia as a facade for a formal ontology and integrate the semantic annotation task with common user actions like web search. The tools in the framework have been implemented, and empirical results obtained in experiences carried out with these tools are reported. 0 0
A-R-E: The author-review-execute environment Muller W.
Rojas I.
Eberhart A.
Peter Haase
Schmidt M.
Procedia Computer Science English 2011 The Author-Review-Execute (A-R-E) is an innovative concept to offer under a single principle and platform an environment to support the life cycle of an (executable) paper; namely the authoring of the paper, its submission, the reviewing process, the author's revisions, its publication, and finally the study (reading/interaction) of the paper as well as extensions (follow ups) of the paper. It combines Semantic Wiki technology, a resolver that solves links both between parts of documents to executable code or to data, an anonymizing component to support the authoring and reviewing tasks, and web services providing link perennity. 0 0
Accessing dynamic web page in users language Sharma M.K.
Saha P.K.
Sarcar S.
Ghosh S.
Samanta D.
TechSym 2011 - Proceedings of the 2011 IEEE Students' Technology Symposium English 2011 In recent years, there has been rapid advancement in Information and Communication Technology (ICT). However, the explosive growth of ICT and its many applications in education, health, agriculture etc. are confined to a limited number of privileged people who have both language and digital literacy. At present the repositories on the Internet are mainly in English; as a consequence, users unfamiliar with English are not able to benefit from the Internet. Although many enterprises like Google have addressed this problem by providing translation engines, these have their own limitations. One major limitation is that translation engines fail to translate the dynamic content of web pages, which is stored in English in the web server database. We address the problem in this work and propose a user-friendly interface mechanism through which a user can interact with any web service on the Internet. As two case studies, we illustrate access to the Indian Railway Passenger Reservation System and interaction with the English Wikipedia website, demonstrating the efficacy of the proposed mechanism. 0 0
AdaptableGIMP: Designing a socially-adaptable interface Ben L.
Krynicki F.
Terry M.
Bunt A.
Lount M.
UIST'11 Adjunct - Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology English 2011 We introduce the concept of a socially-adaptable interface, an interface that provides instant access to task-specific interface customizations created, edited, and documented by the application's user community. We demonstrate this concept in AdaptableGIMP, a modified version of the GIMP image editor that we have developed. 0 0
Adessowiki - Collaborative platform for writing executable papers Machado R.C.
Rittner L.
Lotufo R.A.
Procedia Computer Science English 2011 Adessowiki is a collaborative platform for scientific programming and document writing. It is a wiki environment that carries simultaneously documentation, programming code and results of its execution without any software configuration such as compilers, libraries and special tools at the client side. This combination of a collaborative wiki environment, central server and execution of code at rendering time enables the use of Adessowiki as an executable paper platform, since it fulfills the need to disseminate, validate, and archive research data. 0 0
An exploratory study of navigating Wikipedia semantically: Model and application Wu I.-C.
Lin Y.-S.
Liu C.-H.
Lecture Notes in Computer Science English 2011 Due to the popularity of link-based applications like Wikipedia, one of the most important issues in online research is how to alleviate information overload on the World Wide Web (WWW) and facilitate effective information-seeking. To address the problem, we propose a semantically-based navigation application that is based on the theories and techniques of link mining, semantic relatedness analysis and text summarization. Our goal is to develop an application that assists users in efficiently finding the related subtopics for a seed query and then quickly checking the content of articles. We establish a topic network by analyzing the internal links of Wikipedia and applying the Normalized Google Distance algorithm in order to quantify the strength of the semantic relationships between articles via key terms. To help users explore and read topic-related articles, we propose a SNA-based summarization approach to summarize articles. To visualize the topic network more efficiently, we develop a semantically-based WikiMap to help users navigate Wikipedia effectively. 0 0
Approach of Web2.0 application pattern applied to the information teaching Li G.
Liu M.
Zhe Wang
Chen W.
Communications in Computer and Information Science English 2011 This paper first focuses on the development and function of Web 2.0 from an educational perspective. Second, it introduces the features and theoretical foundation of Web 2.0. Building on that introduction, the application pattern used in information teaching is elaborated and shown to be an effective way of increasing educational productivity. Lastly, this paper presents related cases and teaching resources for reference. 0 0
Automated construction of domain ontology taxonomies from wikipedia Juric D.
Banek M.
Skocir Z.
Lecture Notes in Computer Science English 2011 The key step for implementing the idea of the Semantic Web into a feasible system is providing a variety of domain ontologies that are constructed on demand, in an automated manner and in a very short time. In this paper we introduce an unsupervised method for constructing domain ontology taxonomies from Wikipedia. The benefit of using Wikipedia as the source is twofold: first, the Wikipedia articles are concise and have a particularly high "density" of domain knowledge; second, the articles represent a consensus of a large community, thus avoiding term disagreements and misinterpretations. The taxonomy construction algorithm, aimed at finding the subsumption relation, is based on two different techniques, which both apply linguistic parsing: analyzing the first sentence of each Wikipedia article and processing the categories associated with the article. The method has been evaluated against human judgment for two independent domains and the experimental results have proven its robustness and high precision. 0 0
Automatic semantic web annotation of named entities Charton E.
Marie-Pierre Gagnon
Ozell B.
Lecture Notes in Computer Science English 2011 This paper describes a method to perform automated semantic annotation of named entities contained in large corpora. The semantic annotation is made in the context of the Semantic Web. The method is based on an algorithm that compares the set of words that appear before and after the name entity with the content of Wikipedia articles, and identifies the more relevant one by means of a similarity measure. It then uses the link that exists between the selected Wikipedia entry and the corresponding RDF description in the Linked Data project to establish a connection between the named entity and some URI in the Semantic Web. We present our system, discuss its architecture, and describe an algorithm dedicated to ontological disambiguation of named entities contained in large-scale corpora. We evaluate the algorithm, and present our results. 0 0
Beyond the bag-of-words paradigm to enhance information retrieval applications Paolo Ferragina Proceedings - 4th International Conference on SImilarity Search and APplications, SISAP 2011 English 2011 The typical IR-approach to indexing, clustering, classification and retrieval, just to name a few, is the one based on the bag-of-words paradigm. It eventually transforms a text into an array of terms, possibly weighted (with tf-idf scores or derivatives), and then represents that array via points in high-dimensional space. It is therefore syntactical and unstructured, in the sense that different terms lead to different dimensions. Co-occurrence detection and other processing steps have thus been proposed (see e.g. LSI, Spectral analysis [7]) to identify the existence of those relations, yet everyone is aware of the limitations of this approach, especially in the expanding context of short (and thus poorly composed) texts, such as the snippets of search-engine results, the tweets of a Twitter channel, the items of a news feed, the posts of a blog, or the advertisement messages, etc. A good deal of recent work is attempting to go beyond this paradigm by enriching the input text with additional structured annotations. This general idea has been developed in the literature in two distinct ways. One consists of extending the classic term-based vector-space model with additional dimensions corresponding to features (concepts) extracted from an external knowledge base, such as DMOZ, Wikipedia, or even the whole Web (see e.g. [4, 5, 12]). The pro of this approach is to extend the bag-of-words scheme with more concepts, thus possibly allowing the identification of related texts which are syntactically far apart. The con resides in the contamination of these vectors by unrelated (but common) concepts retrieved via the syntactic queries.
The second way consists of identifying in the input text short-and-meaningful sequences of terms (aka spots) which are then connected to unambiguous concepts drawn from a catalog. The catalog can be formed by either a small set of specifically recognized types, most often People and Locations (aka Named Entities, see e.g. [13, 14]), or it can consist of millions of concepts drawn from a large knowledge base, such as Wikipedia. This latter catalog is ever-expanding and currently offers the best trade-off between a catalog with a rigorous structure but with low coverage (like WordNet, CYC, TAP), and a large text collection with wide coverage but unstructured and noisy content (like the whole Web). To understand how this annotation works, let us consider the following short news: "Diego Maradona won against Mexico". The goal of the annotation is to detect "Diego Maradona" and "Mexico" as spots, and then hyper-link them with the Wikipedia pages which deal with the former Argentina coach and the football team of Mexico. The annotator uses as spots the anchor texts which occur in Wikipedia pages, and as possible concepts for each spot the (possibly many) pages pointed in Wikipedia by that spot/anchor 0 0
COBS: Realizing decentralized infrastructure for collaborative browsing and search Von Der Weth C.
Anwitaman Datta
Proceedings - International Conference on Advanced Information Networking and Applications, AINA English 2011 Finding relevant and reliable information on the web is a non-trivial task. While internet search engines do find correct web pages with respect to a set of keywords, they often cannot ensure the relevance or reliability of their content. An emerging trend is to harness internet users in the spirit of Web 2.0, to discern and personalize relevant and reliable information. Users collaboratively search or browse for information, either directly by communicating or indirectly by adding meta information (e.g., tags) to web pages. While gaining much popularity, such approaches are bound to specific service providers, or the Web 2.0 sites providing the necessary features, and the knowledge so generated is also confined to, and subject to the whims and censorship of such providers. To overcome these limitations we introduce COBS, a browser-centric knowledge repository which enjoys the inherent openness (similar to WIKIPEDIA) while aiming to provide end-users the freedom of personalization and privacy by adopting an eventually hybrid/p2p back-end. In this paper we first present the COBS front-end, a browser add-on that enables users to tag, rate or comment arbitrary web pages and to socialize with others in both a synchronous and asynchronous manner. We then discuss how a decentralized back-end can be realized. While Distributed Hash Tables (DHTs) are the most natural choice, and despite a decade of research on DHT designs, we encounter several, some small, while others more fundamental shortcomings that need to be surmounted in order to realize an efficient, scalable and reliable decentralized back-end for COBS. To that end, we outline various design alternatives and discuss qualitatively (and quantitatively, when possible) their (dis-)advantages. 
We believe that the objectives of COBS are ambitious, posing significant challenges for distributed systems, middleware and distributed data-analytics research, even while building on the existing momentum. Based on experiences from our ongoing work on COBS, we outline these systems research issues in this position paper. 0 0
Categorising social tags to improve folksonomy-based recommendations Ivan Cantador
Ioannis Konstas
Jose J.M.
Journal of Web Semantics English 2011 In social tagging systems, users have different purposes when they annotate items. Tags not only depict the content of the annotated items, for example by listing the objects that appear in a photo, or express contextual information about the items, for example by providing the location or the time in which a photo was taken, but also describe subjective qualities and opinions about the items, or can be related to organisational aspects, such as self-references and personal tasks. Current folksonomy-based search and recommendation models exploit the social tag space as a whole to retrieve those items relevant to a tag-based query or user profile, and do not take into consideration the purposes of tags. We hypothesise that a significant percentage of tags are noisy for content retrieval, and believe that the distinction of the personal intentions underlying the tags may be beneficial to improve the accuracy of search and recommendation processes. We present a mechanism to automatically filter and classify raw tags in a set of purpose-oriented categories. Our approach finds the underlying meanings (concepts) of the tags, mapping them to semantic entities belonging to external knowledge bases, namely WordNet and Wikipedia, through the exploitation of ontologies created within the W3C Linking Open Data initiative. The obtained concepts are then transformed into semantic classes that can be uniquely assigned to content- and context-based categories. The identification of subjective and organisational tags is based on natural language processing heuristics. We collected a representative dataset from the Flickr social tagging system, and conducted an empirical study to categorise real tagging data, and evaluate whether the resultant tag categories really benefit a recommendation model using the Random Walk with Restarts method. 
The results show that content- and context-based tags are considered superior to subjective and organisational tags, achieving equivalent performance to using the whole tag space. © 2010 Elsevier B.V. All rights reserved. 0 0
ClusteringWiki: Personalized and collaborative clustering of search results Anastasiu D.C.
Gao B.J.
Buttler D.
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2011 How to organize and present search results plays a critical role in the utility of search engines. Due to the unprecedented scale of the Web and diversity of search results, the common strategy of ranked lists has become increasingly inadequate, and clustering has been considered as a promising alternative. Clustering divides a long list of disparate search results into a few topic-coherent clusters, allowing the user to quickly locate relevant results by topic navigation. While many clustering algorithms have been proposed that innovate on the automatic clustering procedure, we introduce ClusteringWiki, the first prototype and framework for personalized clustering that allows direct user editing of clustering results. Through a Wiki interface, the user can edit and annotate the membership, structure and labels of clusters for a personalized presentation. In addition, the edits and annotations can be shared among users as a mass collaborative way of improving search result organization and search engine utility. 0 0
Concept disambiguation exploiting semantic databases Hossucu A.G.
Ayyildiz H.
Gokturk Z.O.
Proceedings of the International Workshop on Semantic Web Information Management, SWIM 2011 English 2011 This paper presents a novel approach for resolving ambiguities in concepts that already reside in semantic databases such as Freebase and DBpedia. Different from standard dictionaries and lexical databases, semantic databases provide a rich hierarchy of semantic relations in ontological structures. Our disambiguation approach decides on the implied sense by computing concept similarity measures as a function of semantic relations defined in ontological graph representation of concepts. Our similarity measures also utilize Wikipedia descriptions of concepts. We performed a preliminary experimental evaluation, measuring disambiguation success rate and its correlation with input text content. The results show that our method outperforms well-known disambiguation methods. 0 0
Conceptual indexing of documents using Wikipedia Carlo Abi Chahine
Nathalie Chaignaud
Kotowicz J.-P.
Pecuchet J.-P.
Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 English 2011 This paper presents an indexing support system that suggests to librarians a set of topics and keywords relevant to a pedagogical document. Our method of document indexing uses the Wikipedia category network as a conceptual taxonomy. A directed acyclic graph is built for each document by mapping terms (one or more words) to a concept in the Wikipedia category network. Properties of the graph are used to weight these concepts. This allows the system to extract so-called important concepts from the graph and to disambiguate terms of the document. According to these concepts, topics and keywords are proposed. This method has been evaluated by the librarians on a corpus of French pedagogical documents. 0 0
Conceptual query expansion and visual search results exploration for Web image retrieval Hoque E.
Strong G.
Hoeber O.
Gong M.
Advances in Intelligent and Soft Computing English 2011 Most approaches to image retrieval on the Web have their basis in document search techniques. Images are indexed based on the text that is related to the images. Queries are matched to this text to produce a set of search results, which are organized in paged grids that are reminiscent of lists of documents. Due to ambiguity both with the user-supplied query and with the text used to describe the images within the search index, most image searches contain many irrelevant images distributed throughout the search results. In this paper we present a method for addressing this problem. We perform conceptual query expansion using Wikipedia in order to generate a diverse range of images for each query, and then use a multi-resolution self organizing map to group visually similar images. The resulting interface acts as an intelligent search assistant, automatically diversifying the search results and then allowing the searcher to interactively highlight and filter images based on the concepts, and zoom into an area within the image space to show additional images that are visually similar. 0 0
Content-based recommendation algorithms on the Hadoop mapreduce framework De Pessemier T.
Vanhecke K.
Dooms S.
Martens L.
WEBIST 2011 - Proceedings of the 7th International Conference on Web Information Systems and Technologies English 2011 Content-based recommender systems are widely used to generate personal suggestions for content items based on their metadata description. However, due to the required (text) processing of these metadata, the computational complexity of the recommendation algorithms is high, which hampers their application in large-scale. This computational load reinforces the necessity of a reliable, scalable and distributed processing platform for calculating recommendations. Hadoop is such a platform that supports data-intensive distributed applications based on map and reduce tasks. Therefore, we investigated how Hadoop can be utilized as a cloud computing platform to solve the scalability problem of content-based recommendation algorithms. The various MapReduce operations, necessary for keyword extraction and generating content-based suggestions for the end-user, are elucidated in this paper. Experimental results on Wikipedia articles prove the appropriateness of Hadoop as an efficient and scalable platform for computing content-based recommendations. 0 0
Coping with the dynamics of open, social media on mobile devices with mobile facets Kleinen A.
Scherp A.
Staab S.
Proceedings of the 4th International Workshop on Semantic Ambient Media Experience, SAME 2011, in Conjunction with the 5th International Convergence on Communities and Technologies English 2011 When traveling to a foreign city or wanting to know what is happening in one's home area, users today often search and explore different social media platforms. In order to provide different social media sources in an integrated manner on a mobile device, we have developed Mobile Facets. Mobile Facets allows for the faceted, interactive search and exploration of social media on a touchscreen mobile phone. The social media is queried live from different data sources and professional content sources like DBpedia, a Semantic Web version of Wikipedia, the event directories Eventful and Upcoming, geo-located Flickr photos, and GeoNames. Mobile Facets provides an integrated retrieval and interactive exploration of resources from these social media sources such as places, persons, organizations, and events. One does not know in advance how many facets the application will receive from such sources in a specific contextual situation and how many data items for the facets will be provided. Thus, the user interface of Mobile Facets is to be designed to cope with these dynamics of social media. 0 0
Creating online collaborative environments for museums: A case study of a museum wiki Alison Hsiang-Yi Liu
Jonathan P. Bowen
Int. J. Web Based Communities English 2011 Museums have been increasingly adopting Web 2.0 technology to reach and interact with their visitors. Some have experimented with wikis to allow both curators and visitors to provide complementary information about objects in the museum. An example of this is the Object Wiki from the Science Museum in London. Little has been done to study these interactions in an academic framework. In the field of knowledge management, the concept of 'communities of practice' has been posited as a suitable structure in which to study how knowledge is developed within a community with a common interest in a particular domain, using a sociological approach. Previously this has been used in investigating the management of knowledge within business organisations, teachers' professional development, and online e-learning communities. The authors apply this approach to a museum-based wiki to assess its applicability for such an endeavour. 0 0
Cross-lingual recommendations in a resource-based learning scenario Schmidt S.
Scholl P.
Rensing C.
Steinmetz R.
Lecture Notes in Computer Science English 2011 CROKODIL is a platform supporting resource-based learning scenarios for self-directed, on-task learning with web resources. As CROKODIL enables the forming of possibly large learning communities, the stored data is growing in a large scale. Thus, an appropriate recommendation of tags and learning resources becomes increasingly important for supporting learners. We propose semantic relatedness between tags and resources as a basis of recommendation and identify Explicit Semantic Analysis (ESA) using Wikipedia as reference corpus as a viable option. However, data from CROKODIL shows that tags and resources are often composed in different languages. Thus, a monolingual approach to provide recommendations is not applicable in CROKODIL. Thus, we examine strategies for providing mappings between different languages, extending ESA to provide cross-lingual capabilities. Specifically, we present mapping strategies that utilize additional semantic information contained in Wikipedia. Based on CROKODIL's application scenario, we present an evaluation design and show results of cross-lingual ESA. 0 0
Design guidelines for software processes knowledge repository development Garcia J.
Amescua A.
Sanchez M.-I.
Bermon L.
Information and Software Technology English 2011 Context: Staff turnover in organizations is an important issue that should be taken into account mainly for two reasons: employees carry an organization's knowledge in their heads and take it with them wherever they go, and knowledge accessibility is limited to the amount of knowledge employees want to share. Objective: The aim of this work is to provide a set of guidelines to develop knowledge-based Process Asset Libraries (PAL) to store software engineering best practices, implemented as a wiki. Method: Fieldwork was carried out in a 2-year training course in agile development. This was validated in two phases (with and without PAL), which were subdivided into two stages: Training and Project. Results: The study demonstrates that, on the one hand, the learning process can be facilitated using PAL to transfer software process knowledge, and on the other hand, products were developed by junior software engineers with a greater degree of independence. Conclusion: PAL, as a knowledge repository, helps software engineers to learn about development processes and improves the use of agile processes. © 2011 Elsevier B.V. All rights reserved. 0 0
Disambiguation and filtering methods in using web knowledge for coreference resolution Uryupina O.
Poesio M.
Claudio Giuliano
Kateryna Tymoshenko
Proceedings of the 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24 English 2011 We investigate two publicly available web knowledge bases, Wikipedia and Yago, in an attempt to leverage semantic information and increase the performance level of a state-of-the-art coreference resolution (CR) engine. We extract semantic compatibility and aliasing information from Wikipedia and Yago, and incorporate it into a CR system. We show that using such knowledge with no disambiguation and filtering does not bring any improvement over the baseline, mirroring the previous findings (Ponzetto and Poesio 2009). We propose, therefore, a number of solutions to reduce the amount of noise coming from web resources: using disambiguation tools for Wikipedia, pruning Yago to eliminate the most generic categories and imposing additional constraints on affected mentions. Our evaluation experiments on the ACE-02 corpus show that the knowledge, extracted from Wikipedia and Yago, improves our system's performance by 2-3 percentage points. Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Don't leave me alone: Effectiveness of a framed wiki-based learning activity Nikolaos Tselios
Panagiota Altanopoulou
Vassilis Komis
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 In this paper, the effectiveness of a framed wiki-based learning activity is examined. A one-group pretest-posttest design was conducted towards this aim. The study involved 146 first year university students of a Greek Education Department using wikis to learn basic aspects and implications of search engines in the context of a first year course entitled "Introduction to ICT". Data analysis showed significant improvement in learning outcomes, in particular for students with low initial performance. The average students' questionnaire score jumped from 38.6% to 55%. In addition, a positive attitude towards using wikis in their project was expressed by the students. The design of the activity, the context of the study and the results obtained are discussed in detail. 0 0
Editing the Wikipedia: Its role in science education Mareca P.
Bosch V.A.
Proceedings of the 6th Iberian Conference on Information Systems and Technologies, CISTI 2011 Spanish 2011 This paper describes and analyzes how the cooperation of Engineering students in a Wikipedia editing project helped to improve their learning and understanding of Physics. This project aims to incorporate other forms of learning into first-year university courses, including specifically the communication of scientific concepts to other students and general audiences. Students agreed that through the Wikipedia project they learned to work better together and that it helped them gain insight into the concepts of Physics. 0 0
Embedding MindMap as a service for user-driven composition of web applications Guabtni A.
Clarke S.
Benatallah B.
Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 English 2011 The World Wide Web is evolving towards a very large distributed platform allowing ubiquitous access to a wide range of Web applications with minimal delay and no installation required. Such Web applications range from having users undertake simple tasks, such as filling a form, to more complex tasks including collaborative work, project management, and more generally, creating, consulting, annotating, and sharing Web content. However, users are lacking a simple yet powerful mechanism to compose Web applications, similarly to what desktop environments allowed for decades using the file explorer paradigm and the desktop metaphor. Attempts have been made to adapt the desktop metaphor to the Web environment giving birth to Webtops (Web desktops). These essentially consisted of embedding a desktop environment in a Web browser and providing access to various Web applications within the same User Interface. However, those attempts did not take into consideration the radical differences between Web and desktop environments and applications. In this work, we introduce a new approach for Web application composition based on the mindmap metaphor. It allows browsing artifacts (Web resources) and enabling user-driven composition of their associated Web applications. Essentially, a mindmap is a graph of widgets that represent artifacts created or used by Web applications and that allow users to list and launch all possible Web applications associated with each artifact. A tool has been developed to experiment with the new metaphor and is provided as a service to be embedded in Web applications via a Web browser plug-in. We demonstrate in this paper three case studies regarding the DBLP Web site, Wikipedia and Google Picasa Web applications. 0 0
Emotion dependent dialogues in the VirCA system Fulop I.M. 2011 2nd International Conference on Cognitive Infocommunications, CogInfoCom 2011 English 2011 In the VirCA system, the Wikipedia cyber device was developed in order to realize dialogues with human users as a case of inter-cognitive sensor sharing communication. [1] These dialogues are based on the scenarios of wiki pages edited on the web. This cyber device was extended with the ability of emotion support: the Wikipedia answers the user with the emotion received from some emotion tracker component. This way not only speech but emotion is transferred as well in the course of cognitive infocommunication. To realize this attitude, on the one hand, a universal thesaurus component was developed, which can select the appropriate version of a default lingual item which matches the received emotion. On the other hand, a universal emotion tracker component was also developed to recognize the emotion of the user either from the voice or the used lingual items of the user. This paper intends to present how the different components are connected together in order to realize the desired behaviour. It is going to be described how the universal components are exactly operating and which technologies are applied to achieve the required operation. Examples for the usage of the system are going to be presented as well. 0 0
Enhancing accessibility of microblogging messages using semantic knowledge Hu X.
Tang L.
Hongyan Liu
International Conference on Information and Knowledge Management, Proceedings English 2011 The volume of microblogging messages is increasing exponentially with the popularity of microblogging services. The large number of messages appearing in user interfaces hinders user accessibility to useful information buried in disorganized, incomplete, and unstructured text messages. In order to enhance user accessibility, we propose to aggregate related microblogging messages into clusters and automatically assign them semantically meaningful labels. However, a distinctive feature of microblogging messages is that they are much shorter than conventional text documents. These messages provide inadequate term co-occurrence information for capturing semantic associations. To address this problem, we propose a novel framework for organizing unstructured microblogging messages by transforming them to a semantically structured representation. The proposed framework first captures informative tree fragments by analyzing a parse tree of the message, and then exploits external knowledge bases (Wikipedia and WordNet) to enhance their semantic information. Empirical evaluation on a Twitter dataset shows that our framework significantly outperforms existing state-of-the-art methods. 0 0
Enhancing automatic term recognition algorithms with HTML tags processing Lucansky M.
Simko M.
Bielikova M.
ACM International Conference Proceeding Series English 2011 We focus on mining relevant information from web pages. Unlike plain text documents, web pages contain another source of potentially relevant information - easily processable mark-up. We propose an approach to keyword extraction that enhances Automatic Term Recognition (ATR) algorithms intended for processing plain text documents with an analysis of HTML tags present in the document. We distinguish tags that have a semantic potential. We present results of an experiment we conducted on a set of Wikipedia pages. It shows that enhancement yields better results than using ATR algorithms alone. 0 0
Enriching and localizing semantic tags in internet videos Ballan L.
Bertini M.
Bimbo A.D.
Serra G.
MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops English 2011 Tagging of multimedia content is becoming more and more widespread as web 2.0 sites, like Flickr and Facebook for images, YouTube and Vimeo for videos, have popularized tagging functionalities among their users. These user-generated tags are used to retrieve multimedia content, and to ease browsing and exploration of media collections, e.g. using tag clouds. However, not all media are equally tagged by users: using current browsers it is easy to tag a single photo, and even tagging a part of a photo, like a face, has become common in sites like Flickr and Facebook; on the other hand tagging a video sequence is more complicated and time consuming, so that users just tag the overall content of a video. In this paper we present a system for automatic video annotation that increases the number of tags originally provided by users, and localizes them temporally, associating tags to shots. This approach exploits collective knowledge embedded in tags and Wikipedia, and visual similarity of key frames and images uploaded to social sites like YouTube and Flickr. Copyright 2011 ACM. 0 0
Evaluating reranking methods using wikipedia features Kurakado K.
Oishi T.
Hasegawa R.
Fujita H.
Koshimura M.
ICAART 2011 - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence English 2011 Many people these days access the vast number of documents on the Web, very often with the help of search engines such as Google. However, even with a search engine, it is often the case that we cannot find the desired information easily. In this paper, we extract related words for the search query by analyzing link information and category structure. We aim to assist the user in retrieving web pages by reranking search results. 0 0
Evaluating the trade-offs between diversity and precision for Web image search using concept-based query expansion Hoque E.
Hoeber O.
Gong M.
Proceedings - 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2011 English 2011 Even though Web image search queries are often ambiguous, traditional search engines retrieve and present results solely based on relevance ranking, where only the most common and popular interpretations of the query are considered. Rather than assuming that all users are interested in the most common meaning of the query, a more sensible approach may be to produce a diversified set of images that cover the various aspects of the query, under the expectation that at least one of these interpretations will match the searcher's needs. However, such a promotion of diversity in the search results has the side-effect of decreasing the precision of the most common sense. In this paper, we evaluate this trade-off in the context of a method for explicitly diversifying image search results via concept-based query expansion using Wikipedia. Experiments with controlling the degree of diversification illustrate this balance between diversity and precision for both ambiguous and specific queries. Our ultimate goal of this research is to propose an automatic method for tuning the diversification parameter based on degree of ambiguity of the original query. 0 0
Exploiting arabic wikipedia for automatic ontology generation: A proposed approach Al-Rajebah N.I.
Al-Khalifa H.S.
Al-Salman A.S.
2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011 English 2011 Ontological models play an important role in the Semantic Web. Despite being widely spread, there are a few known attempts to build ontologies for the Arabic language. As a result, a lack of Arabic Semantic Web applications is encountered. In this paper, we propose an approach to build ontologies automatically for the Arabic language from Wikipedia. Our approach relies on the semantic field theory such that any Wikipedia article is analyzed to extract semantic relations using its infobox and the list of categories. We will also present our system architecture along with an initial evaluation of the effectiveness and correctness of the resultant ontological model. 0 0
Extracting events from Wikipedia as RDF triples linked to widespread semantic web datasets Carlo Aliprandi
Francesco Ronzano
Andrea Marchetti
Maurizio Tesconi
Salvatore Minutoli
Lecture Notes in Computer Science English 2011 Many attempts have been made to extract structured data from Web resources, exposing them as RDF triples and interlinking them with other RDF datasets: in this way it is possible to create clouds of highly integrated Semantic Web data collections. In this paper we describe an approach to enhance the extraction of semantic contents from unstructured textual documents, in particular considering Wikipedia articles and focusing on event mining. Starting from the deep parsing of a set of English Wikipedia articles, we produce a semantic annotation compliant with the Knowledge Annotation Format (KAF). We extract events from the KAF semantic annotation and then we structure each event as a set of RDF triples linked to both DBpedia and WordNet. We point out examples of automatically mined events, providing some general evaluation of how our approach may discover new events and link them to existing contents. 0 0
Extracting information about security vulnerabilities from Web text Mulwad V.
Li W.
Joshi A.
Tim Finin
Viswanathan K.
Proceedings - 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2011 English 2011 The Web is an important source of information about computer security threats, vulnerabilities and cyberattacks. We present initial work on developing a framework to detect and extract information about vulnerabilities and attacks from Web text. Our prototype system uses Wikitology, a general purpose knowledge base derived from Wikipedia, to extract concepts that describe specific vulnerabilities and attacks, map them to related concepts from DBpedia and generate machine understandable assertions. Such a framework will be useful in adding structure to already existing vulnerability descriptions as well as detecting new ones. We evaluate our approach against vulnerability descriptions from the National Vulnerability Database. Our results suggest that it can be useful in monitoring streams of text from social media or chat rooms to identify potential new attacks and vulnerabilities or to collect data on the spread and volume of existing ones. 0 0
Extracting the multilingual terminology from a web-based encyclopedia Fatiha S. Proceedings - International Conference on Research Challenges in Information Science English 2011 Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology extraction. We propose an approach to extract terms and their translations from different types of Wikipedia link information and data. The next step will be using a linguistic-based information to re-rank and filter the extracted term candidates in the target language. Preliminary evaluations using the combined statistics-based and linguistic-based approaches were applied on different pairs of languages including Japanese, French and English. These evaluations showed a real open improvement and a good quality of the extracted term candidates for building or enriching multilingual ontologies, dictionaries or feeding a cross-language information retrieval system with the related expansion terms of the source query. 0 0
Faster temporal range queries over versioned text He J.
Suel T.
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2011 Versioned textual collections are collections that retain multiple versions of a document as it evolves over time. Important large-scale examples are Wikipedia and the web collection of the Internet Archive. Search queries over such collections often use keywords as well as temporal constraints, most commonly a time range of interest. In this paper, we study how to support such temporal range queries over versioned text. Our goal is to process these queries faster than the corresponding keyword-only queries, by exploiting the additional constraint. A simple approach might partition the index into different time ranges, and then access only the relevant parts. However, specialized inverted index compression techniques are crucial for large versioned collections, and a naive partitioning can negatively affect index compression and query throughput. We show how to achieve high query throughput by using smart index partitioning techniques that take index compression into account. Experiments on over 85 million versions of Wikipedia articles show that queries can be executed in a few milliseconds on memory-based index structures, and only slightly more time on disk-based structures. We also show how to efficiently support the recently proposed stable top-k search primitive on top of our schemes. 0 0
Generation of hypertext for web-based learning based on wikification Lui A.K.-F.
Ng V.S.-C.
Tsang E.K.M.
Ho A.C.H.
Communications in Computer and Information Science English 2011 This paper presents a preliminary study into the conversion of plain text documents into hypertext for web-based learning. The novelty of this approach is the generation of two types of hyperlinks: links to Wikipedia article for exploratory learning, and self-referencing links for elaboration and references. Hyperlink generation is based on two rounds of wikification. The first round wikifies a set of source documents so that the wikified source documents can be semantically compared to Wikipedia articles using existing link-based measure techniques. The second round of wikification then evaluates each hyperlink in the wikified source documents and checks if there is a semantically related source document for replacing the current target Wikipedia article. While preliminary evaluation of a prototype implementation seemed feasible, relatively few self-referencing links could be generated using a test set of course text. 0 0
Geodesic distances for web document clustering Tekir S.
Mansmann F.
Keim D.
IEEE SSCI 2011: Symposium Series on Computational Intelligence - CIDM 2011: 2011 IEEE Symposium on Computational Intelligence and Data Mining English 2011 While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic as these clustering coefficient values are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article. 0 0
Graph-based named entity linking with Wikipedia Ben Hachey
Will Radford
Curran J.R.
Lecture Notes in Computer Science English 2011 Named entity linking (NEL) grounds entity mentions to their corresponding Wikipedia article. State-of-the-art supervised NEL systems use features over the rich Wikipedia document and link-graph structure. Graph-based measures have been effective over WordNet for word sense disambiguation (WSD). We draw parallels between NEL and WSD, motivating our unsupervised NEL approach that exploits the Wikipedia article and category link graphs. Our system achieves 85.5% accuracy on the TAC 2010 shared task - competitive with the best supervised and unsupervised systems. 0 0
Greedy and randomized feature selection for web search ranking Pan F.
Converse T.
Ahn D.
Salvetti F.
Donato G.
Proceedings - 11th IEEE International Conference on Computer and Information Technology, CIT 2011 English 2011 Modern search engines have to be fast to satisfy users, so there are hard back-end latency requirements. The set of features useful for search ranking functions, though, continues to grow, making feature computation a latency bottleneck. As a result, not all available features can be used for ranking, and in fact, much of the time only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. To this end, we explore different feature selection methods using boosted regression trees, including both greedy approaches (i.e., selecting the features with the highest relative influence as computed by boosted trees; discounting importance by feature similarity) and randomized approaches (i.e., best-only genetic algorithm; a proposed more efficient randomized method with feature-importance-based backward elimination). We evaluate and compare these approaches using two data sets, one from a commercial Wikipedia search engine and the other from a commercial Web search engine. The experimental results show that the greedy approach that selects top features with the highest relative influence performs close to the full-feature model, and the randomized feature selection with feature-importance-based backward elimination outperforms all other randomized and greedy approaches, especially on the Wikipedia data. 0 0
Harvesting Wikipedia knowledge to identify topics in ongoing natural language dialogs Alexa Breuing
Ulli Waltinger
Ipke Wachsmuth
Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 English 2011 This paper introduces a model harvesting the crowdsourced encyclopedic knowledge provided by Wikipedia to improve the conversational abilities of an artificial agent. More precisely, we present a model for automatic topic identification in ongoing natural language dialogs. On the basis of a graph-based representation of the Wikipedia category system, our model implements six tasks essential for detecting the topical overlap of coherent dialog contributions. Thereby the identification process operates online to handle dialog streams of constantly changing topical threads in real-time. The realization of the model and its application to our conversational agent aims to improve human-agent conversations by transferring human-like topic awareness to the artificial interlocutor. 0 0
Hybrid Wikis: Empowering users to collaboratively structure information Florian Matthes
Christian Neubert
Steinhoff A.
ICSOFT 2011 - Proceedings of the 6th International Conference on Software and Database Technologies English 2011 Wikis are increasingly used for collaborative enterprise information management since they are flexibly applicable and encourage the contribution of knowledge. The fact that ordinary wiki pages contain pure text only limits how the information can be processed or made accessible to users. Semantic wikis promise to solve this problem by capturing knowledge in structured form and offering advanced querying capabilities. However, it is not obvious for business users, how they can benefit from providing semantic annotations which are not familiar to them and often difficult to enter. In this paper, we first introduce the concepts of hybrid wikis, namely attributes, type tags, attribute suggestions, and attribute definitions with integrity constraints. Business users interact with these concepts using a familiar user interface based on forms, spreadsheet-like tables, and auto-completion for links and values. We then illustrate these concepts using an example scenario with projects and persons and highlight key implementation aspects of a Java-based hybrid wiki system (Tricia). The paper ends with the description of practical experiences gained in two usage scenarios, a comparison with related work and an outlook on future work. 0 0
Hybrid and interactive domain-specific translation for multilingual access to digital libraries Jones G.J.F.
Fuller M.
Newman E.
YanChun Zhang
Lecture Notes in Computer Science English 2011 Accurate high-coverage translation is a vital component of reliable cross language information retrieval (CLIR) systems. This is particularly true for retrieval from archives such as Digital Libraries which are often specific to certain domains. While general machine translation (MT) has been shown to be effective for CLIR tasks in laboratory information retrieval evaluation tasks, it is generally not well suited to specialized situations where domain-specific translations are required. We demonstrate that effective query translation in the domain of cultural heritage (CH) can be achieved using a hybrid translation method which augments a standard MT system with domain-specific phrase dictionaries automatically mined from Wikipedia. We further describe the use of these components in a domain-specific interactive query translation service. The interactive system selects the hybrid translation by default, with other possible translations being offered to the user interactively to enable them to select alternative or additional translation(s). The objective of this interactive service is to provide user control of translation while maximising translation accuracy and minimizing the translation effort of the user. Experiments using our hybrid translation system with sample query logs from users of CH websites demonstrate a large improvement in the accuracy of domain-specific phrase detection and translation. 0 0
Identifying aspects for web-search queries Fei Wu
Madhavan J.
Halevy A.
Journal of Artificial Intelligence Research English 2011 Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and to have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be "semantically" related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives - related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing. © 2011 AI Access Foundation. All rights reserved. 0 0
Improving accessibility to mathematical formulas: The Wikipedia math accessor Ferres L.
Sepulveda J.F.
W4A 2011 - International Cross-Disciplinary Conference on Web Accessibility English 2011 Mathematics accessibility is an important topic for inclusive education. We tackle the problem of accessing a large repository of mathematical formulas, by providing a natural language description of the more than 350,000 Wikipedia formulas using a well-researched sub-language targeting Spanish speakers, for whom assistive technologies, particularly domain-specific technologies like the one described here, are scarce. Copyright 2011 ACM. 0 0
Informative sentence retrieval for domain specific terminologies Koh J.-L.
Cho C.-W.
Lecture Notes in Computer Science English 2011 Domain specific terminologies represent important concepts when students study a subject. If the sentences which describe important concepts related to a terminology can be accessed easily, students will understand the semantics represented in the sentences which contain the terminology in depth. In this paper, an effective sentence retrieval system is provided to search informative sentences of a domain-specific terminology from the electrical books. A term weighting model is constructed in the proposed system by using web resources, including Wikipedia and FOLDOC, to measure the degree of a word relative to the query terminology. Then the relevance score of a sentence is estimated by summing the weights of the words in the sentence, which is used to rank the candidate answer sentences. By adopting the proposed method, the obtained answer sentences are not limited to certain sentence patterns. The results of experiment show that the ranked list of answer sentences retrieved by our proposed system have higher NDCG values than the typical IR approach and pattern-matching based approach. 0 0
Integrating artificial intelligence solutions into interfaces of online knowledge production Heder M. ICIC Express Letters English 2011 The current interfaces of online knowledge production systems are not optimal for the creation of high-quality knowledge units. This study investigates possible methods for the integration of AI solutions into those web interfaces where users produce knowledge, e.g., Wikipedia, forums and blogs. A requirement survey was conducted in order to predict which solutions the users would most likely accept out of the many possible choices. We focused on the reading and editing preferences of Wikipedia users, Wikipedia being the biggest knowledge production and sharing framework. We found that many functions can be easily implemented into the knowledge production interface if we simply integrate well-known and available AI solutions. The results of our survey show that right now the need for basic, but well-implemented and integrated AI functions is greater than the need for cutting-edge, complex AI modules. It can be concluded that even if it is advisable to constantly improve the underlying algorithms and methods of the system, much more emphasis should be given to the interface design of currently available AI solutions. 0 0
Integrating visual classifier ensemble with term extraction for Automatic Image Annotation Lei Y.
Wong W.
Bennamoun M.
Wei Liu
Proceedings of the 2011 6th IEEE Conference on Industrial Electronics and Applications, ICIEA 2011 English 2011 Existing Automatic Image Annotation (AIA) systems are typically developed, trained and tested using high quality, manually labelled images. The tremendous manual efforts required with an untested ability to scale and tolerate noise all have an impact on existing systems' applicability to real-world data. In this paper, we propose a novel AIA system which harnesses the collective intelligence on the Web to automatically construct training data to work with an ensemble of Support Vector Machine (SVM) classifiers based on Multi-Instance Learning (MIL) and global features. An evaluation of the proposed annotation approach using an automatically constructed training set from Wikipedia demonstrates a slight improvement in annotation accuracy in comparison with two existing systems. 0 0
Knowledge sharing via Web 2.0 for diverse student groups in distance learning Voychenko O.
Synytsya K.
2011 IEEE Global Engineering Education Conference, EDUCON 2011 English 2011 A learning environment based on the integration of the LMS and MediaWiki components is suggested to support knowledge sharing among generations of master course students. A case study demonstrates a significant rise in motivation and learning efficiency of students with sufficient professional experience as well as communication skills enhancement for all students. 0 0
LIA at INEX 2010 book track Deveaud R.
Boudin F.
Bellot P.
Lecture Notes in Computer Science English 2011 In this paper we describe our participation and present our contributions in the INEX 2010 Book Track. Digitized books are now a common source of information on the Web, however OCR sometimes introduces errors that can penalize Information Retrieval. We propose a method for correcting hyphenations in the books and we analyse its impact on the Best Books for Reference task. The observed improvement is around 1%. This year we also experimented with different query expansion techniques. The first one consists of selecting informative words from a Wikipedia page related to the topic. The second one uses a dependency parser to enrich the query with the detected phrases using a Markov Random Field model. We show that there is a significant improvement over the state-of-the-art when using a large weighted list of Wikipedia words, meanwhile hyphenation correction has an impact on their distribution over the book corpus. 0 0
Leadership and success factors in online creative collaboration Kurt Luther
Amy Bruckman
IEEE Potentials English 2011 Social computing systems have enabled new and wildly successful forms of creative collaboration to take place. Two of the best-known examples are Wikipedia and the open-source software (OSS) movement. Wikipedia, the free online encyclopedia, boasts millions of articles (over 3.6 million just in English) written by thousands of volunteers collaborating via the Internet. The OSS movement, also fueled mainly by volunteer online collaboration, has produced some of the world's most powerful and important software applications, including the Apache HTTP Server, the Linux operating system, and the Mozilla Firefox Web browser. 0 0
Lessons from the classroom: Successful techniques for teaching wikis using Wikipedia Frank Schulenburg
LiAnna Davis
Max Klein
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 In the fall of 2010, the Wikimedia Foundation partnered with faculty from several top universities to introduce wiki technology and Wikipedia into class assignments of public policy related subjects. Through assignments based in Wikipedia students improved skills in collaboration, critical thinking, expository writing, media literacy, and technology fluency. In video interviews, students describe their experience and the learning objectives emerged through the Wikipedia assignment. Many students also commented on the satisfaction in producing a research document that had value beyond a grade. Professor Max Klein explains the success of his classroom use of a WikiProject page as a springboard for class discussion and homework assignments. Workshop participants experience some of the Wikipedia training modules through activities. This interactive workshop discloses some successes and failures of the Initiative and details specifically what makes a successful Wikipedia-editing assignment. 0 0
Leveraging community-built knowledge for type coercion in question answering Kalyanpur A.
Murdock J.W.
Fan J.
Welty C.
Lecture Notes in Computer Science English 2011 Watson, the winner of the Jeopardy! challenge, is a state-of-the-art open-domain Question Answering system that tackles the fundamental issue of answer typing by using a novel type coercion (TyCor) framework, where candidate answers are initially produced without considering type information, and subsequent stages check whether the candidate can be coerced into the expected answer type. In this paper, we provide a high-level overview of the TyCor framework and discuss how it is integrated in Watson, focusing on and evaluating three TyCor components that leverage the community built semi-structured and structured knowledge resources - DBpedia (in conjunction with the YAGO ontology), Wikipedia Categories and Lists. These resources complement each other well in terms of precision and granularity of type information, and through links to Wikipedia, provide coverage for a large set of instances. 0 0
Mail2Wiki: Low-cost sharing and early curation from email to wikis Ben Hanrahan
Guillaume Bouchard
Gregorio Convertino
Thiebaud Weksteen
Nicholas Kong
Cedric Archambeau
Chi E.H.
C and T 2011 - 5th International Conference on Communities and Technologies, Conference Proceedings English 2011 In this design paper we motivate and describe the Mail2Wiki system, which enables low-cost sharing and early curation from email to wikis by knowledge workers. We aim to aid adoption of enterprise wikis and enable more efficient knowledge sharing and reuse. We present a design rationale grounded in prior empirical work, the design of the system, and the evaluation of the user interface. The system includes two alternative front-ends to enable incremental adoption by workers who are currently using email to share with their communities. 0 0
Mail2Wiki: Posting and curating wiki content from email Ben Hanrahan
Thiebaud Weksteen
Nicholas Kong
Gregorio Convertino
Guillaume Bouchard
Cedric Archambeau
Chi E.H.
International Conference on Intelligent User Interfaces, Proceedings IUI English 2011 Enterprise wikis commonly see low adoption rates, preventing them from reaching the critical mass that is needed to make them valuable. The high interaction costs for contributing content to these wikis is a key factor impeding wiki adoption. Much of the collaboration among knowledge workers continues to occur in email, which causes useful information to stay siloed in personal inboxes. In this demo we present Mail2Wiki, a system that enables easy contribution and initial curation of content from the personal space of email to the shared repository of a wiki. 0 0
Managing Web content using Linked Data principles - Combining semantic structure with dynamic content syndication Heino N.
Tramp S.
Sören Auer
Proceedings - International Computer Software and Applications Conference English 2011 Despite the success of the emerging Linked Data Web, offering content in a machine-processable way and - at the same time - as a traditional Web site is still not a trivial task. In this paper, we present the OntoWiki-CMS - an extension to the collaborative knowledge engineering toolkit OntoWiki for managing semantically enriched Web content. OntoWiki-CMS is based on OntoWiki for the collaborative authoring of semantically enriched Web content, vocabularies and taxonomies for the semantic structuring of the Web content and the OntoWiki Site Extension, a template and dynamic syndication system for representing the semantically enriched content as a Web site and the dynamic integration of supplementary content. OntoWiki-CMS facilitates and integrates existing content-specific content management strategies (such as blogs, bibliographic repositories or social networks). OntoWiki-CMS helps to balance between the creation of rich, stable semantic structures and the participatory involvement of a potentially large editor and contributor community. As a result semantic structuring of the Web content facilitates better search, browsing and exploration as we demonstrate with a use case. 0 0
Managing multimodal and multilingual semantic content Marcel Martin
Gerber D.
Heino N.
Sören Auer
Ermilov T.
WEBIST 2011 - Proceedings of the 7th International Conference on Web Information Systems and Technologies English 2011 With the advent and increasing popularity of Semantic Wikis and the Linked Data the management of semantically represented knowledge became mainstream. However, certain categories of semantically enriched content, such as multimodal documents as well as multilingual textual resources are still difficult to handle. In this paper, we present a comprehensive strategy for managing the life-cycle of both multimodal and multilingual semantically enriched content. The strategy is based on extending a number of semantic knowledge management techniques such as authoring, versioning, evolution, access and exploration for semantically enriched multimodal and multilingual content. We showcase an implementation and user interface based on the semantic wiki paradigm and present a use case from the e-tourism domain. 0 0
Metadata enrichment via topic models for author name disambiguation Bernardi R.
Le D.-T.
Lecture Notes in Computer Science English 2011 This paper tackles the well known problem of Author Name Disambiguation (AND) in Digital Libraries (DL). Following [14,13], we assume that an individual tends to create a distinctively coherent body of work that can hence form a single cluster containing all of his/her articles yet distinguishing them from those of everyone else with the same name. Still, we believe the information contained in a DL may be not sufficient to allow an automatic detection of such clusters; this lack of information becomes even more evident in federated digital libraries, where the labels assigned by librarians may belong to different controlled vocabularies or different classification systems, and in digital libraries on the web where records may be assigned neither subject headings nor classification numbers. Hence, we exploit Topic Models, extracted from Wikipedia, to enhance records metadata and use Agglomerative Clustering to disambiguate ambiguous author names by clustering together similar records; records in different clusters are supposed to have been written by different people. We investigate the following two research questions: (a) are the Classification Systems and Subject Heading labels manually assigned by librarians general and informative enough to disambiguate Author Names via clustering techniques? (b) Do Topic Models induce from large corpora the conceptual information necessary for labelling automatically DL metadata and grasp topic similarities of the records? To answer these questions, we will use the Library Catalogue of the Bolzano University Library as case study. 0 0
Mining for reengineering: An application to semantic wikis using formal and relational concept analysis Shi L.
Toussaint Y.
Napoli A.
Blansche A.
Lecture Notes in Computer Science English 2011 Semantic wikis enable collaboration between human agents for creating knowledge systems. In this way, data embedded in semantic wikis can be mined and the resulting knowledge patterns can be reused to extend and improve the structure of wikis. This paper proposes a method for guiding the reengineering and improving the structure of a semantic wiki. This method suggests the creation of categories and relations between categories using Formal Concept Analysis (FCA) and Relational Concept Analysis (RCA). FCA allows the design of a concept lattice while RCA provides relational attributes completing the content of formal concepts. The originality of the approach is to consider the wiki content from FCA and RCA points of view and to extract knowledge units from this content allowing a factorization and a reengineering of the wiki structure. This method is general and does not depend on any domain and can be generalized to every kind of semantic wiki. Examples are studied throughout the paper and experiments show the substantial results. 0 0
Mining fuzzy domain ontology based on concept vector from Wikipedia Category Network Lu C.-Y.
Ho S.-W.
Chung J.-M.
Hsu F.-Y.
Lee H.-M.
Ho J.-M.
Proceedings - 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2011 English 2011 Ontology is essential in the formalization of domain knowledge for effective human-computer interactions (i.e., expert-finding). Many researchers have proposed approaches to measure the similarity between concepts by accessing fuzzy domain ontology. However, engineering of the construction of domain ontologies turns out to be labor intensive and tedious. In this paper, we propose an approach to mine domain concepts from Wikipedia Category Network, and to generate the fuzzy relation based on a concept vector extraction method to measure the relatedness between a single term and a concept. Our methodology can conceptualize domain knowledge by mining Wikipedia Category Network. An empirical experiment is conducted to evaluate the robustness by using TREC dataset. Experiment results show the constructed fuzzy domain ontology derived by proposed approach can discover robust fuzzy domain ontology with satisfactory accuracy in information retrieval tasks. 0 0
Modeling medical interventions using the semantic MediaWiki for use in healthcare practice and education Kontotasiou D.
Bratsas C.
Bamidis P.D.
Proceedings - IEEE Symposium on Computer-Based Medical Systems English 2011 Social Software and particularly semantic wikis have been increasingly adopted by many online health-related professional and educational services. Because of their ease of use and rapidity of deployment, they offer the opportunity for powerful information sharing and ease of collaboration. Semantic wikis are Web sites that can be edited by anyone who has access to them. However, within medical intervention domain, certain important fundamental issues around development and evaluation have yet to be resolved. Thus, this paper proposes a Wikipedia-like Web-based tool to be used for describing and classifying medical interventions in order to plan and document patient care at a distance. Finally, this paper provides an overview of the ontology to be taken into account for the support of the Web-based tool. 0 0
Modelling provenance of DBpedia resources using Wikipedia contributions Fabrizio Orlandi
Alexandre Passant
Journal of Web Semantics English 2011 DBpedia is one of the largest datasets in the linked Open Data cloud. Its centrality and its cross-domain nature makes it one of the most important and most referred to knowledge bases on the Web of Data, generally used as a reference for data interlinking. Yet, in spite of its authoritative aspect, there is no work so far tackling the provenance aspect of DBpedia statements. By being extracted from Wikipedia, an open and collaborative encyclopedia, delivering provenance information about it would help to ensure trustworthiness of its data, a major need for people using DBpedia data for building applications. To overcome this problem, we propose an approach for modelling and managing provenance on DBpedia using Wikipedia edits, and making this information available on the Web of Data. In this paper, we describe the framework that we implemented to do so, consisting in (1) a lightweight modelling solution to semantically represent provenance of both DBpedia resources and Wikipedia content, along with mappings to popular ontologies such as the W7 - what, when, where, how, who, which, and why - and OPM - open provenance model - models, (2) an information extraction process and a provenance-computation system combining Wikipedia articles' history with DBpedia information, (3) a set of scripts to make provenance information about DBpedia statements directly available when browsing this source, as well as being publicly exposed in RDF for letting software agents consume it. © 2011 Elsevier B.V. 0 0
Multipedia: Enriching DBpedia with multimedia information Garcia-Silva A.
Max Jakob
Mendes P.N.
Christian Bizer
KCAP 2011 - Proceedings of the 2011 Knowledge Capture Conference English 2011 Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%. 0 0
Ontology-based feature extraction Vicient C.
Sanchez D.
Moreno A.
Proceedings - 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2011 English 2011 Knowledge-based data mining and classification algorithms require systems that are able to extract textual attributes contained in raw text documents, and map them to structured knowledge sources (e.g. ontologies) so that they can be semantically analyzed. The system presented in this paper performs this task in an automatic way, relying on a predefined ontology which states the concepts on which the posterior data analysis will be focused. As features, our system focuses on extracting relevant Named Entities from textual resources describing a particular entity. Those are evaluated by means of linguistic and Web-based co-occurrence analyses to map them to ontological concepts, thereby discovering relevant features of the object. The system has been preliminarily tested with tourist destinations and Wikipedia textual resources, showing promising results. 0 0
Probabilistic quality assessment based on article's revision history Jangwhan Han
Chao Wang
Jiang D.
Lecture Notes in Computer Science English 2011 The collaborative efforts of users in social media services such as Wikipedia have led to an explosion in user-generated content and how to automatically tag the quality of the content is an eminent concern now. Actually each article is usually undergoing a series of revision phases and the articles of different quality classes exhibit specific revision cycle patterns. We propose to Assess Quality based on Revision History (AQRH) for a specific domain as follows. First, we borrow Hidden Markov Model (HMM) to turn each article's revision history into a revision state sequence. Then, for each quality class its revision cycle patterns are extracted and are clustered into quality corpora. Finally, article's quality is thereby gauged by comparing the article's state sequence with the patterns of pre-classified documents in probabilistic sense. We conduct experiments on a set of Wikipedia articles and the results demonstrate that our method can accurately and objectively capture web article's quality. 0 0
Probabilistic quality assessment of articles based on learning editing patterns Jangwhan Han
Chao Wang
Fu X.
Chen K.
2011 International Conference on Computer Science and Service System, CSSS 2011 - Proceedings English 2011 As a new model of distributed, collaborative information source, such as Wikipedia, is emerging, its content is constantly being generated, updated and maintained by various users and its data quality varies from time to time. Thus the quality assessment of the content is a pressing concern now. We observe that each article usually goes through a series of editing phases such as building structure, contributing text, discussing text, etc., gradually getting into the final quality state and that the articles of different quality classes exhibit specific edit cycle patterns. We propose a new approach to Assess Quality based on article's Editing History (AQEH) for a specific domain as follows. First, each article's editing history is transformed into a state sequence using a Hidden Markov Model (HMM). Second, edit cycle patterns are first extracted for each quality class and then each quality class is further refined into quality corpora by clustering. Now, each quality class is clearly represented by a series of quality corpora and each quality corpus is described by a group of frequently co-occurring edit cycle patterns. Finally, article quality can be determined in a probabilistic sense by comparing the article with the quality corpora. Experimental results demonstrate that our method can capture and predict a web article's quality accurately and objectively. 0 0
Quality assessment of Wikipedia external links Tzekou P.
Stamou S.
Kirtsis N.
Zotos N.
WEBIST 2011 - Proceedings of the 7th International Conference on Web Information Systems and Technologies English 2011 Wikipedia is a unique source of information that has been collectively supplied by thousands of people. Since its nascence in 2001, Wikipedia is continuously evolving and like most websites it is interconnected via hyperlinks to other web information sources. Wikipedia articles contain two types of links: internal and external. Internal links point to other Wikipedia articles, while external links point outside Wikipedia and normally they are not used in the body of the article. Although there exist specific guidelines about both the style and the purpose of the article external links, no approach has been recorded that tries to capture in a systematic manner the quality of Wikipedia external links. In this paper, we study the quality of Wikipedia external links by assessing the degree to which these conform to their intended purpose; that is to formulate a comprehensive list of accurate information sources about the article contents. For our study, we estimate the decay of Wikipedia external links and we investigate their distribution in the Wikipedia articles. Our measurements give perceptible evidence for the value of external links and may imply their corresponding articles' quality in a holistic Wikipedia evaluation. 0 0
Query and tag translation for Chinese-Korean cross-language social media retrieval Wang Y.-C.
Chen J.-T.
Tsai R.T.-H.
Hsu W.-L.
Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011 English 2011 Collaborative tagging has been widely adopted by social media websites to allow users to describe content with metadata tags. Tagging can greatly improve search results. We propose a cross-language social media retrieval system (CLSMR) to help users retrieve foreign-language tagged media content. We construct a Chinese to Korean CLSMR system that translates Chinese queries into Korean, retrieves content, and then translates the Korean tags in the search results back into Chinese. Our system translates NEs using a dictionary of bilingual NE pairs from Wikipedia and a pattern-based software translator which learns regular NE patterns from the web. The top-10 precision of YouTube retrieved results for our system was 0.39875. The K-C NE tag translation accuracy for the top-10 YouTube results was 77.6%, which shows that our translation method is fairly effective for named entities. A questionnaire given to users showed that automatically translated tags were considered as informative as a human-written summary. With our proposed CLSMR system, Chinese users can retrieve online Korean media files and get a basic understanding of their content with no knowledge of the Korean language. 0 0
Query relaxation for entity-relationship search Elbassuoni S.
Maya Ramanath
Gerhard Weikum
Lecture Notes in Computer Science English 2011 Entity-relationship-structured data is becoming more important on the Web. For example, large knowledge bases have been automatically constructed by information extraction from Wikipedia and other Web sources. Entities and relationships can be represented by subject-property-object triples in the RDF model, and can then be precisely searched by structured query languages like SPARQL. Because of their Boolean-match semantics, such queries often return too few or even no results. To improve recall, it is thus desirable to support users by automatically relaxing or reformulating queries in such a way that the intention of the original user query is preserved while returning a sufficient number of ranked results. In this paper we describe comprehensive methods to relax SPARQL-like triple-pattern queries in a fully automated manner. Our framework produces a set of relaxations by means of statistical language models for structured RDF data and queries. The query processing algorithms merge the results of different relaxations into a unified result list, with ranking based on any ranking function for structured queries over RDF-data. Our experimental evaluation, with two different datasets about movies and books, shows the effectiveness of the automatically generated relaxations and the improved quality of query results based on assessments collected on the Amazon Mechanical Turk platform. 0 0
Quick detection of top-k personalized PageRank lists Avrachenkov K.
Litvak N.
Nemirovsky D.
Smirnova E.
Sokol M.
Lecture Notes in Computer Science English 2011 We study a problem of quick detection of top-k Personalized PageRank (PPR) lists. This problem has a number of important applications such as finding local cuts in large graphs, estimation of similarity distance and person name disambiguation. We argue that two observations are important when finding top-k PPR lists. Firstly, it is crucial that we detect fast the top-k most important neighbors of a node, while the exact order in the top-k list and the exact values of PPR are by far not so crucial. Secondly, by allowing a small number of "wrong" elements in top-k lists, we achieve great computational savings, in fact, without degrading the quality of the results. Based on these ideas, we propose Monte Carlo methods for quick detection of top-k PPR lists. We demonstrate the effectiveness of these methods on the Web and Wikipedia graphs, provide performance evaluation and supply stopping criteria. 0 0
RETRACTED ARTICLE: Applications in "fundamentals of computers" course based on web2.0 computer-supported collaborative teaching Mingyu Z.
Qi W.
Ying Z.
Proceedings - PACCS 2011: 2011 3rd Pacific-Asia Conference on Circuits, Communications and System English 2011 Taking the "Fundamentals of Computers" public courses as an example and starting from the perspective of cooperative learning, this paper proposes an idea of computer-supported collaborative learning which is based on Web 2.0 and uses blogs and wikis as teaching methods. With the accomplishment of the design "Platform of CSCL in Fundamentals of Computers course", this study provides a valuable reference for related methods of implementation. 0 0
Relational similarity measure: An approach combining Wikipedia and wordnet Cao Y.J.
Lu Z.
Cai S.M.
Applied Mechanics and Materials English 2011 Relational similarities between two pairs of words are the degrees of their semantic relations. The Vector Space Model (VSM) is used to measure the relational similarity between two pairs of words; however, it requires patterns to be created manually and these patterns are limited. Recently, Latent Relational Analysis (LRA) was proposed and achieves state-of-the-art results. However, it is time-consuming and cannot express implicit semantic relations. In this study, we propose a new approach to measure relational similarities between two pairs of words by combining Wordnet3.0 and the Web-Wikipedia, so that implicit semantic relations can be mined from the very large corpus. The proposed approach has two main characteristics: (1) A new method is proposed in the pattern extraction step, which considers the various parts of speech of words. (2) Wordnet3.0 is applied to calculate the semantic relatedness between a pair of words so that the implicit semantic relation of the two words can be expressed. In an experimental evaluation based on the 374 SAT multiple-choice word-analogy questions, the precision of the proposed approach is 43.9%, which is lower than that of LRA suggested by Turney in 2005, but the suggested approach mainly focuses on mining the semantic relations among words. 0 0
Retrieving attributes using web tables Kopliku A.
Pinel-Sauvagnat K.
Boughanem M.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2011 In this paper we propose an attribute retrieval approach which extracts and ranks attributes from Web tables. We combine simple heuristics to filter out improbable attributes and we rank attributes based on frequencies and a table match score. Ranking is reinforced with external evidence from Web search, DBPedia and Wikipedia. Our approach can be applied to any instance (e.g. Canada) to retrieve its attributes (capital, GDP). It is shown to have a much higher recall than DBPedia and Wikipedia and to work better than lexico-syntactic rules for the same purpose. 0 0
SMASHUP: Secure mashup for defense transformation and net-centric systems Heileman M.D.
Heileman G.L.
Shaver M.P.
Gilger M.
Jamkhedkar P.A.
Proceedings of SPIE - The International Society for Optical Engineering English 2011 The recent development of mashup technologies now enables users to easily collect, integrate, and display data from a vast array of different information sources available on the Internet. The ability to harness and leverage information in this manner provides a powerful means for discovering links between information, and greatly enhances decision-making capabilities. The availability of such services in DoD environments will provide tremendous advantages to the decision-makers engaged in analysis of critical situations, rapid-response, and long-term planning scenarios. However, in the absence of mechanisms for managing the usage of resources, any mashup service in a DoD environment also opens up significant security vulnerabilities to insider threat and accidental leakage of confidential information, not to mention other security threats. In this paper we describe the development of a framework that will allow integration via mashups of content from various data sources in a secure manner. The framework is based on mathematical logic where addressable resources have formal usage terms applied to them, and these terms are used to specify and enforce usage policies over the resources. An advantage of this approach is that it provides a formal means for securely managing the usage of resources that might exist within multilevel security environments. 0 0
Sentiment analysis of news titles: The role of entities and a new affective lexicon Loureiro D.
Marreiros G.
Neves J.
Lecture Notes in Computer Science English 2011 The growth of content on the web has been followed by increasing interest in opinion mining. This field of research relies on accurate recognition of emotion from textual data. There's been much research in sentiment analysis lately, but it always focuses on the same elements. Sentiment analysis traditionally depends on linguistic corpora, or common sense knowledge bases, to provide extra dimensions of information to the text being analyzed. Previous research hasn't yet explored a fully automatic method to evaluate how events associated to certain entities may impact each individual's sentiment perception. This project presents a method to assign valence ratings to entities, using information from their Wikipedia page, and considering user preferences gathered from the user's Facebook profile. Furthermore, a new affective lexicon is compiled entirely from existing corpora, without any intervention from the coders. 0 0
Shortipedia aggregating and curating Semantic Web data Vrandecic D.
Ratnakar V.
Krotzsch M.
Gil Y.
Journal of Web Semantics English 2011 Shortipedia is a Web-based knowledge repository that pulls together a growing number of sources in order to provide a comprehensive, diversified view on entities of interest. Contributors to Shortipedia can easily add claims to the knowledge base, provide sources for their claims, and find links to knowledge already available on the Semantic Web. 0 0
Social media driven image retrieval Adrian Popescu
Gregory Grefenstette
Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR'11 English 2011 People often try to find an image using a short query and images are usually indexed using short annotations. Matching the query vocabulary with the indexing vocabulary is a difficult problem when little text is available. Textual user generated content in Web 2.0 platforms contains a wealth of data that can help solve this problem. Here we describe how to use Wikipedia and Flickr content to improve this match. The initial query is launched in Flickr and we create a query model based on co-occurring terms. We also calculate nearby concepts using Wikipedia and use these to expand the query. The final results are obtained by ranking the results for the expanded query using the similarity between their annotation and the Flickr model. Evaluation of these expansion and ranking techniques, over the Image CLEF 2010 Wikipedia Collection containing 237,434 images and their multilingual textual annotations, shows a consistent improvement compared to state-of-the-art methods. 0 0
Supporting domain experts to construct conceptual ontologies: A holistic approach Denaux R.
Dolbear C.
Hart G.
Dimitrova V.
Cohn A.G.
Journal of Web Semantics English 2011 A recent trend in ontology engineering research aims at encouraging the active participation of domain experts in the ontology creation process. Ontology construction methodologies together with appropriate tools and technologies, such as controlled natural languages, semantic wikis, intelligent user interfaces and social computing, are being proposed to enable the direct input from domain experts and to minimize the dependency on knowledge engineers at every step of ontology development. The time is ripe for consolidating methodological and technological advancements to create intuitive ontology engineering tools which can make Semantic Web technologies usable by a wide variety of people without formal knowledge engineering skills. A novel, holistic approach to facilitate the involvement of domain experts in the ontology authoring process is presented here. It integrates (i) an ontology construction methodology, (ii) the use of a controlled natural language, and (iii) appropriate tool support. The integrated approach is illustrated with the design, implementation and evaluation of ROO - a unique ontology authoring tool which combines intelligent techniques to assist domain experts in constructing ontologies. The benefits and limitations of the proposed approach are analyzed based on user studies with ROO. A broader discussion is provided pointing at issues to be taken into account when assisting the involvement of domain experts in ontology construction. 0 0
Temporal knowledge for timely intelligence Gerhard Weikum
Bedathur S.
Ralf Schenkel
Lecture Notes in Business Information Processing English 2011 Knowledge bases about entities and their relationships are a great asset for business intelligence. Major advances in information extraction and the proliferation of knowledge-sharing communities like Wikipedia have enabled ways for the largely automated construction of rich knowledge bases. Such knowledge about entity-oriented facts can greatly improve the output quality and possibly also efficiency of processing business-relevant documents and event logs. This holds for information within the enterprise as well as in Web communities such as blogs. However, no knowledge base will ever be fully complete and real-world knowledge is continuously changing: new facts supersede old facts, knowledge grows in various dimensions, and completely new classes, relation types, or knowledge structures will arise. This leads to a number of difficult research questions regarding temporal knowledge and the life-cycle of knowledge bases. This short paper outlines challenging issues and research opportunities, and provides references to technical literature. 0 0
The FEM wiki project: A conversion of a training resource for field epidemiologists into a collaborative web 2.0 portal Kostkova P.
Szomszor M.
Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering English 2011 While an ever increasing popularity of online wiki platforms, user-tagging tools, blogs, and forums is the core characteristic of the Web 2.0 era, converting an existing high-quality training module into a collaborative online space for an active community of practice (CoP) while preserving its quality approval processes is a challenging task. This is the aim of the ECDC-funded Field Epidemiology Manual (FEM) wiki project, based on training resources organized in 17 chapters developed for the European EPIET epidemiology training programme. This paper describes the challenges, solutions, and development processes behind the FEM wiki portal - an online collaborative Web 2.0 platform taking advantage of the user-generated input while preserving the structure, editorial processes and style of the existing FEM manual. We describe the need for ECDC-recognised content and discuss the editorial roles developed in this European project but applicable to any other training resource converted into an online wiki platform. 0 0
The InfoAlbum image centric information collection Karlsen R.
Jakobsen B.
ACM International Conference Proceeding Series English 2011 This paper presents a prototype of an image centric information album, where the goal is to automatically provide the user with information about i) the object or event depicted in an image, and ii) the surroundings where the image was taken. The system, called InfoAlbum, aims at improving the image viewing experience by presenting supplementary information such as location names, tags, temperature at image capture time, placement on map, geographically nearby images, Wikipedia articles and web pages. The information is automatically collected from various sources on the Internet based on the image metadata GPS coordinates, date/time of image capture and a category keyword provided by the user. Collected information is presented to the user, and some is also stored in the EXIF header of the image and can later be used during image retrieval. 0 0
The success of corporate wiki systems: An end user perspective Bhatti Z.A.
Serge Baile
Yasin H.M.
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 With the ever increasing use of Web 2.0 sites on the internet, the use of Web 2.0 based tools is now employed by organizations across the globe. One of the most widely used Web 2.0 tools in organizations is wiki technology, particularly in project management. It is important for organizations to measure the success of their wiki system implementation. With the advent of new technologies in the market and their deployment by the firms, it is necessary to investigate how they can help organizations execute processes in a better way. In this paper we present a theoretical model for the measurement of corporate wikis' success from the end-user's perspective based on the theoretical foundation of DeLone & McLean's IS success model [17]. We extend the model by incorporating contextual factors with respect to wiki technology in a project management task. This study intends to help firms to understand in a better way, how they can use wikis to achieve an efficient, effective and improved end-user performance. This would also be helpful for companies engaged in wiki development business to improve their products keeping in view the perceptions of wiki end-users. 0 0
Timestamp-based result cache invalidation for web search engines Alici S.
Altingovde I.S.
Ozcan R.
Cambazoglu B.B.
Ulusoy O.
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2011 The result cache is a vital component for efficiency of large-scale web search engines, and maintaining the freshness of cached query results is the current research challenge. As a remedy to this problem, our work proposes a new mechanism to identify queries whose cached results are stale. The basic idea behind our mechanism is to maintain and compare generation time of query results with update times of posting lists and documents to decide on staleness of query results. The proposed technique is evaluated using a Wikipedia document collection with real update information and a real-life query log. We show that our technique has good prediction accuracy, relative to a baseline based on the time-to-live mechanism. Moreover, it is easy to implement and incurs less processing overhead on the system relative to a recently proposed, more sophisticated invalidation mechanism. 0 0
Toward a semantic vocabulary for systems engineering Di Maio P. ACM International Conference Proceeding Series English 2011 The web can be the most efficient medium for sharing knowledge, provided appropriate technological artifacts such as controlled vocabularies and metadata are adopted. In our research we study the degree of such adoption applied to the systems engineering domain. This paper is a work in progress report discussing issues surrounding knowledge extraction and representation, proposing an integrated approach to tackle various challenges associated with the development of a shared vocabulary for the practice. 0 0
Towards effective short text deep classification Xiaohua Sun
Haofen Wang
Yiqin Yu
SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval English 2011 Recently, more and more short texts (e.g., ads, tweets) appear on the Web. Classifying short texts into a large taxonomy like ODP or the Wikipedia category system has become an important mining task to improve the performance of many applications such as contextual advertising and topic detection for micro-blogging. In this paper, we propose a novel multi-stage classification approach to solve the problem. First, explicit semantic analysis is used to add more features for both short texts and categories. Second, we leverage information retrieval technologies to fetch the most relevant categories for an input short text from thousands of candidates. Finally, an SVM classifier is applied on only a few selected categories to return the final answer. Our experimental results show that the proposed method achieved significant improvements in classification accuracy compared with several existing state-of-the-art approaches. 0 0
Towards industrial implementation of emerging semantic technologies Breindel J.T.
Grosse I.R.
Krishnamurty S.
Altidor J.
Wileden J.
Trachtenberg S.
Witherell P.
Proceedings of the ASME Design Engineering Technical Conference English 2011 Every new design, project, or procedure within a company generates a considerable amount of new information and important knowledge. Furthermore, a tremendous amount of legacy knowledge already exists in companies in electronic and non-electronic formats, and techniques are needed for representing, structuring and reusing this knowledge. Many researchers have spent considerable time and effort developing semantic knowledge management systems, which in theory are presumed to address these problems. Despite significant research investments, little has been done to implement these systems within an industrial setting. In this paper we identify five main requirements to the development of an industry-ready application of semantic knowledge management systems and discuss how each of these can be addressed. These requirements include the ease of new knowledge management software adoption, the incorporation of legacy information, the ease of use of the user interface, the security of the stored information, and the robustness of the software to support multiple file types and allow for the sharing of information across platforms. Collaboration with Raytheon, a defense and aerospace systems company, allowed our team to develop and demonstrate a successful adoption of semantic abilities by a commercial company. Salient features of this work include a new tool, the e-Design MemoExtractor Software Tool, designed to mine and capture company information, a Raytheon-specific extension to the e-Design Framework, and a novel semantic environment in the form of a customized semantic wiki SMW+. The advantages of this approach are discussed in the context of the industrial case study with Raytheon. 0 0
Transforming IbnSina into an advanced multilingual interactive android robot Mavridis N.
Aldhaheri A.
Aldhaheri L.
Khanii M.
Aldarmaki N.
2011 IEEE GCC Conference and Exhibition, GCC 2011 English 2011 IbnSina is the world's first Arabic-language conversational android robot, and is also part of an interactive theatre with multiple possibilities for human teleparticipation. In this paper, we describe extensions carried out to IbnSina's software architecture in order to enrich its capabilities in multiple ways, so that it can become an exciting educational / persuasive robot in the future. The main axes for extension were: access to online (Wikipedia) and stored (Koran database) content for dialogue generation, basic multilingual capability exploration (English and Arabic, also utilizing Google Translate), basic read-aloud-text capability (through OCR), and systematization of motor control (with higher-level API for real-time lip syncing, eye blinking, natural looking random face movements, and interpolation between facial expressions including an affective state subsystem). With such capabilities, IbnSina becomes closer to an attractive robot that can find real-world application in malls, schools, as a receptionist etc. 0 0
User generated (web) content: Trash or treasure Alluvatti G.M.
Capiluppi A.
De Ruvo G.
Molfetta M.
IWPSE-EVOL'11 - Proceedings of the 12th International Workshop on Principles on Software Evolution English 2011 It has been claimed that the advent of user-generated content has reshaped the way people approached all sorts of content realization projects, being multimedia (YouTube, DeviantArt, etc.), knowledge (Wikipedia, blogs), to software in general, when based on a more general Open Source model. After many years of research and evidence, several studies have demonstrated that Open Source Software (OSS) portals often contain a large amount of software projects that simply do not evolve, often developed by relatively small communities, and that still struggle to attract a sustained number of contributors. In terms of such content, the "tragedy" appears to be that the user demand for content and the offer of experts contributing content are on curves with different slopes, with the demand growing more quickly. In this paper we argue that, even given the differences in the requested expertise, many projects reliant on user-contributed content and expertise undergo a similar evolution, along a logistic growth: a first slow growth rate is followed by a much faster evolution growth. When a project fails to attract more developers i.e. contributors, the evolution of project's content does not present the "explosive growth" phase, and it will eventually "burnout", and the project appears to be abandoned. Far from being a negative finding, even abandoned project's content provides a valuable resource that could be reused in the future within other projects. 0 1
Using similarity-based approaches for continuous ontology development Ramezani M. International Journal on Semantic Web and Information Systems English 2011 This paper presents novel algorithms for learning semantic relations from an existing ontology or concept hierarchy. The authors suggest recommendation of semantic relations for supporting continuous ontology development, i.e., the development of ontologies during their use in social semantic bookmarking, semantic wiki, or other Web 2.0 style semantic applications. This paper assists users in placing a newly added concept in a concept hierarchy. The proposed algorithms are evaluated using datasets from Wikipedia category hierarchy and provide recommendations. 0 0
Vandalism detection in Wikipedia: A high-performing, feature-rich model and its reduction through Lasso Sara Javanmardi
David W. McDonald
Lopes C.V.
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 User generated content (UGC) constitutes a significant fraction of the Web. However, some wiki-based sites, such as Wikipedia, are so popular that they have become a favorite target of spammers and other vandals. In such popular sites, human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity. The application of machine learning techniques holds promise for developing efficient online algorithms for better tools to assist users in vandalism detection. We describe an efficient and accurate classifier that performs vandalism detection in UGC sites. We show the results of our classifier in the PAN Wikipedia dataset. We explore the effectiveness of a combination of 66 individual features that produce an AUC of 0.9553 on a test dataset - the best result to our knowledge. Using Lasso optimization we then reduce our feature-rich model to a much smaller and more efficient model of 28 features that performs almost as well - the drop in AUC being only 0.005. We describe how this approach can be generalized to other user generated content systems and describe several applications of this classifier to help users identify potential vandalism. 0 0
VisualWikiCurator: A corporate wiki plugin Nicholas Kong
Ben H.
Gregorio Convertino
Chi E.H.
Conference on Human Factors in Computing Systems - Proceedings English 2011 Knowledge workers who maintain corporate wikis face high costs for organizing and updating content on wikis. This problem leads to low adoption rates and compromises the utility of such tools in organizations. We describe a system that seeks to reduce the interaction costs of updating and organizing wiki pages by combining human and machine intelligence. We then present preliminary results of an ongoing lab-based evaluation of the tool with knowledge workers. 0 0
Web article quality assessment in multi-dimensional space Jangwhan Han
Fu X.
Chen K.
Chao Wang
Lecture Notes in Computer Science English 2011 Nowadays user-generated content (UGC), such as Wikipedia, is emerging on the web at an explosive rate, but its data quality varies dramatically. How to effectively rate an article's quality is a focus of the research and industry communities. Considering that each quality class demonstrates its specific characteristics on different quality dimensions, we propose to learn the web quality corpus by taking different quality dimensions into consideration. Each article is regarded as an aggregation of sections and each section's quality is modelled using a Dynamic Bayesian Network (DBN) with reference to accuracy, completeness and consistency. Each quality class is represented by three dimension corpora, namely accuracy corpus, completeness corpus and consistency corpus. Finally we propose two schemes to compute quality ranking. Experiments show our approach performs well. 0 0
Web data management Cafarella M.J.
Halevy A.Y.
Proceedings of the ACM SIGMOD International Conference on Management of Data English 2011 Web Data Management (or WDM) refers to a body of work concerned with leveraging the large collections of structured data that can be extracted from the Web. Over the past few years, several research and commercial efforts have explored these collections of data with the goal of improving Web search and developing mechanisms for surfacing different kinds of search answers. This work has leveraged (1) collections of structured data such as HTML tables, lists and forms, (2) recent ontologies and knowledge bases created by crowd-sourcing, such as Wikipedia and its derivatives, DBPedia, YAGO and Freebase, and (3) the collection of text documents from the Web, from which facts could be extracted in a domain-independent fashion. The promise of this line of work is based on the observation that new kinds of results can be obtained by leveraging a huge collection of independently created fragments of data, and typically in ways that are wholly unrelated to the authors' original intent. For example, we might use many database schemas to compute a schema thesaurus. Or we might examine many spreadsheets of scientific data that reveal the aggregate practice of an entire scientific field. As such, WDM is tightly linked to Web-enabled collaboration, even (or especially) if the collaborators are unwitting ones. We will cover the key techniques, principles and insights obtained so far in the area of Web Data Management. 0 0
Web wisdom: An essay on how web 2.0 and semantic web can foster a global knowledge society Christopher Thomas
Sheth A.
Computers in Human Behavior English 2011 Admittedly this is a presumptuous title that should never be used when reporting on individual research advances. Wisdom is just not a scientific concept. In this case, though, we are reporting on recent developments on the web that lead us to believe that the web is on the way to providing a platform for not only information acquisition and business transactions but also for large scale knowledge development and decision support. It is likely that by now every web user has participated in some sort of social function or knowledge accumulating function on the web, many times without even being aware of it, simply by searching and browsing, other times deliberately by e.g. adding a piece of information to a Wikipedia article or by voting on a movie on In this paper we will give some examples of how Web Wisdom is already emerging, some ideas of how we can create platforms that foster Web Wisdom and a critical evaluation of types of problems that can be subjected to Web Wisdom. 0 0
Wiki as business application platform: The MES showcase Christoph Sauer WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 This presentation shows the business application suite mHub that implements the core components of a manufacturing execution system (MES) purely with a specially developed application wiki distribution. The novelty of the application wiki is its "wiki as business application platform" approach, that abstracts all necessary technologies to implement the solution within the edit page area. Other than application wikis targeted for end users, that merely serve as query interfaces to existing business applications, this application wiki enables developers to script every aspect of the application domain within the wiki itself. 0 0
Wiki-based conceptual modeling: An experience with the public administration Casagni C.
Di Francescomarino C.
Dragoni M.
Fiorentini L.
Franci L.
Gerosa M.
Chiara Ghidini
Rizzoli F.
Marco Rospocher
Rovella A.
Luciano Serafini
Sparaco S.
Tabarroni A.
Lecture Notes in Computer Science English 2011 The dematerialization of documents produced within the Public Administration (PA) represents a key contribution that Information and Communication Technology can provide towards the modernization of services within the PA. The availability of proper and precise models of the administrative procedures, and of the specific "entities" related to these procedures, such as the documents involved in the procedures or the organizational roles performing the activities, is an important step towards both (1) the replacement of paper-based procedures with electronic-based ones, and (2) the definition of guidelines and functions needed to safely store, catalogue, manage and retrieve in an appropriate archival system the electronic documents produced within the PA. In this paper we report the experience of customizing a semantic wiki based tool (MoKi ) for the modeling of administrative procedures (processes) and their related "entities" (ontologies). The tool has been used and evaluated by several domain experts from different Italian regions in the context of a national project. This experience, and the reported evaluation, highlight the potential and criticality of using semantic wiki-based tools for the modeling of complex domains composed of processes and ontologies in a real setting. 0 0
Wiki-watchdog: Anomaly detection in Wikipedia through a distributional lens Arackaparambil C.
Yan G.
Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 English 2011 Wikipedia has become a standard source of reference online, and many people (some unknowingly) now trust this corpus of knowledge as an authority to fulfil their information requirements. In doing so they task the human contributors of Wikipedia with maintaining the accuracy of articles, a job that these contributors have been performing admirably. We study the problem of monitoring the Wikipedia corpus with the goal of automated, online anomaly detection. We present Wiki-watchdog, an efficient distribution-based methodology that monitors distributions of revision activity for changes. We show that using our methods it is possible to detect the activity of bots, flash events, and outages, as they occur. Our methods are proposed to support the monitoring of the contributors. They are useful to speed-up anomaly detection, and identify events that are hard to detect manually. We show the efficacy and the low false-positive rate of our methods by experiments on the revision history of Wikipedia. Our results show that distribution-based anomaly detection has a higher detection rate than traditional methods based on either volume or entropy alone. Unlike previous work on anomaly detection in information networks that worked with a static network graph, our methods consider the network as it evolves and monitors properties of the network for changes. Although our methodology is developed and evaluated on Wikipedia, we believe it is an effective generic anomaly detection framework in its own right. 0 0
WikiTeams: How do they achieve success? Piotr Turek
Adam Wierzbicki
Radoslaw Nielek
Anwitaman Datta
IEEE Potentials English 2011 Web 2.0 technology and so-called social media are among the most popular (among users and researchers alike) Internet technologies today. Among them, Wiki technology - created to simplify HTML editing and enable open, collaborative editing of pages by ordinary Web users - occupies an important place. Wiki is increasingly adopted by businesses as a useful form of knowledge management and sharing, creating "corporate Wikis." However, the most widely known application of Wiki technology - Wikipedia - is, according to many analysts, more than just an open encyclopedia that uses Wiki. 0 0
Wikigramming: A wiki-based training environment for programming Takashi Hattori Proceedings - International Conference on Software Engineering English 2011 Wiki is one of the most successful technologies in Web 2.0 because it is so simple that anyone can start using it instantly. The main aim of this research is to realize a collaborative programming environment that is as simple as Wiki. Each Wiki page contains source code of a Scheme function which is executed on the server. Users can edit any function at any time without complicated procedure, and see the results of their changes instantly. In order to avoid intentional or unintentional destruction of working programs, when users attempt to modify existing functions, the modified version must pass unit tests written by other users. Though changes are made anonymously, we can have some confidence if test cases are written by many users. 0 0
Wikiotics: The interactive language instruction Wiki Ian Sullivan
Garrison J.R.
Matthew Curinga
WikiSym 2011 Conference Proceedings - 7th Annual International Symposium on Wikis and Open Collaboration English 2011 While most existing wiki systems are geared toward editing text documents, we have built Wikiotics to enable the collaborative creation of interactive multimedia materials most needed in language instruction. In our demonstration, we will show several types of interactive lessons that can be created from simple multimedia elements. We will also show the lesson creation/editing interfaces and how our smart phone app can simplify the process of capturing local media and integrating that new media into existing lessons. 0 0
Wikipedia sets: Context-oriented related entity acquisition from multiple words Masumi Shirakawa
Kotaro Nakayama
Takahiro Hara
Shojiro Nishio
Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 English 2011 In this paper, we propose a method which acquires related words (entities) from multiple words by naturally disambiguating their meaning and considering their contexts. In addition, we introduce a bootstrapping method for improving the coverage of association relations. Experimental result shows that our method can acquire related words depending on the contexts of multiple words compared to the ESA-based method. 0 0
Wikis supporting research workshops in higher education Rodriguez-Hidalgo R.C.
Torres-Alfonso A.M.
Zhu C.
Questier F.
Proceedings of the 6th Iberian Conference on Information Systems and Technologies, CISTI 2011 English 2011 This paper reports the results of a pilot study conducted on a Cuban Higher Education setting. A classroom of twenty students of the Sciences of Information career at Central University "Marta Abreu" of Las Villas (UCLV1) was inquired during the use of a wiki tool supporting a research workshop in the course of Databases Theory (DBT). The purpose of this study is to test the following hypotheses: (1) the collaboration supported by social software reinforces the peer relationships among the students of the class and (2) improves the time efficiency of the students and instructors (stakeholders) participating in these collaborative activities. A survey and several interviews were conducted to gather data about the social network the students formed for studying DBT, and about the time they spent on that. The results of these instruments were contrasted with the results of an observation conducted during the collaborative activities. The data of the students' achievements and social network state using the wiki tool were compared to similar data from other two precedent, non-wiki-supported research workshops. The use of the wiki tool was found effective to reinforce the peer learning relationships, and consequently, to improve their achievements on the subject. Finally, the time spent for accomplishing the collaborative learning activities did not decrease significantly during the use of the social software. 0 0
- Weaving Chinese linking open data Niu X.
Xiaohua Sun
Haofen Wang
Rong S.
Qi G.
Yiqin Yu
Lecture Notes in Computer Science English 2011 Linking Open Data (LOD) has become one of the most important community efforts to publish high-quality interconnected semantic data. Such data has been widely used in many applications to provide intelligent services like entity search, personalized recommendation and so on. While DBpedia, one of the LOD core data sources, contains resources described in multilingual versions and semantic data in English is proliferating, there is very few work on publishing Chinese semantic data. In this paper, we present, the first effort to publish large scale Chinese semantic data and link them together as a Chinese LOD (CLOD). More precisely, we identify important structural features in three largest Chinese encyclopedia sites (i.e., Baidu Baike, Hudong Baike, and Chinese Wikipedia) for extraction and propose several data-level mapping strategies for automatic link discovery. As a result, the CLOD has more than 5 million distinct entities and we simply link CLOD with the existing LOD based on the multilingual characteristic of Wikipedia. Finally, we also introduce three Web access entries namely SPARQL endpoint, lookup interface and detailed data view, which conform to the principles of publishing data sources to LOD. 0 0
"Got You!": Automatic vandalism detection in wikipedia with web-based shallow syntactic-semantic modeling Wang W.Y.
McKeown K.R.
Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference English 2010 Discriminating vandalism edits from non-vandalism edits in Wikipedia is a challenging task, as ill-intentioned edits can include a variety of content and be expressed in many different forms and styles. Previous studies are limited to rule-based methods and learning based on lexical features, lacking in linguistic analysis. In this paper, we propose a novel Web-based shallow syntactic-semantic modeling method, which utilizes Web search results as resource and trains topic-specific n-tag and syntactic n-gram language models to detect vandalism. By combining basic task-specific and lexical features, we have achieved high F-measures using logistic boosting and logistic model trees classifiers, surpassing the results reported by major Wikipedia vandalism detection systems. 0 0
A Requirements Maturity Measurement Approach based on SKLSEWiki Peng R.
Ye Q.
Ye M.
Proceedings - International Computer Software and Applications Conference English 2010 With the development of IT, the scale and complexity of information system has been dramatically increased. Followed is that the related stakeholders' size increases sharply. How to promote the requirements negotiation of large scale stakeholders becomes a focus of attention. Wiki, as a lightweight documentation and distributed collaboration platform, has demonstrated its capability in distributed requirements elicitation and documentation. Most efforts are paid to construct friendly user interface and collaborative editing capabilities. In this paper, a new concept, requirement maturity, is proposed to represent the stable degree of requirement reached through the negotiation process. A Requirement Maturity Measurement Approach based on Wiki uses the requirement maturity as a threshold to select requirements. Thus, the requirements, which reach a stable status through full negotiation, can be found out. A platform SKLSEWiki is developed to validate the approach. 0 0
A framework of collaborative adaptation authoring Nurjanah D.
Davis H.C.
Tiropanis T.
Proceedings of the 6th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2010 English 2010 Adaptive Educational Hypermedia systems (AEH) enhance learning by adaptation and personalisation. As a consequence, wide ranging knowledge and learning content are needed. Problems then emerge in the provision of suitable authoring tools to carry out the authoring process which is complex and time consuming. Based on the fact that former research studies on authoring have identified drawbacks in collaboration, usability, efficiency, or interoperability, this paper proposes an approach for collaborative adaptation authoring for adaptive learning. The intended approach aims at improving authoring for AEH systems by allowing many people to participate and enhancing authors' interaction. The novelty of the approach lies in how the domain knowledge which has been semantically defined is enriched, and in the application of Computer Support Collaborative Work (CSCW). This approach adopts the advantages of existing semantic web technology and wiki-based authoring tools used to develop domain knowledge; its output is then enriched with pedagogy-related knowledge including adaptation. The output of the system is intended to be delivered in an existing AEH system. 0 0
Approaches for automatically enriching wikipedia Zareen Syed
Tim Finin
AAAI Workshop - Technical Report English 2010 We have been exploring the use of Web-derived knowledge bases through the development of Wikitology - a hybrid knowledge base of structured and unstructured information extracted from Wikipedia augmented by RDF data from DBpedia and other Linked Open Data resources. In this paper, we describe approaches that aid in enriching Wikipedia and thus the resources that derive from Wikipedia such as the Wikitology knowledge base, DBpedia, Freebase and Powerset. Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Automatic generation of semantic fields for annotating web images Gang Wang
Chua T.S.
Ngo C.-W.
Wang Y.C.
Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference English 2010 The overwhelming amounts of multimedia contents have triggered the need for automatically detecting the semantic concepts within the media contents. With the development of photo sharing websites such as Flickr, we are able to obtain millions of images with user-supplied tags. However, user tags tend to be noisy, ambiguous and incomplete. In order to improve the quality of tags to annotate web images, we propose an approach to build Semantic Fields for annotating the web images. The main idea is that the images are more likely to be relevant to a given concept, if several tags to the image belong to the same Semantic Field as the target concept. Semantic Fields are determined by a set of highly semantically associated terms with high tag co-occurrences in the image corpus and in different corpora and lexica such as WordNet and Wikipedia. We conduct experiments on the NUS-WIDE web image corpus and demonstrate superior performance on image annotation as compared to the state-of-the-art approaches. 0 0
Co-star: A co-training style algorithm for hyponymy relation acquisition from structured and unstructured text Oh J.-H.
Yamada I.
Kentaro Torisawa
Saeger S.D.
Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference English 2010 This paper proposes a co-training style algorithm called Co-STAR that acquires hyponymy relations simultaneously from structured and unstructured text. In Co-STAR, two independent processes for hyponymy relation acquisition - one handling structured text and the other handling unstructured text - collaborate by repeatedly exchanging the knowledge they acquired about hyponymy relations. Unlike conventional co-training, the two processes in Co-STAR are applied to different source texts and training data. We show the effectiveness of this algorithm through experiments on large scale hyponymy-relation acquisition from Japanese Wikipedia and Web texts. We also show that Co-STAR is robust against noisy training data. 0 0
Extracting the gist of Social Network Services using Wikipedia Akiyo Nadamoto
Eiji Aramaki
Takeshi Abekawa
Yohei Murakami
IiWAS2010 - 12th International Conference on Information Integration and Web-Based Applications and Services English 2010 Social Network Services (SNSs), which are maintained by a community of people, are among the popular Web 2.0 tools. Multiple users freely post their comments to an SNS thread. It is difficult to understand the gist of the comments because the dialog in an SNS thread is complicated. In this paper, we propose a system that presents the gist of information at a glance and basic information about an SNS thread by using Wikipedia. We focus on the table of contents (TOC) of the relevant articles on Wikipedia. Our system compares the comments in a thread with the information in the TOC and identifies contents that are similar. We consider the similar contents in the TOC as the gist of the thread and paragraphs in Wikipedia similar to the comments in the thread as comprising basic information about the thread. Thus, a user can obtain the gist of an SNS thread by viewing a table with similar contents. Copyright 2010 ACM. 0 0
Finding new information via robust entity detection Iacobelli F.
Nichols N.
Birnbaum L.
Hammond K.
AAAI Fall Symposium - Technical Report English 2010 Journalists and editors work under pressure to collect relevant details and background information about specific events. They spend a significant amount of time sifting through documents and finding new information such as facts, opinions or stakeholders (i.e. people, places and organizations that have a stake in the news). Spotting them is a tedious and cognitively intense process. One task, essential to this process, is to find and keep track of stakeholders. This task is taxing cognitively and in terms of memory. Tell Me More offers an automatic aid to this task. Tell Me More is a system that, given a seed story, mines the web for similar stories reported by different sources and selects only those stories which offer new information with respect to that original seed story. Much like a journalist, the task of detecting named entities is central to its success. In this paper we briefly describe Tell Me More and, in particular, we focus on Tell Me More's entity detection component. We describe an approach that combines off-the-shelf named entity recognizers (NERs) with WPED, an in-house publicly available NER that uses Wikipedia as its knowledge base. We show significant increase in precision scores with respect to traditional NERs. Lastly, we present an overall evaluation of Tell Me More using this approach. Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Learning from the web: Extracting general world knowledge from noisy text Gordon J.
Van Durme B.
Schubert L.K.
AAAI Workshop - Technical Report English 2010 The quality and nature of knowledge that can be found by an automated knowledge-extraction system depends on its inputs. For systems that learn by reading text, the Web offers a breadth of topics and currency, but it also presents the problems of dealing with casual, unedited writing, non-textual inputs, and the mingling of languages. The results of extraction using the KNEXT system on two Web corpora - Wikipedia and a collection of weblog entries - indicate that, with automatic filtering of the output, even ungrammatical writing on arbitrary topics can yield an extensive knowledge base, which human judges find to be of good quality, with propositions receiving an average score across both corpora of 2.34 (where the range is 1 to 5 and lower is better) versus 3.00 for unfiltered output from the same sources. Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Learning Web query patterns for imitating Wikipedia articles Shohei Tanaka
Naoaki Okazaki
Mitsuru Ishizuka
Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference English 2010 This paper presents a novel method for acquiring a set of query patterns to retrieve documents containing important information about an entity. Given an existing Wikipedia category that contains the target entity, we extract and select a small set of query patterns by presuming that formulating search queries with these patterns optimizes the overall precision and coverage of the returned Web information. We model this optimization problem as a weighted maximum satisfiability (weighted Max-SAT) problem. The experimental results demonstrate that the proposed method outperforms other methods based on statistical measures such as frequency and point-wise mutual information (PMI), which are widely used in relation extraction. 0 0
Metadata repository management using the MediaWiki interoperability framework a case study: The keytonature project Veja C.F.M.
Gregor Hagedorn
Gisela Weber
Mircea Giurgiu
EChallenges e-2010 Conference English 2010 In the KeyToNature project a user-centred and collaborative approach for metadata repository management was developed. KeyToNature is an EU project to enhance the knowledge of biodiversity by improving the availability of digital and non-digital media along with digital tools for the identification of living organisms throughout Europe. To improve the ability to search and access information, metadata are provided and integrated into a metadata repository. This paper presents a method utilizing web-based MediaWiki system as part of a low-tech interoperability and repository layer for data providers, end users, developers, and project partners. Because the level of technological expertise of the data providers varies greatly, a solution accessible for non-expert data providers was developed. The main features of this method are the automatic metadata repository management, and an ontological approach with ingestion workflows integrated into MediaWiki collaborative framework. Extensive user testing shows performance advantages of the method and attests usefulness in the application area. This practice-oriented method can be adopted by other projects aiming at collaborative knowledge acquisition and automatic metadata repository management, regardless of domain of discourse. 0 0
Morpheus: A deep web question answering system Grant C.
George C.P.
Gumbs J.-D.
Wilson J.N.
Dobbins P.J.
IiWAS2010 - 12th International Conference on Information Integration and Web-Based Applications and Services English 2010 When users search the deep web, the essence of their search is often found in a previously answered query. The Morpheus question answering system reuses prior searches to answer similar user queries. Queries are represented in a semistructured format that contains query terms and referenced classes within a specific ontology. Morpheus answers questions by using methods from prior successful searches. The system ranks stored methods based on a similarity quasimetric defined on assigned classes of queries. Similarity depends on the class heterarchy in an ontology and its associated text corpora. Morpheus revisits the prior search pathways of the stored searches to construct possible answers. Realm-based ontologies are created using Wikipedia pages, associated categories, and the synset heterarchy of WordNet. This paper describes the entire process with emphasis on the matching of user queries to stored answering methods. Copyright 2010 ACM. 0 0
MuZeeker: Adapting a music search engine for mobile phones Larsen J.E.
Halling S.
Sigurosson M.
Hansen L.K.
Lecture Notes in Computer Science English 2010 We describe MuZeeker, a search engine with domain knowledge based on Wikipedia. MuZeeker enables the user to refine a search in multiple steps by means of category selection. In the present version we focus on multimedia search related to music and we present two prototype search applications (web-based and mobile) and discuss the issues involved in adapting the search engine for mobile phones. A category based filtering approach enables the user to refine a search through relevance feedback by category selection instead of typing additional text, which is hypothesized to be an advantage in the mobile MuZeeker application. We report from two usability experiments using the think aloud protocol, in which N=20 participants performed tasks using MuZeeker and a customized Google search engine. In both experiments web-based and mobile user interfaces were used. The experiment shows that participants are capable of solving tasks slightly better using MuZeeker, while the "inexperienced" MuZeeker users perform slightly slower than experienced Google users. This was found in both the web-based and the mobile applications. It was found that task performance in the mobile search applications (MuZeeker and Google) was 2-2.5 times lower than the corresponding web-based search applications (MuZeeker and Google). 0 0
Process makna - A semantic wiki for scientific workflows Paschke A.
Zhao Z.
CEUR Workshop Proceedings English 2010 Virtual e-Science infrastructures supporting Web-based scientific workflows are an example for knowledge-intensive collaborative and weakly-structured processes where the interaction with the human scientists during process execution plays a central role. In this paper we propose the lightweight dynamic user-friendly interaction with humans during execution of scientific workflows via the low-barrier approach of Semantic Wikis as an intuitive interface for non-technical scientists. Our Process Makna Semantic Wiki system is a novel combination of a business process management system adapted for scientific workflows with a Corporate Semantic Web Wiki user interface supporting knowledge intensive human interaction tasks during scientific workflow execution. 0 0
QMUL @ MediaEval 2010 Tagging Task: Semantic query expansion for predicting user tags Chandramouli K.
Kliegr T.
Piatrik T.
Izquierdo E.
MediaEval Benchmarking Initiative for Multimedia Evaluation - The "Multi" in Multimedia: Speech, Audio, Visual Content, Tags, Users, Context, MediaEval 2010 Working Notes Proceedings 2010 This paper describes our participation in "The Wild Wild Web Tagging Task @ MediaEval 2010", which aims to predict user tags based on features derived from video such as speech, audio, visual content or associated textual or social information. Two tasks were pursued: (i) closed-set annotations and (ii) open-set annotations. We have attempted to evaluate whether using only a limited number of features (video title, filename and description) can be compensated by semantic expansion with NLP tools and Wikipedia and Wordnet. This technique proved successful on the open-set task with approximately 20% generated tags being considered relevant by all manual annotators. On the closed-set task, the best result (MAP 0.3) was achieved on tokenized filenames combined with video descriptions, indicating that filenames are a valuable tag predictor. 0 0
Requirements for semantic web applications in engineering David Fowler
Crowder R.M.
Tao Guan
Shadbolt N.
Gary Wills
Proceedings of the ASME Design Engineering Technical Conference English 2010 In this paper we describe some applications of Semantic Web technologies for the engineering design community. Specifically, we use Semantic Wikis to form a central knowledge base, which other applications then refer to. The developed applications include an advisor for performing Computational Fluid Dynamics simulations, a Semantic search engine, and an assistant for airfoil design. In the conclusions we discuss lessons learned and subsequently requirements for future systems. 0 0
The effects of open innovation on collaboration & knowledge sharing in student design teams Koch M.D.
Schulte R.J.
Tumer I.Y.
Proceedings of the ASME Design Engineering Technical Conference English 2010 As the need to innovate more creatively and effectively becomes increasingly apparent in engineering design, powerful open design tools and practices have emerged that are allowing organizations and firms to tap an already vast pool of skills, knowledge and intellect to solve complex design problems. The need for engineering design educators to bring these new trends into the classroom continues to grow as the industry for which students are being prepared begins to revamp its design strategies and practices in the pursuit of more openly accessible information infrastructures. By conducting an experimental study of over 25 student design groups in an undergraduate design engineering class, our team was able to gauge the relevance and utility of collaboration and knowledge sharing between and within design groups. Specifically, issues and opportunities were identified to help bring engineering and design education in line with the increasingly networked and distributed professional engineering environment that students will enter upon graduation. 0 0
UNIpedia: A unified ontological knowledge platform for semantic content tagging and search Kalender M.
Dang J.
Uskudarli S.
Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010 English 2010 The emergence of an ever increasing number of documents makes it more and more difficult to locate them when desired. An approach for improving search results is to make use of user-generated tags. This approach has led to improvements. However, they are limited because tags are (1) free from context and form, (2) user generated, (3) used for purposes other than description, and (4) often ambiguous. As a formal, declarative knowledge representation model, ontologies provide a foundation upon which machine understandable knowledge can be obtained and tagged, and as a result, they make semantic tagging and search possible. With an ontology, semantic web technologies can be utilized to automatically generate semantic tags. WordNet has been used for this purpose. However, this approach falls short in tagging documents that refer to new concepts and instances. To address this challenge, we present UNIpedia - a platform for unifying different ontological knowledge bases by reconciling their instances as WordNet concepts. Our mapping algorithms use rule based heuristics extracted from ontological and statistical features of concepts and instances. UNIpedia is used to semantically tag contemporary documents. For this purpose, the Wikipedia and OpenCyc knowledge bases, which are known to contain up to date instances and reliable metadata about them, are selected. Experiments show that the accuracy of the mapping between WordNet and Wikipedia is 84% for the most relevant concept name and 90% for the appropriate sense. 0 0
CalSWIM: A wiki-based data sharing platform Yasser Ganjisaffar
Sara Javanmardi
Grant S.
Lopes C.V.
Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering English 2009 Organizations increasingly create massive internal digital data repositories and are looking for technical advances in managing, exchanging and integrating explicit knowledge. While most of the enabling technologies for knowledge management have been around for several years, the ability to combine cost-effective data sharing, integration and analysis into a cohesive infrastructure evaded organizations until the advent of Web 2.0 applications. In this paper, we discuss our investigations into using a Wiki as a web-based interactive knowledge management system, which is integrated with some features for easy data access, data integration and analysis. Using the enhanced wiki, it is possible to make organizational knowledge sustainable, expandable, outreaching and continually up-to-date. The wiki is currently in use as the California Sustainable Watershed Information Manager. We evaluate our work according to the requirements of knowledge management systems. The result shows that our solution satisfies more requirements compared to other tools. 0 0
Effective visualization and navigation in a multimedia document collection using ontology Mishra S.
Ghosh H.
Lecture Notes in Computer Science English 2009 We present a novel user interface for visualizing and navigating in a multimedia document collection. Domain ontology has been used to depict the background knowledge organization and map the multimedia information nodes on that knowledge map, thereby making the implicit knowledge organization in a collection explicit. The ontology is automatically created by analyzing the links in Wikipedia, and is delimited to tightly cover the information nodes in the collection. We present an abstraction of the knowledge map for creating a clear and concise view, which can be progressively 'zoomed in' or 'zoomed out' to navigate the knowledge space. We organize the graph based on mutual similarity scores between the nodes for aiding the cognitive process during navigation. 0 0
Enhancing Wikipedia editing with WAI-ARIA Caterina Senette
Buzzi M.C.
Marina Buzzi
Barbara Leporini
Lecture Notes in Computer Science English 2009 Nowadays Web 2.0 applications allow anyone to create, share and edit on-line content, but accessibility and usability issues still exist. For instance, Wikipedia presents many difficulties for blind users, especially when they want to write or edit articles. In a previous stage of our study we proposed and discussed how to apply the W3C ARIA suite to simplify the Wikipedia editing page when interacting via screen reader. In this paper we present the results of a user test involving totally blind end-users as they interacted with both the original and the modified Wikipedia editing pages. Specifically, the purpose of the test was to compare the editing and formatting process for original and ARIA-implemented Wikipedia user interfaces, and to evaluate the improvements. 0 0
Extremal dependencies and rank correlations in power law networks Yana Volkovich
Litvak N.
Zwart B.
Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering English 2009 We analyze dependencies in complex networks characterized by power laws (Web sample, Wikipedia sample and a preferential attachment graph) using statistical techniques from the extreme value theory and the theory of multivariate regular variation. To the best of our knowledge, this is the first attempt to apply this well developed methodology to comprehensive graph data. The new insights this yields are striking: the three above-mentioned data sets are shown to have a totally different dependence structure between graph parameters, such as in-degree and PageRank. Based on the proposed approach, we suggest a new measure for rank correlations. Unlike most known methods, this measure is especially sensitive to rank permutations for top-ranked nodes. Using the new correlation measure, we demonstrate that the PageRank ranking is not sensitive to moderate changes in the damping factor. 0 0
Human-competitive tagging using automatic keyphrase extraction Olena Medelyan
Eibe Frank
Witten I.H.
EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 English 2009 This paper connects two research areas: automatic tagging on the web and statistical keyphrase extraction. First, we analyze the quality of tags in a collaboratively created folksonomy using traditional evaluation techniques. Next, we demonstrate how documents can be tagged automatically with a state-of-the-art keyphrase extraction algorithm, and further improve performance in this new domain using a new algorithm, "Maui", that utilizes semantic information extracted from Wikipedia. Maui outperforms existing approaches and extracts tags that are competitive with those assigned by the best performing human taggers. 0 0
Hypernym discovery based on distributional similarity and hierarchical structures Yamada I.
Kentaro Torisawa
Jun'ichi Kazama
Kuroda K.
Murata M.
De Saeger S.
Bond F.
Sumida A.
EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 English 2009 This paper presents a new method of developing a large-scale hyponymy relation database by combining Wikipedia and other Web documents. We attach new words to the hyponymy database extracted from Wikipedia by using distributional similarity calculated from documents on the Web. For a given target word, our algorithm first finds k similar words from the Wikipedia database. Then, the hypernyms of these k similar words are assigned scores by considering the distributional similarities and hierarchical distances in the Wikipedia database. Finally, new hyponymy relations are output according to the scores. In this paper, we tested two distributional similarities. One is based on raw verb-noun dependencies (which we call "RVD"), and the other is based on a large-scale clustering of verb-noun dependencies (called "CVD"). Our method achieved an attachment accuracy of 91.0% for the top 10,000 relations, and an attachment accuracy of 74.5% for the top 100,000 relations when using CVD. This was a far better outcome compared to the other baseline approaches. Excluding the region that had very high scores, CVD was found to be more effective than RVD. We also confirmed that most relations extracted by our method cannot be extracted merely by applying the well-known lexico-syntactic patterns to Web documents. 0 0
Interactive visualization tools for exploring the semantic graph of large knowledge spaces Christian Hirsch
John Hosking
John Grundy
CEUR Workshop Proceedings English 2009 While the amount of available information on the Web is increasing rapidly, the problem of managing it becomes more difficult. We present two applications, Thinkbase and Thinkpedia, which aim to make Web content more accessible and usable by utilizing visualizations of the semantic graph as a means to navigate and explore large knowledge repositories. Both of our applications implement a similar concept: They extract semantically enriched contents from large knowledge spaces (Freebase and Wikipedia respectively), create an interactive graph-based representation out of it, and combine them into one interface together with the original text based content. We describe the design and implementation of our applications, and provide a discussion based on an informal evaluation. 0 0
KiWi - A platform for semantic social software Sebastian Schaffert
Eder J.
Grunwald S.
Kurz T.
Radulescu M.
Sint R.
Stroka S.
CEUR Workshop Proceedings English 2009 Semantic Wikis have demonstrated the power of combining Wikis with Semantic Web technology. The KiWi system goes beyond Semantic Wikis by providing a flexible and adaptable platform for building different kinds of Social Semantic Software, powered by Semantic Web technology. This article describes the main functionalities and components of the KiWi system with respect to the user interface and to the system architecture. A particular focus is given to what we call "content versatility", i.e. the reuse of the same content in different kinds of social software applications. The article concludes with an overview of different applications we envision can be built on top of KiWi. 0 0
Web-scale distributional similarity and entity set expansion Pantel P.
Crestan E.
Borkovsky A.
Popescu A.-M.
Vyas V.
EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 English 2009 Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly scalable implementation based on distributional similarity, implemented in the MapReduce framework and deployed over a 200 billion word crawl of the Web. The pairwise similarity between 500 million terms is computed in 50 hours using 200 quad-core nodes. We apply the learned similarity matrix to the task of automatic set expansion and present a large empirical study to quantify the effect on expansion performance of corpus size, corpus quality, seed composition and seed size. We make public an experimental testbed for set expansion analysis that includes a large collection of diverse entity sets extracted from Wikipedia. 0 0
WikiBABEL: A wiki-style platform for creation of parallel data Kumaran A.
Saravanan K.
Datha N.
Ashok B.
Dendi V.
ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. English 2009 In this demo, we present a wiki-style platform - WikiBABEL - that enables easy collaborative creation of multilingual content in many non-English Wikipedias, by leveraging the relatively larger and more stable content in the English Wikipedia. The platform provides an intuitive user interface that maintains the user focus on the multilingual Wikipedia content creation, by engaging search tools for easy discoverability of related English source material, and a set of linguistic and collaborative tools to make the content translation simple. We present two different usage scenarios and discuss our experience in testing them with real users. Such an integrated content creation platform in Wikipedia may yield, as a by-product, parallel corpora that are critical for research in statistical machine translation systems in many languages of the world. 0 0
"Who is this" quiz dialogue system and users' evaluation Sawaki M.
Minami Y.
Ryuichiro Higashinaka
Kohji Dohsaka
Maeda E.
2008 IEEE Workshop on Spoken Language Technology, SLT 2008 - Proceedings English 2008 In order to design a dialogue system that users enjoy and want to be near for a long time, it is important to know the effect of the system's action on users. This paper describes the "Who is this" quiz dialogue system and its users' evaluation. Its quiz-style information presentation has been found effective for educational tasks. In our ongoing effort to make it closer to a conversational partner, we implemented the system as a stuffed-toy (or CG equivalent). Quizzes are automatically generated from Wikipedia articles, rather than from hand-crafted sets of biographical facts. Network mining is utilized to prepare adaptive system responses. Experiments showed the effectiveness of the person network and the relationship between user attributes and interest level. 0 0
2Lip: The step towards the web3D Jacek Jankowski
Kruk S.R.
Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08 English 2008 The World Wide Web allows users to create and publish a variety of resources, including multimedia ones. Most of the contemporary best practices for designing web interfaces, however, do not take into account 3D techniques. In this paper we present a novel approach for designing interactive web applications: the 2-Layer Interface Paradigm (2LIP). The background layer of the 2LIP-type user interface is a 3D scene, which a user cannot directly interact with. The foreground layer is HTML content. Only taking an action on this content (e.g. pressing a hyperlink, scrolling a page) can affect the 3D scene. We introduce a reference implementation of 2LIP: Copernicus - The Virtual 3D Encyclopedia, which shows one of the potential paths of the evolution of Wikipedia towards Web 3.0. Based on the evaluation of Copernicus we prove that designing web interfaces according to 2LIP provides users a better browsing experience, without harming the interaction. 0 0
Collaborative end-user development on handheld devices Ahmadi N.
Repenning A.
Ioannidou A.
Proceedings - 2008 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2008 English 2008 Web 2.0 has enabled end users to collaborate through their own developed artifacts, moving on from text (e.g., Wikipedia, Blogs) to images (e.g., Flickr) and movies (e.g., YouTube), changing the end user's role from consumer to producer. But there is still no support for collaboration through interactive end-user developed artifacts, especially for emerging handheld devices, which are the next collaborative platform. Featuring fast always-on networks, Web browsers that are as powerful as their desktop counterparts, and innovative user interfaces, the newest generation of handheld devices can run highly interactive content as Web applications. We have created Ristretto Mobile, a Web-compliant framework for running end-user developed applications on handheld devices. The Web-based Ristretto Mobile includes compiler and runtime components to turn end-user applications into Web applications that can run on compatible handheld devices, including the Apple iPhone and Nokia N800. Our paper reports on the technological and cognitive challenges in creating interactive content that runs efficiently and is user accessible on handheld devices. 0 0
Exploring the knowledge in semi structured data sets with rich queries Umbrich J.
Sebastian Blohm
CEUR Workshop Proceedings English 2008 Semantics can be integrated into search processing during both the document analysis and querying stages. We describe a system that does both: it incorporates semantic annotations of Wikipedia articles into the search process and allows for rich annotation search, enabling users to formulate queries based on their knowledge about how entities relate to one another while simultaneously retaining the freedom of free text search where appropriate. The outcome of this work is an application consisting of semantic annotators, an extended search engine and an interactive user interface. 0 0
Implementation of a multilevel Wiki for cross-domain collaboration Ong K.L.
Thanh Nguyen
Irvine C.
3rd International Conference on Information Warfare and Security English 2008 The pace of modern warfare requires tools that support intensive, ongoing collaboration between participants. Wiki technology provides a hypertext content-based collaborative authoring and information sharing environment that includes the ability to create links to other web contents, relative stability, ease of use, and logging features for tracking contributions and modifications. Military environments impose a requirement to enforce national policies regarding authorized access to classified information while satisfying the intent of wikis to provide an open context for content sharing. The Global Information Grid (GIG) vision calls for a highly flexible multilevel environment. The Monterey Security Architecture (MYSEA) Test-bed provides a distributed high assurance multilevel networking environment where authenticated users securely access data and services at different classification levels. The MYSEA approach is to provide users with unmodified commercial-off-the-shelf office productivity tools while enforcing a multilevel security (MLS) policy with high assurance. The extensible Test-bed architecture is designed with strategically placed trusted components that comprise the distributed TCB, while untrusted commercial clients support the user interface. We have extended the collaboration capabilities of MYSEA through the creation of a multilevel wiki. This wiki permits users who access the system at a particular sensitivity level to read and post information to the wiki at that level. Users at higher sensitivity levels may read wiki content at lower security levels and may post information at the higher security level. The underlying MLS policy enforcement mechanisms prevent low users from accessing higher sensitivity information. The multilevel wiki was created by porting a publicly available wiki engine to run on the high assurance system hosting the MYSEA server. 
A systematic process was used to select a wiki for the MYSEA environment. TWiki was chosen. To simplify identification of errors that might arise in the porting process, a three-stage porting methodology was used. Functional and security tests were performed to ensure that the wiki engine operates properly while being constrained by the underlying policy enforcement mechanisms of the server. An objective in designing the test plans was to ensure adequate test coverage, while avoiding a combinatoric explosion of test cases. Repeatable regression testing procedures were also produced. A conflict between the application-level DAC policy of the wiki and that of the MYSEA server was identified and resolved. 0 0
Improving interaction with virtual globes through spatial thinking: Helping users ask "Why?" Schöning J.
Raubal M.
Marsh M.
Brent Hecht
Antonio Kruger
Michael Rohs
International Conference on Intelligent User Interfaces, Proceedings IUI English 2008 Virtual globes have progressed from little-known technology to broadly popular software in a mere few years. We investigated this phenomenon through a survey and discovered that, while virtual globes are en vogue, their use is restricted to a small set of tasks so simple that they do not involve any spatial thinking. Spatial thinking requires that users ask "what is where" and "why"; the most common virtual globe tasks only include the "what". Based on the results of this survey, we have developed a multi-touch virtual globe derived from an adapted virtual globe paradigm designed to widen the potential uses of the technology by helping its users to inquire about both the "what is where" and "why" of spatial distribution. We do not seek to provide users with full GIS (geographic information system) functionality, but rather we aim to facilitate the asking and answering of simple "why" questions about general topics that appeal to a wide virtual globe user base. Copyright 2008 ACM. 0 0
NAGA: Harvesting, searching and ranking knowledge Gjergji Kasneci
Suchanek F.M.
Ifrim G.
Elbassuoni S.
Maya Ramanath
Gerhard Weikum
Proceedings of the ACM SIGMOD International Conference on Management of Data English 2008 The presence of encyclopedic Web sources, such as Wikipedia, the Internet Movie Database (IMDB), World Factbook, etc. calls for new querying techniques that are simple and yet more expressive than those provided by standard keyword-based search engines. Searching for explicit knowledge needs to consider inherent semantic structures involving entities and relationships. In this demonstration proposal, we describe a semantic search system named NAGA. NAGA operates on a knowledge graph, which contains millions of entities and relationships derived from various encyclopedic Web sources, such as the ones above. NAGA's graph-based query language is geared towards expressing queries with additional semantic information. Its scoring model is based on the principles of generative language models, and formalizes several desiderata such as confidence, informativeness and compactness of answers. We propose a demonstration of NAGA which will allow users to browse the knowledge base through a user interface, enter queries in NAGA's query language and tune the ranking parameters to test various ranking aspects. 0 0
Natural interaction on tabletops Baraldi S.
Del Bimbo A.
Landucci L.
Multimedia Tools and Applications English 2008 We present two different Computer Vision based systems that enable multiple users to concurrently manipulate graphic objects presented over tabletop displays. The two solutions have different hardware layouts and use two different algorithms for gesture analysis and recognition. The first one is a media-handling application that can be used by co-located and remote users. The second is a knowledge-building application where users can manipulate the contents of a wiki as a visual concept map. The performance of both systems is evaluated and expounded. A conceptual framework is introduced, providing the fundamental guidelines for the design of natural interaction languages on tabletops. 0 0
On the credibility of wikipedia: An accessibility perspective Rui Lopes
Carrico L.
International Conference on Information and Knowledge Management, Proceedings English 2008 User interfaces play a critical role on the credibility of authoritative information sources on the Web. Citation and referencing mechanisms often provide the required support for the independent verifiability of facts and, consequently, influence the credibility of the conveyed information. Since the quality level of these references has to be verifiable by users without any barriers, user interfaces cannot pose problems on accessing information. This paper presents a study about the influence of accessibility of user interfaces on the credibility of Wikipedia articles. We have analysed the accessibility quality level of the articles and the external Web pages used as authoritative references. This study has shown that there is a discrepancy on the accessibility of referenced Web pages, which can compromise the overall credibility of Wikipedia. Based on these results, we have analysed the article referencing lifecycle (technologies and policies) and propose a set of improvements that can help increasing the accessibility of references within Wikipedia articles. Copyright 2008 ACM. 0 0
Semantify Automatically turn your tags into senses Maurizio Tesconi
Francesco Ronzano
Andrea Marchetti
Salvatore Minutoli
CEUR Workshop Proceedings English 2008 At present tagging is experiencing a great diffusion as the most adopted way to collaboratively classify resources over the Web. In this paper, after a detailed analysis of the attempts made to improve the organization and structure of tagging systems as well as the usefulness of this kind of social data, we propose and evaluate the Tag Disambiguation Algorithm, mining data. It makes it easy to semantify the tags of the users of a tagging service: it automatically finds out for each tag the related concept of Wikipedia in order to describe Web resources through senses. On the basis of a set of evaluation tests, we analyze all the advantages of our sense-based way of tagging, proposing new methods to keep the set of users' tags more consistent or to classify the tagged resources on the basis of Wikipedia categories, YAGO classes or Wordnet synsets. We also discuss how our semantified social tagging data are strongly linked to DBPedia and the datasets of the Linked Data community. 0 0
Social selected learning content out of web lectures Ketterl M.
Emden J.
Brunstein J.
HYPERTEXT'08: Proceedings of the 19th ACM Conference on Hypertext and Hypermedia, HT'08 with Creating'08 and WebScience'08 English 2008 Virtpresenter is a system for recording lectures and for re-using recorded contents in other didactic scenarios. Here we demonstrate how the interaction of earlier visitors in the form of footprints can be used for extracting relevant passages in time-based media. We illustrate how to extract online web lecture snippets for enriching static contents of a course wiki page or student blogs. 0 0
The ViskiMap toolkit: Extending mediawiki with topic maps Espiritu C.
Eleni Stroulia
Tirapat T.
Lecture Notes in Business Information Processing English 2008 In this paper, we present our ViskiMap systems, ENWiC (EduNuggets Wiki Crawler) and Annoki (Annotation wiki), for intelligent visualization of Wikis. In recent years, e-Learning has emerged as an appealing extension to traditional teaching. To some extent, the appeal of e-Learning derives from the great potential of information and knowledge sharing on the web, which has become a de-facto library to be used by students and instructors for educational purposes. Wiki's collaborative authoring nature makes it a very attractive tool to use for e-Learning purposes. Unfortunately, the web's text-based navigational structure becomes insufficient as the Wiki grows in size, and this backlash can hinder students from taking full advantage of the information available. The objective behind ViskiMap is to provide students with an intelligent interface for navigating Wikis and other similar large-scale websites. ViskiMap makes use of graphic organizers to visualize the relationships between content pages, so that students can easily get an understanding of the content elements and their relations, as they navigate through the Wiki pages. We describe ViskiMap's automated visualization process, and its user interfaces for students to view and navigate the Wiki in a meaningful manner, and for instructors to further enhance the visualization. We also discuss our usability study for evaluating the effectiveness of ENWiC as a Wiki Interface. 0 0
Thinkbase: A visual semantic Wiki Christian Hirsch
John Grundy
John Hosking
CEUR Workshop Proceedings English 2008 Thinkbase is a visual navigation and exploration tool for Freebase, an open, shared database of the world's knowledge. Thinkbase extracts the contents, including semantic relationships, from Freebase and visualizes them using an interactive visual representation. Providing a focus-plus-context view, the visualization is displayed along with the Freebase article. Thinkbase provides a proof of concept of how visualizations can improve and support Semantic Web applications. The application is available via nz. 0 0
WikiBABEL: Community creation of multilingual data Kumaran A.
Saravanan K.
Maurice S.
WikiSym 2008 - The 4th International Symposium on Wikis, Proceedings English 2008 In this paper, we present a collaborative framework - wikiBABEL - for the efficient and effective creation of multilingual content by a community of users. The wikiBABEL framework leverages the availability of fairly stable content in a source language (typically, English) and a reasonable and not necessarily perfect machine translation system between the source language and a given target language, to create the rough initial content in the target language that is published in a collaborative platform. The platform provides an intuitive user interface and a set of linguistic tools for collaborative correction of the rough content by a community of users, aiding creation of clean content in the target language. We describe the architectural components implementing the wikiBABEL framework, namely, the systems for source and target language content management, mechanisms for coordination and collaboration and an intuitive user interface for multilingual editing and review. Importantly, we discuss the integrated linguistic resources and tools, such as bilingual dictionaries, machine translation and transliteration systems, etc., to help the users during the content correction and creation process. In addition, we analyze and present the prime factors - user-interface features or linguistic tools and resources - that significantly influence the user experiences in multilingual content creation. In addition to the creation of multilingual content, another significant motivation for the wikiBABEL framework is the creation of parallel corpora as a by-product. Parallel linguistic corpora are very valuable resources for both Statistical Machine Translation (SMT) and Crosslingual Information Retrieval (CLIR) research, and may be mined effectively from multilingual data with significant content overlap, as may be created in the wikiBABEL framework.
Creation of parallel corpora by professional translators is very expensive, and hence the SMT and CLIR research have been largely confined to a handful of languages. Our attempt to engage the large and diverse Internet user population may aid creation of such linguistic resources economically, and may make computational linguistics research possible and practical in many languages of the world. 0 0
ZLinks: Semantic framework for invoking contextual linked data Bergman M.K.
Giasson F.
CEUR Workshop Proceedings English 2008 This first-ever demonstration of the new zLinks plug-in shows how any existing Web document link can be automatically transformed into a portal to relevant Linked Data. Each existing link disambiguates to its contextual and relevant subject concept (SC) or named entity (NE). The SCs are grounded in the OpenCyc knowledge base, supplemented by aliases and WordNet synsets to aid disambiguation. The NEs are drawn from Wikipedia as processed via YAGO, and other online fact-based repositories. The UMBEL ontology basis to this framework offers significant further advantages. The zLinks popup is invoked only as desired via unobtrusive user interface cues. 0 0
An integrated web environment for fast access and easy management of a synchrotron beam line Qian K.
Stojanoff V.
Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment English 2007 Tired of all the time spent on the phone or sending emails to schedule beam time? Why not make your own schedule when it is convenient to you? The integrated web environment at the NIGMS East Coast Structural Biology Research Facility allows users to schedule their own beam time as if they were making travel arrangements and provides staff with a set of toolkits for management of routine tasks. These unique features are accessible through the MediaWiki-powered home pages. Here we describe the main features of this web environment that have been shown to allow for an efficient and effective interaction between the users and the facility. © 2007 Elsevier B.V. All rights reserved. 0 0
Can I go out and play now? Hawkins C. Electronic Device Failure Analysis English 2007 The use of online information sources, such as Wikipedia, for getting information about any field to expand knowledge, is discussed. Wikipedia is a nonprofit and free encyclopedia, which was started in 2001. It has more than two million English language encyclopedic articles on its websites. It is an open source media that allows users to write an article or edit it successfully. It has been designed for web user interface and contributions. It shows instructions for writing and editing an article on websites. It provides hyperlinks that guide readers for accessing specific information. EDFAS can also use it as a source material website with material written and reviewed by FA experts for disseminating information about its products. 0 0
Construction of a knowledge management framework based on Web 2.0 Liyong W.
Chengling Z.
2007 International Conference on Wireless Communications, Networking and Mobile Computing, WiCOM 2007 English 2007 ICT has a profound impact on the mode of organizational learning and offers a number of advantages and opportunities. But it also brings about a lot of potential problems in the field of knowledge management. Web 2.0 is a term coined by Tim O'Reilly. It redefines the interactions between the Internet and users and brings about a new Internet ecosystem. In this paper, we first introduced the main components and technologies of Web 2.0, then we proposed a framework that can incorporate the Web 2.0 technologies into the field of KM. Besides, we also proposed a knowledge management service strategy based on Web 2.0. With the help of the framework and the strategy, the potential problems will be solved to a great extent. 0 0
ESTER: Efficient search on text, entities, and relations Holger Bast
Chitea A.
Fabian Suchanek
Ingmar Weber
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 English 2007 We present ESTER, a modular and highly efficient system for combined full-text and ontology search. ESTER builds on a query engine that supports two basic operations: prefix search and join. Both of these can be implemented very efficiently with a compact index, yet in combination provide powerful querying capabilities. We show how ESTER can answer basic SPARQL graph-pattern queries on the ontology by reducing them to a small number of these two basic operations. ESTER further supports a natural blend of such semantic queries with ordinary full-text queries. Moreover, the prefix search operation allows for a fully interactive and proactive user interface, which after every keystroke suggests to the user possible semantic interpretations of his or her query, and speculatively executes the most likely of these interpretations. As a proof of concept, we applied ESTER to the English Wikipedia, which contains about 3 million documents, combined with the recent YAGO ontology, which contains about 2.5 million facts. For a variety of complex queries, ESTER achieves worst-case query processing times of a fraction of a second, on a single machine, with an index size of about 4 GB. Copyright 2007 ACM. 0 0
Exploiting web 2.0 for all knowledge-based information retrieval Milne D.N. International Conference on Information and Knowledge Management, Proceedings English 2007 This paper describes ongoing research into obtaining and using knowledge bases to assist information retrieval. These structures are prohibitively expensive to obtain manually, yet automatic approaches have been researched for decades with limited success. This research investigates a potential shortcut: a way to provide knowledge bases automatically, without expecting computers to replace expert human indexers. Instead we aim to replace the professionals with thousands or even millions of amateurs: with the growing community of contributors who form the core of Web 2.0. Specifically we focus on Wikipedia, which represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide manually-defined yet inexpensive knowledge-bases that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We are also concerned with how best to make these structures available to users, and aim to produce a complete knowledge-based retrieval system (both the knowledge base and the tools to apply it) that can be evaluated by how well it assists real users in performing realistic and practical information retrieval tasks. To this end we have developed Koru, a new search engine that offers concrete evidence of the effectiveness of our Web 2.0 based techniques for assisting information retrieval. 0 0
Mass spectrometry and Web 2.0 Murray K.K. Journal of Mass Spectrometry English 2007 The term Web 2.0 is a convenient shorthand for a new era in the Internet in which users themselves are both generating and modifying existing web content. Several types of tools can be used. With social bookmarking, users assign a keyword to a web resource and the collection of the keyword 'tags' from multiple users forms the classification of these resources. Blogs are a form of diary or news report published on the web in reverse chronological order and are a popular form of information sharing. A wiki is a website that can be edited using a web browser and can be used for collaborative creation of information on the site. This article is a tutorial that describes how these new ways of creating, modifying, and sharing information on the Web are being used for on-line mass spectrometry resources. 0 0
Micro-blog: Map-casting from mobile phones to virtual sensor maps Gaonkar S.
Choudhury R.R.
SenSys'07 - Proceedings of the 5th ACM Conference on Embedded Networked Sensor Systems English 2007 The synergy of phone sensors (microphone, camera, GPS, etc.), wireless capability, and ever-increasing device density can lead to novel people-centric applications. Unlike traditional sensor networks, the next generation networks may be participatory, interactive, and in the scale of human users. Millions of global data points can be organized on a visual platform, queried, and sophistically answered through human participation. Recent years have witnessed the isolated impacts of distributed knowledge sharing (Wikipedia), social networks, sensor networks, and mobile communication. We believe that significantly more impact is latent in their convergence, which can be drawn out through innovations in applications. This demonstration, called Micro-Blog, is a first step towards this goal. 0 0
NLPX at INEX 2006 Woodley A.
Shlomo Geva
Lecture Notes in Computer Science English 2007 XML information retrieval (XML-IR) systems aim to better fulfil users' information needs than traditional IR systems by returning results lower than the document level. In order to use XML-IR systems users must encapsulate their structural and content information needs in a structured query. Historically, these structured queries have been formatted using formal languages such as NEXI. Unfortunately, formal query languages are very complex and too difficult to be used by experienced - let alone casual - users and are too closely bound to the underlying physical structure of the collection. INEX's NLP task investigates the potential of using natural language to specify structured queries. QUT has participated in the NLP task with our system NLPX since its inception. Here, we discuss the changes we've made to NLPX since last year, including our efforts to port NLPX to Wikipedia. Second, we present the results from the 2006 INEX track where NLPX was the best performing participant in the Thorough and Focused tasks. 0 0
Relation extraction from Wikipedia using subtree mining Nguyen D.P.T.
Yutaka Matsuo
Mitsuru Ishizuka
Proceedings of the National Conference on Artificial Intelligence English 2007 The exponential growth and reliability of Wikipedia have made it a promising data source for intelligent systems. The first challenge of Wikipedia is to make the encyclopedia machine-processable. In this study, we address the problem of extracting relations among entities from Wikipedia's English articles, which in turn can serve for intelligent systems to satisfy users' information needs. Our proposed method first anchors the appearance of entities in Wikipedia articles using some heuristic rules that are supported by their encyclopedic style. Therefore, it uses neither the Named Entity Recognizer (NER) nor the Coreference Resolution tool, which are sources of errors for relation extraction. It then classifies the relationships among entity pairs using SVM with features extracted from the web structure and subtrees mined from the syntactic structure of text. The innovations behind our work are the following: a) our method makes use of Wikipedia characteristics for entity allocation and entity classification, which are essential for relation extraction; b) our algorithm extracts a core tree, which accurately reflects a relationship between a given entity pair, and subsequently identifies key features with respect to the relationship from the core tree. We demonstrate the effectiveness of our approach through evaluation of manually annotated data from actual Wikipedia articles. Copyright © 2007, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Social rewarding in wiki systems - motivating the community Bernhard Hoisl
Wolfgang Aigner
Silvia Miksch
Lecture Notes in Computer Science English 2007 Online communities have something in common: their success rises and falls with the participation rate of active users. In this paper we focus on social rewarding mechanisms that generate benefits for users in order to achieve a higher contribution rate in a wiki system. In an online community, social rewarding is in the majority of cases based on accentuation of the most active members. As money cannot be used as a motivating factor, others like status, power, acceptance, and glory have to be employed. We explain different social rewarding mechanisms which aim to meet these needs of users. Furthermore, we implemented a number of methods within the MediaWiki system, where social rewarding criteria are satisfied by generating a ranking of most active members. 0 0
Temporal analysis of the wikigraph Buriol L.S.
Carlos Castillo
Debora Donato
Leonardi S.
Millozzi S.
Proceedings - 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings), WI'06 English 2007 Wikipedia is an online encyclopedia, available in more than 100 languages and comprising over .1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc we have a "Wikigraph", a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature by the fact that there are explicit timestamps associated with each node's events. This allows us to do a detailed analysis of the Wikipedia evolution over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part, we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraphs can be applied to large Web graphs from which the temporal data is usually not available. 0 4
WikiCreole: A common wiki markup Christoph Sauer
Chuck Smith
Tomas Benz
Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA English 2007 In this paper, we describe the wiki markup language WikiCreole, how it was developed, and related work. Creole does not replace existing markup, but instead enables wiki users to transfer content seamlessly across wikis, and for novice users to contribute more easily. In proposing a subset of markup elements that is as non-controversial as possible, this markup has evolved from existing wiki markup, hence the name Creole: a stable language that originated from a non-trivial combination of two or more languages. 0 0
A wiki as an extensible RDF presentation engine Rauschmayer A.
Kammergruber W.C.
CEUR Workshop Proceedings English 2006 Semantic wikis [1] establish the role of wikis as integrators of structured and semi-structured data. In this paper, we present Wikked, which is a semantic wiki turned inside out: it is a wiki engine that is embedded in the generic RDF editor Hyena. That is, Hyena edits (structured) RDF and leaves it to Wikked to display (semi-structured) wiki pages stored in RDF nodes. Wiki text has a clearly defined core syntax, while traditional wiki syntax is regarded as syntactic sugar. It is thus easy to convert Wikked pages to various output formats such as HTML and LaTeX. Wikked's built-in functions for presenting RDF data and for invoking Hyena functionality endow it with the ability to define simple custom user interfaces to RDF data. 0 0
Multichat: Persistent, text-as-you-type messaging in a web browser for fluid multi-person interaction and collaboration Schull J.
Axelrod M.
Quinsland L.
Proceedings of the Annual Hawaii International Conference on System Sciences English 2006 To facilitate face to face conversation between deaf and hearing participants, we created a cross-platform, browser-based, persistent text-as-you-type system that aggregates each individual's utterances in revisable personal notes on a user-configurable multi-person workspace in a common web browser. The system increases the fluidity of real time interaction, makes it easier to keep track of an individual's contributions over time, and allows users to format their contributions and customize their displays. Because the system affords "full duplex" parallel communication in a web browser, it supports new patterns of interaction and new possibilities for dynamic browser-based visual representations of temporal patterns of communication. Furthermore, the near-real-time architectures we are exploring seem to open the way to a family of centralized and peer-to-peer persistent conversation applications such as real-time wikis, collaborative web design systems, collaborative web-surfing applications, and in-class discussion systems. 0 0
Proceedings of WikiSym'06 - 2006 International Symposium on Wikis No author name available Proceedings of WikiSym'06 - 2006 International Symposium on Wikis English 2006 The proceedings contain 26 papers. The topics discussed include: how and why wikipedia works; how and why wikipedia works: an interview with Angela Beesley, Elisabeth Bauer, and Kizu Naoko; intimate information: organic hypertext structure and incremental; the augmented wiki; wiki uses in teaching and learning; the future of wikis; translation the wiki way; the radeox wiki render engine; is there a space for the teacher in a WIKI?; wikitrails: augmenting wiki structure for collaborative, interdisciplinary learning; towards wikis as semantic hypermedia; constrained wiki: an oxymoron?; corporate wiki users: results of a survey; workshop on wikipedia research; wiki markup standard workshop; wiki-based knowledge engineering: second workshop on semantic wikis; semantic wikipedia; and ontowiki: community-driven ontology engineering and ontology usage based on wikis. 0 0
Ultra lightweight web applications: A single-page wiki employing a partial ajax solution Rees M. AusWeb 2006: 12th Australasian World Wide Web Conference English 2006 The overloaded term Web 2.0 web site usually connotes an interactive web application that offers features normally associated with free-standing applications running directly under the control of an operating system. Such interactive web applications, also known as rich internet applications (RIAs), run within web browsers and must download XHTML and client-side scripts to control user interactivity. Via a variety of technologies the web server must provide a storage mechanism to support the RIA and the presentation of dynamic data in the browser interface. Such storage may be of large volume and bring concomitant bandwidth, response and server storage problems. It is usually the case that the XHTML and client scripts are relatively small in size, which allows the browser in this context to be called a lightweight client. Certainly the dynamic construction of the RIA user interface on demand completely eliminates the download and install problem of free-standing applications and ensures the user always uses the latest version of the RIA. This paper explores the possibility of building an ultra lightweight RIA where a single web page combines the interactive user interface and the storage mechanism in a single file. The author discovered this approach being used in the TiddlyWiki RIA created by Jeremy Ruston, who employed client-side JavaScript to provide all functionality. Here all the main features of a wiki are supported by a single web page. DotWikIE is a re-implementation by the author of an ultra lightweight wiki with significantly improved editing and employing XML for storage of the wiki contents. Apart from being a useful personal wiki application, DotWikIE can be extended in a number of ways. An example of automated clipboard monitoring is presented and discussed.
For its implementation DotWikIE uses the JavaScript and XML parts of AJAX (Asynchronous JavaScript and XML). Using full AJAX requires the use of a web server. The paper contains a description of DotWikIEWeb that adds the asynchronous part of AJAX to extend coverage to a more usual web-based wiki while still retaining the simplicity of single, independent wiki web pages for deployment convenience. In the conclusion the paper discusses further extensions of the ultra lightweight RIA and other applications of this RIA implementation technique. © 2006. Michael Rees. 0 0
Understanding user perceptions on usefulness and usability of an integrated Wiki-G-Portal Theng Y.-L.
Yanyan Li
Lim E.-P.
Zhe Wang
Goh D.H.-L.
Chang C.-H.
Kalyani Chatterjea
Jinghua Zhang
Lecture Notes in Computer Science English 2006 This paper describes a pilot study on Wiki-G-Portal, a project integrating Wikipedia, an online encyclopedia, into G-Portal, a Web-based digital library, of geography resources. Initial findings from the pilot study seemed to suggest positive perceptions on usefulness and usability of Wiki-G-Portal, as well as subjects' attitude and intention to use. 0 0
Wiki means more: Hyperreading in Wikipedia Yuejiao Z. Proceedings of the Seventeenth ACM Conference on Hypertext and Hypermedia, HT'06 English 2006 Based on the open-sourcing technology of wiki, Wikipedia has initiated a new fashion of hyperreading. Reading Wikipedia creates an experience distinct from reading a traditional encyclopedia. In an attempt to disclose one of the site's major appeals to the Web users, this paper approaches the characteristics of hyperreading activities in Wikipedia from three perspectives. Discussions are made regarding reading path, user participation, and navigational apparatus in Wikipedia. Copyright 2006 ACM. 0 0
Wikifying your interface: Facilitating community-based interface translation Cameron Jones M.
Rathi D.
Twidale M.B.
Proceedings of the Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, DIS English 2006 We explore the application of a wiki-based technology and style of interaction to enabling the incremental translation of a collaborative application into a number of different languages, including variant English language interfaces better suited to the needs of particular user communities. The development work allows us to explore in more detail the design space of functionality and interfaces relating to tailoring, customization, personalization and localization, and the challenges of designing to support ongoing incremental contributions by members of different use communities. Copyright 2006 ACM. 0 1
RikWik: An extensible XML based Wiki Richard Mason
Paul Roe
Proceedings - 2005 International Symposium on Collaborative Technologies and Systems English 2005 Wiki Wiki Webs provide a simple form of collaboration between multiple users using standard web browsers. Their popularity stems from their easy access and straightforward editing manner. Wikis are often used to support collaboration in research. We present a Wiki Wiki Clone, RikWik, which uses XML to provide a simple, secure platform for collaboration. In addition to supporting standard Wiki features, RikWik also provides a robust security system, provides a Web Service front-end and allows custom page types to better suit specific collaborative endeavors. RikWik has a plug-in architecture to easily allow end users to extend the platform. Thus RikWik provides an extensible collaboration platform. It is in day to day use at QUT. 0 0