Metadata

From WikiPapers

Metadata is included as a keyword or extra keyword in 0 datasets, 0 tools and 108 publications.

Datasets

There are no datasets for this keyword.

Tools

There are no tools for this keyword.


Publications

Title Author(s) Published in Language Date Abstract R C
Towards linking libraries and Wikipedia: Automatic subject indexing of library records with Wikipedia concepts Joorabchi A.
Mahdi A.E.
Journal of Information Science English 2014 In this article, we first argue the importance and timely need of linking libraries and Wikipedia for improving the quality of their services to information consumers, as such linkage will enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources which are currently overlooked to a large degree. We then describe the development of an automatic system for subject indexing of library metadata records with Wikipedia concepts as an important step towards library-Wikipedia integration. The proposed system is based on first identifying all Wikipedia concepts occurring in the metadata elements of library records. This is then followed by training and deploying generic machine learning algorithms to automatically select those concepts which most accurately reflect the core subjects of the library materials whose records are being indexed. We have assessed the performance of the developed system using standard information retrieval measures of precision, recall and F-score on a dataset consisting of 100 library metadata records manually indexed with a total of 469 Wikipedia concepts. The evaluation results show that the developed system is capable of achieving an averaged F-score as high as 0.92. 0 0
A cloud of FAQ: A highly-precise FAQ retrieval system for the Web 2.0 Romero M.
Moreo A.
Castro J.L.
Knowledge-Based Systems English 2013 FAQ (Frequently Asked Questions) lists have attracted increasing attention from companies and organizations. There is thus a need for high-precision and fast methods able to manage large FAQ collections. In this context, we present a FAQ retrieval system as part of a FAQ exploiting project. Following the growing trend towards Web 2.0, we aim to provide users with mechanisms to navigate through the domain of knowledge and to facilitate both learning and searching, beyond classic FAQ retrieval algorithms. To this purpose, our system involves two different modules: an efficient and precise FAQ retrieval module and a tag cloud generation module designed to help users to complete the comprehension of the retrieved information. Empirical results evidence the validity of our approach with respect to a number of state-of-the-art algorithms in terms of the most popular metrics in the field. © 2013 Elsevier B.V. All rights reserved. 0 0
A contextual semantic representation of learning assets in online communities of practice Berkani L.
Chikh A.
International Journal of Metadata, Semantics and Ontologies English 2013 This paper presents an ontology-based framework for a contextual semantic representation of learning assets within a Community of Practice of E-learning (CoPE). The community, made up of actors from the e-learning domain (teachers, tutors, pedagogues, administrators...), is considered as a virtual space for exchanging and sharing techno-pedagogic knowledge and know-how between those actors. Our objective is to semantically describe the CoPE's learning assets using contextual semantic annotations. We consider two types of semantic annotations: (a) objective annotations, describing the learning assets with a set of context-related metadata and (b) subjective annotations, to express the members' experience and feedback regarding these same assets. The paper is illustrated with a case study related to a semantic adaptive wiki using the framework and aiming to foster the knowledge sharing and reuse between the CoPE's members. The wiki provides essentially a semantic search and a recommendation support of assets. Copyright 0 0
Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms Joorabchi A.
Mahdi A.E.
Journal of Information Science English 2013 Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents' content. We have devised a set of 20 statistical, positional and semantical features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods. 0 0
Extending the possibilities for collaborative work with TEI/XML through the usage of a wiki system Entrup B.
Binder F.
Lobin H.
ACM International Conference Proceeding Series English 2013 This paper presents and discusses an integrated project-specific working environment for editing TEI/XML files and linking entities of interest to a dedicated wiki system. This working environment has been specifically tailored to the workflow in our interdisciplinary digital humanities project "GeoBib". It addresses some challenges that arose while working with person-related data and geographical references in a growing collection of TEI/XML files. While our current solution provides some essential benefits, we also discuss several critical issues and challenges that remain. 0 0
Extracting PROV provenance traces from Wikipedia history pages Missier P.
Zheng Chen
ACM International Conference Proceeding Series English 2013 Wikipedia History pages contain provenance metadata that describes the history of revisions of each Wikipedia article. We have developed a simple extractor which, starting from a user-specified article page, crawls through the graph of its associated history pages, and encodes the essential elements of those pages according to the PROV data model. The crawling is performed on the live pages using the Wikipedia REST interface. The resulting PROV provenance graphs are stored in a graph database (Neo4J), where they can be queried using the Cypher graph query language (proprietary to Neo4J), or traversed programmatically using the Neo4J Java Traversal API. 0 0
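The entry above describes a concrete pipeline: crawl an article's revision history, map each revision to PROV entities, activities and agents, and load the result into Neo4j. A minimal sketch of the crawling and mapping step only is shown below; it uses the standard MediaWiki action API rather than the authors' extractor, and the record fields and endpoint choice are illustrative assumptions, not the paper's code.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"  # standard MediaWiki action API (illustrative choice)

def revision_provenance(title, limit=20):
    """Fetch the revision history of one article and emit simple
    PROV-style records: revision (entity), edit (activity), user (agent)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    records = []
    for rev in page.get("revisions", []):
        records.append({
            "entity": f"article/{title}?oldid={rev['revid']}",  # the article state produced
            "wasGeneratedBy": f"edit/{rev['revid']}",            # the editing activity
            "wasAssociatedWith": f"user/{rev.get('user', 'hidden')}",
            "endedAtTime": rev["timestamp"],
        })
    return records

if __name__ == "__main__":
    for record in revision_provenance("Provenance", limit=5):
        print(record)
```

Loading such records into Neo4j and querying them with Cypher, as the paper does, would be a separate step.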
Metadata management of models for resources and environment based on Web 2.0 technology Lu Y.M.
Sheng L.
Wu S.
Yue T.X.
Communications in Computer and Information Science English 2013 The paper first introduces the standard framework of model metadata as well as its composition, content and meaning. It is held that model metadata should consist of an identifier, sphere of application, model parameters, principles, performances, run conditions, management information, references, and case information. We then explain the virtual community for model metadata publishing, sharing and maintaining. Finally, we expatiate on the expression of the model metadata standard based on XML Schema Definition (XSD), the tag-based extended metadata and the publishing of model metadata based on a wiki. Based on Web 2.0 technology, the traditional model metadata, which were created only by the modeler, were extended to support extended metadata created by model users or domain experts, including feedback on model evaluation and suggestions. The user metadata were refined from a large number of messy individual tags, which reflect the implicit knowledge of the models. 0 0
The illiterate editor: Metadata-driven revert detection in wikipedia Segall J.
Greenstadt R.
Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 English 2013 As the community depends more heavily on Wikipedia as a source of reliable information, the ability to quickly detect and remove detrimental information becomes increasingly important. The longer incorrect or malicious information lingers in a source perceived as reputable, the more likely that information will be accepted as correct and the greater the loss to source reputation. We present The Illiterate Editor (IllEdit), a content-agnostic, metadata-driven classification approach to Wikipedia revert detection. Our primary contribution is in building a metadata-based feature set for detecting edit quality, which is then fed into a Support Vector Machine for edit classification. By analyzing edit histories, the IllEdit system builds a profile of user behavior, estimates expertise and spheres of knowledge, and determines whether or not a given edit is likely to be eventually reverted. The success of the system in revert detection (0.844 F-measure) as well as its disjoint feature set as compared to existing, content-analyzing vandalism detection systems, shows promise in the synergistic usage of IllEdit for increasing the reliability of community information. Copyright 2010 ACM. 0 0
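As a small illustration of the metadata-only idea sketched in the abstract above, the following hedged example derives a handful of revision-level features and feeds them to an off-the-shelf SVM; the feature names and the dict layout are hypothetical stand-ins, not the IllEdit feature set.

```python
# A minimal sketch, assuming edits arrive as dicts of revision metadata
# (no article text is inspected). Requires scikit-learn.
from sklearn.svm import SVC

def edit_features(edit):
    """Turn one revision's metadata into a numeric feature vector."""
    return [
        1.0 if edit.get("anonymous") else 0.0,    # IP editor?
        float(edit.get("editor_edit_count", 0)),  # prior experience of the editor
        float(len(edit.get("comment", ""))),      # edit-summary length
        float(edit.get("size_delta", 0)),         # bytes added or removed
        float(edit.get("hour_of_day", 0)),        # timing signal
    ]

def train_revert_classifier(edits, was_reverted):
    """edits: list of metadata dicts; was_reverted: list of 0/1 labels."""
    X = [edit_features(e) for e in edits]
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X, was_reverted)
    return clf
```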
Automatic subject metadata generation for scientific documents using wikipedia and genetic algorithms Joorabchi A.
Mahdi A.E.
Lecture Notes in Computer Science English 2012 Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents. However, scientific documents that are manually annotated with keyphrases are in the minority. This paper describes a machine learning-based automatic keyphrase annotation method for scientific documents, which utilizes Wikipedia as a thesaurus for candidate selection from documents' content and deploys genetic algorithms to learn a model for ranking and filtering the most probable keyphrases. Reported experimental results show that the performance of our method, evaluated in terms of inter-consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised methods. 0 0
Classifying image galleries into a taxonomy using metadata and wikipedia Kramer G.
Gosse Bouma
Hendriksen D.
Homminga M.
Lecture Notes in Computer Science English 2012 This paper presents a method for the hierarchical classification of image galleries into a taxonomy. The proposed method links textual gallery metadata to Wikipedia pages and categories. Entity extraction from metadata, entity ranking, and selection of categories is based on Wikipedia and does not require labeled training data. The resulting system performs well above a random baseline, and achieves a (micro-averaged) F-score of 0.59 on the 9 top categories of the taxonomy and 0.40 when using all 57 categories. 0 0
Engineering a controlled natural language into semantic MediaWiki Dantuluri P.
Davis B.
Ludwick P.
Handschuh S.
Lecture Notes in Computer Science English 2012 The Semantic Web is yet to gain mainstream recognition. In part this is caused by the relative complexity of the various semantic web formalisms, which act as a major barrier of entry to naive web users. In addition, in order for the Semantic Web to become a reality, we need semantic metadata. While controlled natural language research has sought to address these challenges, in the context of user friendly ontology authoring for domain experts, there has been little focus on how to adapt controlled languages for novice social web users. The paper describes an approach to using controlled languages for fact creation and management as opposed to ontology authoring, focusing on the domain of meeting minutes. For demonstration purposes, we developed a plug-in to the Semantic MediaWiki, which adds a controlled language editor extension. This editor aids the user while authoring or annotating in a controlled language in a user friendly manner. Controlled content is sent to a parsing service which generates semantic metadata from the sentences, which are subsequently displayed and stored in the Semantic MediaWiki. The semantic metadata generated by the parser is grounded against a project documents ontology. The controlled language modeled covers a wide variety of sentences and topics used in the context of meeting minutes. Finally, this paper provides an architectural overview of the annotation system. 0 0
How random walks can help tourism Lucchese C.
Perego R.
Silvestri F.
Vahabi H.
Venturini R.
Lecture Notes in Computer Science English 2012 On-line photo sharing services allow users to share their touristic experiences. Tourists can publish photos of interesting locations or monuments visited, and they can also share comments, annotations, and even the GPS traces of their visits. By analyzing such data, it is possible to turn colorful photos into metadata-rich trajectories through the points of interest present in a city. In this paper we propose a novel algorithm for the interactive generation of personalized recommendations of touristic places of interest based on the knowledge mined from photo albums and Wikipedia. The distinguishing features of our approach are multiple. First, the underlying recommendation model is built fully automatically in an unsupervised way and it can be easily extended with heterogeneous sources of information. Moreover, recommendations are personalized according to the places previously visited by the user. Finally, such personalized recommendations can be generated very efficiently even on-line from a mobile device. 0 0
Linking folksonomies to knowledge organization systems Jakob Voss Communications in Computer and Information Science English 2012 This paper demonstrates enrichment of set-model folksonomies with hierarchical links and mappings to other knowledge organization systems. The process is exemplified with social tagging practice in Wikipedia and in Stack Exchange. The extended folksonomies are created by crowdsourcing tag names and descriptions to translate them to linked data in SKOS. 0 0
Lookup tables: Fine-grained partitioning for distributed databases Tatarowicz A.L.
Curino C.
Jones E.P.C.
Madden S.
Proceedings - International Conference on Data Engineering English 2012 The standard way to get linear scaling in a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this partitioning will result in each query being executed at just one node, to avoid the overheads of distributed transactions and allow nodes to be added without increasing the amount of required coordination. For some applications, simple strategies, such as hashing on primary key, provide this property. Unfortunately, for many applications, including social networking and order-fulfillment, many-to-many relationships cause simple strategies to result in a large fraction of distributed queries. Instead, what is needed is a fine-grained partitioning, where related individual tuples (e.g., cliques of friends) are co-located together in the same partition. Maintaining such a fine-grained partitioning requires the database to store a large amount of metadata about which partition each tuple resides in. We call such metadata a lookup table, and present the design of a data distribution layer that efficiently stores these tables and maintains them in the presence of inserts, deletes, and updates. We show that such tables can provide scalability for several difficult to partition database workloads, including Wikipedia, Twitter, and TPC-E. Our implementation provides 40% to 300% better performance on these workloads than either simple range or hash partitioning and shows greater potential for further scale-out. 0 0
Multilingual named entity recognition using parallel data and metadata from wikipedia Soo-Hwan Kim
Toutanova K.
Yu H.
50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference English 2012 In this paper we propose a method to automatically label multi-lingual data with named entity tags. We build on prior work utilizing Wikipedia metadata and show how to effectively combine the weak annotations stemming from Wikipedia metadata with information obtained through English-foreign language parallel Wikipedia sentences. The combination is achieved using a novel semi-CRF model for foreign sentence tagging in the context of a parallel English sentence. The model outperforms both standard annotation projection methods and methods based solely on Wikipedia metadata. 0 0
Predicting user tags using semantic expansion Chandramouli K.
Piatrik T.
Izquierdo E.
Communications in Computer and Information Science English 2012 Manually annotating content such as Internet videos, is an intellectually expensive and time consuming process. Furthermore, keywords and community-provided tags lack consistency and present numerous irregularities. Addressing the challenge of simplifying and improving the process of tagging online videos, which is potentially not bounded to any particular domain, we present an algorithm for predicting user-tags from the associated textual metadata in this paper. Our approach is centred around extracting named entities exploiting complementary textual resources such as Wikipedia and Wordnet. More specifically to facilitate the extraction of semantically meaningful tags from a largely unstructured textual corpus we developed a natural language processing framework based on GATE architecture. Extending the functionalities of the in-built GATE named entities, the framework integrates a bag-of-articles algorithm for effectively searching through the Wikipedia articles for extracting relevant articles. The proposed framework has been evaluated against MediaEval 2010 Wild Wild Web dataset, which consists of large collection of Internet videos. 0 0
PythiaSearch - A multiple search strategy-supportive multimedia retrieval system Zellhofer D.
Bertram M.
Bottcher T.
Schmidt C.
Tillmann C.
Schmitt I.
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR 2012 English 2012 PythiaSearch is a multimedia information retrieval system supporting multiple search strategies. Based on the promising results of the underlying query model in 2011's Image-CLEF Wikipedia task, we have implemented an interactive retrieval system which supports multimodal data such as images, (multilingual) texts, and various metadata formats that can be used to query or browse a collection. The support of multiple search strategies is crucial, because it is subject to change during the user's interaction with the retrieval system. The directed search and browsing mechanisms rely both on the same formal query model providing a seamless adaption to the user's search strategy. Additionally, it features a relevance feedback process that can be used to adjust or even learn specific queries based on the user's interaction with the system alone which can be saved for later usage. Copyright 0 0
REWOrD: Semantic relatedness in the web of data Pirro G. Proceedings of the National Conference on Artificial Intelligence English 2012 This paper presents REWOrD, an approach to compute semantic relatedness between entities in the Web of Data representing real world concepts. REWOrD exploits the graph nature of RDF data and the SPARQL query language to access this data. Through simple queries, REWOrD constructs weighted vectors keeping the informativeness of RDF predicates used to make statements about the entities being compared. The most informative path is also considered to further refine informativeness. Relatedness is then computed by the cosine of the weighted vectors. Differently from previous approaches based on Wikipedia, REWOrD does not require any preprocessing or custom data transformation. Indeed, it can leverage any RDF knowledge base as a source of background knowledge. We evaluated REWOrD in different settings by using a new dataset of real world entities and investigated its flexibility. As compared to related work on classical datasets, REWOrD obtains comparable results while, on one side, it avoids the burden of preprocessing and data transformation and, on the other side, it provides more flexibility and applicability in a broad range of domains. Copyright © 2012, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
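The last step described above is just a cosine over weighted predicate vectors. A minimal sketch, assuming the informativeness weights per RDF predicate have already been obtained from SPARQL results (the example entities and weights are made up for illustration):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts
    mapping RDF predicate IRIs to informativeness weights."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Illustrative (made-up) weight vectors for two entities:
armstrong = {"dbo:birthPlace": 0.8, "dbo:mission": 2.1, "dbo:occupation": 0.5}
aldrin = {"dbo:birthPlace": 0.7, "dbo:mission": 2.0, "dbo:award": 0.9}
print(cosine(armstrong, aldrin))
```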
Scientific cyberlearning resources referential metadata creation via information retrieval Xiaojiang Liu
Jia H.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2012 The goal of this research is to describe an innovative method of creating scientific referential metadata for a cyberinfrastructure-enabled learning environment to enhance student and scholar learning experiences. By using information retrieval and meta-search approaches, different types of referential metadata, such as related Wikipedia Pages, Datasets, Source Code, Video Lectures, Presentation Slides, and (online) Tutorials, for an assortment of publications and scientific topics will be automatically retrieved, associated, and ranked. 0 0
Uncover what you see in your images: The infoalbum approach Karlsen R.
Jakobsen B.
Asp Hansen R.B.
International Journal of Computer Science and Applications English 2012 This paper presents InfoAlbum, a novel prototype for image centric information collection, where the goal is to automatically provide the user with information about i) the object or event depicted in an image, and ii) the location where the image was taken. The system aims at improving the image viewing experience by presenting supplementary information such as location names, tags, weather condition at image capture time, placement on map, geographically nearby images, Wikipedia articles and web pages. The information is automatically collected from various sources on the Internet based on the image metadata gps latitude/longitude values, date/time of image capture and a category keyword provided by the user. Collected information is presented to the user, and can also be stored and later used during image retrieval. 0 0
WREF 2012: OPENEI - An open energy data and information exchange for international audiences Brodt-Giles D. World Renewable Energy Forum, WREF 2012, Including World Renewable Energy Congress XII and Colorado Renewable Energy Society (CRES) Annual Conference English 2012 Designed to be the world's most comprehensive, open, and collaborative energy information network, Open Energy Information (OpenEI - openei.org) supplies essential energy data to decision makers and supports a global energy transformation. The platform, sponsored by the U.S. Department of Energy (DOE) and developed by the National Renewable Energy Laboratory (NREL), is intended for global contribution and collaboration. Energy information and data are available on the Internet already, but the resources are dispersed, published in disparate formats, highly variable in quality and usefulness, and difficult to find. OpenEI provides a solution. The open-source Web platform, similar to the Wikipedia platform, provides more than 800 energy data sets and 55,000 pages of information, analyses, tools, images, maps, and other resources in a completely searchable and editable format. OpenEI's international user base spans more than 200 countries. Because the platform is so interactive and easy to contribute to, content is growing daily. Copyright 0 0
Wedata: A wiki system for service oriented tiny code sharing Kouichirou Eto
Masahiro Hamasaki
Hideaki Takeda
WikiSym 2012 English 2012 A new trend in Internet applications is to create and share tiny pieces of code, such as site-specific code for a browser extension. New tools are needed to share and distribute such code efficiently. We built a Wiki site called Wedata which stores tiny code for a particular service. Wedata has three features: machine readability, code sharing, and service orientation. Many developers already use Wedata for browser extensions. More than 1,300,000 users are using Wedata. In this paper, we describe the Wedata system, usage statistics, and the behavior of open collaboration on the system. 0 0
Wikis: Transactive memory systems in digital form Jackson P. ACIS 2012 : Proceedings of the 23rd Australasian Conference on Information Systems English 2012 Wikis embed information about authors, tags, hyperlinks and other metadata into the information they create. Wiki functions use this metadata to provide pointers which allow users to track down, or be informed of, the information they need. In this paper we provide a firm theoretical conceptualization for this type of activity by showing how this metadata provides a digital foundation for a Transactive Memory System (TMS). TMS is a construct from group psychology which defines directory-based knowledge sharing processes to explain the phenomenon of "group mind". We analyzed the functions and data of two leading Wiki products to understand where and how they support the TMS. We then modeled and extracted data from these products into a network analysis product. The results confirmed that Wikis are a TMS in digital form. Network analysis highlights its characteristics as a "knowledge map", suggesting useful extensions to the internal "TMS" functions of Wikis. Jackson 0 0
YouCat : Weakly supervised youtube video categorization system from meta data & user comments using wordnet & wikipedia Saswati Mukherjee
Prantik Bhattacharyya
24th International Conference on Computational Linguistics - Proceedings of COLING 2012: Technical Papers English 2012 In this paper, we propose a weakly supervised system, YouCat, for categorizing Youtube videos into different genres like Comedy, Horror, Romance, Sports and Technology. The system takes a Youtube video url as input and gives it a belongingness score for each genre. The key aspects of this work can be summarized as: (1) Unlike other genre identification works, which are mostly supervised, this system is mostly unsupervised, requiring no labeled data for training. (2) The system can easily incorporate new genres without requiring labeled data for the genres. (3) YouCat extracts information from the video title, meta description and user comments (which together form the video descriptor). (4) It uses Wikipedia and WordNet for concept expansion. (5) The proposed algorithm with a time complexity of O( 0 0
A category-driven approach to deriving domain specific subset of Wikipedia Korshunov A.
Denis Turdakov
Jeong J.
Lee M.
Moon C.
CEUR Workshop Proceedings English 2011 While many researchers attempt to build up different kinds of ontologies by means of Wikipedia, the possibility of deriving a high-quality domain-specific subset of Wikipedia using its own category structure still remains undervalued. We prove the necessity of such processing in this paper and also propose an appropriate technique. As a result, the size of the knowledge base for our text processing framework has been reduced by more than an order of magnitude, while the precision of disambiguating musical metadata (ID3 tags) has decreased from 98% to 64%. 0 0
Collaborative management of business metadata Huner K.M.
Boris Otto
Osterle H.
International Journal of Information Management English 2011 Legal provisions, cross-company data exchange and intra-company reporting or planning procedures require comprehensively, timely, unambiguously and understandably specified business objects (e.g. materials, customers, and suppliers). On the one hand, this business metadata has to cover miscellaneous regional peculiarities in order to enable business activities anywhere in the world. On the other hand, data structures need to be standardized throughout the entire company in order to be able to perform global spend analysis, for example. In addition, business objects should adapt to new market conditions or regulatory requirements as quickly and consistently as possible. Centrally organized corporate metadata managers (e.g. within a central IT department) are hardly able to meet all these demands. They should be supported by key users from several business divisions and regions, who contribute expert knowledge. However, despite the advantages regarding high metadata quality on a corporate level, a collaborative metadata management approach of this kind has to ensure low effort for knowledge contributors as in most cases these regional or divisional experts do not benefit from metadata quality themselves. Therefore, the paper at hand identifies requirements to be met by a business metadata repository, which is a tool that can effectively support collaborative management of business metadata. In addition, the paper presents the results of an evaluation of these requirements with business experts from various companies and of scenario tests with a wiki-based prototype at the company Bayer CropScience AG. The evaluation shows two things: First, collaboration is a success factor when it comes to establishing effective business metadata management and integrating metadata with enterprise systems, and second, semantic wikis are well suited to realizing business metadata repositories. 0 0
Content-based recommendation algorithms on the Hadoop mapreduce framework De Pessemier T.
Vanhecke K.
Dooms S.
Martens L.
WEBIST 2011 - Proceedings of the 7th International Conference on Web Information Systems and Technologies English 2011 Content-based recommender systems are widely used to generate personal suggestions for content items based on their metadata description. However, due to the required (text) processing of these metadata, the computational complexity of the recommendation algorithms is high, which hampers their application at large scale. This computational load reinforces the necessity of a reliable, scalable and distributed processing platform for calculating recommendations. Hadoop is such a platform that supports data-intensive distributed applications based on map and reduce tasks. Therefore, we investigated how Hadoop can be utilized as a cloud computing platform to solve the scalability problem of content-based recommendation algorithms. The various MapReduce operations, necessary for keyword extraction and generating content-based suggestions for the end-user, are elucidated in this paper. Experimental results on Wikipedia articles prove the appropriateness of Hadoop as an efficient and scalable platform for computing content-based recommendations. 0 0
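As a hedged illustration of the kind of MapReduce job the abstract above refers to, the sketch below is a Hadoop Streaming-style term-counting pass over article text (the first ingredient of keyword extraction); it is a generic stand-in, not the paper's actual jobs.

```python
#!/usr/bin/env python3
# Mapper emits (term, 1) per token; reducer sums counts over the
# sorted stream Hadoop Streaming delivers on stdin.
import re
import sys

def mapper():
    for line in sys.stdin:
        for term in re.findall(r"[a-z]+", line.lower()):
            print(f"{term}\t1")

def reducer():
    current, total = None, 0
    for line in sys.stdin:
        term, count = line.rstrip("\n").split("\t")
        if term != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = term, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```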
Detecting the long-tail of points of interest in tagged photo collections Zigkolis C.
Papadopoulos S.
Kompatsiaris Y.
Athena Vakali
Proceedings - International Workshop on Content-Based Multimedia Indexing English 2011 The paper tackles the problem of matching the photos of a tagged photo collection to a list of "long-tail" Points Of Interest (PoIs), that is PoIs that are not very popular and thus not well represented in the photo collection. Despite the significance of improving "long-tail" PoI photo retrieval for travel applications, most landmark detection methods to date have been tested on very popular landmarks. In this paper, we conduct a thorough empirical analysis comparing four baseline matching methods that rely on photo metadata, three variants of an approach that uses cluster analysis in order to discover PoI-related photo clusters, and a real-world retrieval mechanism (Flickr search) on a set of less popular PoIs. A user-based evaluation of the aforementioned methods is conducted on a Flickr photo collection of over 100,000 photos from 10 well-known touristic destinations in Greece. A set of 104 "long-tail" PoIs is collected for these destinations from Wikipedia, Wikimapia and OpenStreetMap. The results demonstrate that two of the baseline methods outperform Flickr search in terms of precision and F-measure, whereas two of the cluster-based methods outperform it in terms of recall and PoI coverage. We consider the results of this study valuable for enhancing the indexing of pictorial content in social media sites. 0 0
Evaluating a semantic network automatically constructed from lexical co-occurrence on a word sense disambiguation task Szumlanski S.
Gomez F.
CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference English 2011 We describe the extension and objective evaluation of a network of semantically related noun senses (or concepts) that has been automatically acquired by analyzing lexical co-occurrence in Wikipedia. The acquisition process makes no use of the metadata or links that have been manually built into the encyclopedia, and nouns in the network are automatically disambiguated to their corresponding noun senses without supervision. For this task, we use the noun sense inventory of WordNet 3.0. Thus, this work can be conceived of as augmenting the WordNet noun ontology with unweighted, undirected related-to edges between synsets. Our network contains 208,832 such edges. 0 0
External query reformulation for text-based image retrieval Min J.
Jones G.J.F.
Lecture Notes in Computer Science English 2011 In text-based image retrieval, the Incomplete Annotation Problem (IAP) can greatly degrade retrieval effectiveness. A standard method used to address this problem is pseudo relevance feedback (PRF) which updates user queries by adding feedback terms selected automatically from top ranked documents in a prior retrieval run. PRF assumes that the target collection provides enough feedback information to select effective expansion terms. This is often not the case in image retrieval since images often only have short metadata annotations leading to the IAP. Our work proposes the use of an external knowledge resource (Wikipedia) in the process of refining user queries. In our method, Wikipedia documents strongly related to the terms in user query ("definition documents") are first identified by title matching between the query and titles of Wikipedia articles. These definition documents are used as indicators to re-weight the feedback documents from an initial search run on a Wikipedia abstract collection using the Jaccard coefficient. The new weights of the feedback documents are combined with the scores rated by different indicators. Query-expansion terms are then selected based on these new weights for the feedback documents. Our method is evaluated on the ImageCLEF WikipediaMM image retrieval task using text-based retrieval on the document metadata fields. The results show significant improvement compared to standard PRF methods. 0 0
How to teach digital library data to swim into research Schindler C.
Cornelia Veja
Marc Rittberger
Vrandecic D.
ACM International Conference Proceeding Series English 2011 Virtual research environments (VREs) aim to enhance research practice and have been identified as drivers for changes in libraries. This paper argues that VREs in combination with Semantic Web technologies offer a range of possibilities to align research with library practices. This main claim of the article is exemplified by a metadata integration process of bibliographic data from libraries to a VRE which is based on Semantic MediaWiki. The integration process rests on three pillars: MediaWiki as a web-based repository, Semantic MediaWiki annotation mechanisms, and semi-automatic workflow management for the integration of digital resources. Thereby, needs of scholarly research practices and capacities for interactions are taken into account. The integration process is part of the design of Semantic MediaWiki for Collaborative Corpora Analysis (SMW-CorA) which uses a concrete research project in the history of education as a reference point for an infrastructural distribution. Semantic MediaWiki thus provides a light-weight environment offering a framework for re-using heterogeneous resources and a flexible collaborative way of conducting research. 0 0
Language of vandalism: Improving Wikipedia vandalism detection via stylometric analysis Manoj Harpalani
Michael Hart
Sandesh Singh
Rob Johnson
Yejin Choi
ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies English 2011 Community-based knowledge forums, such as Wikipedia, are susceptible to vandalism, i.e., ill-intentioned contributions that are detrimental to the quality of collective intelligence. Most previous work to date relies on shallow lexico-syntactic patterns and metadata to automatically detect vandalism in Wikipedia. In this paper, we explore more linguistically motivated approaches to vandalism detection. In particular, we hypothesize that textual vandalism constitutes a unique genre where a group of people share a similar linguistic behavior. Experimental results suggest that (1) statistical models give evidence to unique language styles in vandalism, and that (2) deep syntactic patterns based on probabilistic context free grammars (PCFG) discriminate vandalism more effectively than shallow lexicosyntactic patterns based on n-grams. 0 0
Leveraging wikipedia characteristics for search and candidate generation in question answering Chu-Carroll J.
Fan J.
Proceedings of the National Conference on Artificial Intelligence English 2011 Most existing Question Answering (QA) systems adopt a type-and-generate approach to candidate generation that relies on a pre-defined domain ontology. This paper describes a type independent search and candidate generation paradigm for QA that leverages Wikipedia characteristics. This approach is particularly useful for adapting QA systems to domains where reliable answer type identification and type-based answer extraction are not available. We present a three-pronged search approach motivated by relations an answer-justifying title-oriented document may have with the question/answer pair. We further show how Wikipedia metadata such as anchor texts and redirects can be utilized to effectively extract candidate answers from search results without a type ontology. Our experimental results show that our strategies obtained high binary recall in both search and candidate generation on TREC questions, a domain that has mature answer type extraction technology, as well as on Jeopardy! questions, a domain without such technology. Our high-recall search and candidate generation approach has also led to high over-all QA performance in Watson, our end-to-end system. Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Metadata enrichment via topic models for author name disambiguation Bernardi R.
Le D.-T.
Lecture Notes in Computer Science English 2011 This paper tackles the well known problem of Author Name Disambiguation (AND) in Digital Libraries (DL). Following [14,13], we assume that an individual tends to create a distinctively coherent body of work that can hence form a single cluster containing all of his/her articles yet distinguishing them from those of everyone else with the same name. Still, we believe the information contained in a DL may not be sufficient to allow an automatic detection of such clusters; this lack of information becomes even more evident in federated digital libraries, where the labels assigned by librarians may belong to different controlled vocabularies or different classification systems, and in digital libraries on the web, where records may be assigned neither subject headings nor classification numbers. Hence, we exploit Topic Models, extracted from Wikipedia, to enhance records metadata and use Agglomerative Clustering to disambiguate ambiguous author names by clustering together similar records; records in different clusters are supposed to have been written by different people. We investigate the following two research questions: (a) are the Classification Systems and Subject Heading labels manually assigned by librarians general and informative enough to disambiguate Author Names via clustering techniques? (b) Do Topic Models induce from large corpora the conceptual information necessary for automatically labelling DL metadata and grasping topic similarities of the records? To answer these questions, we will use the Library Catalogue of the Bolzano University Library as a case study. 0 0
Query and tag translation for Chinese-Korean cross-language social media retrieval Wang Y.-C.
Chen J.-T.
Tsai R.T.-H.
Hsu W.-L.
Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011 English 2011 Collaborative tagging has been widely adopted by social media websites to allow users to describe content with metadata tags. Tagging can greatly improve search results. We propose a cross-language social media retrieval system (CLSMR) to help users retrieve foreign-language tagged media content. We construct a Chinese to Korean CLSMR system that translates Chinese queries into Korean, retrieves content, and then translates the Korean tags in the search results back into Chinese. Our system translates NEs using a dictionary of bilingual NE pairs from Wikipedia and a pattern-based software translator which learns regular NE patterns from the web. The top-10 precision of YouTube retrieved results for our system was 0.39875. The K-C NE tag translation accuracy for the top-10 YouTube results was 77.6%, which shows that our translation method is fairly effective for named entities. A questionnaire given to users showed that automatically translated tags were considered as informative as a human-written summary. With our proposed CLSMR system, Chinese users can retrieve online Korean media files and get a basic understanding of their content with no knowledge of the Korean language. 0 0
SMASHUP: Secure mashup for defense transformation and net-centric systems Heileman M.D.
Heileman G.L.
Shaver M.P.
Gilger M.
Jamkhedkar P.A.
Proceedings of SPIE - The International Society for Optical Engineering English 2011 The recent development of mashup technologies now enables users to easily collect, integrate, and display data from a vast array of different information sources available on the Internet. The ability to harness and leverage information in this manner provides a powerful means for discovering links between information, and greatly enhances decisionmaking capabilities. The availability of such services in DoD environments will provide tremendous advantages to the decision-makers engaged in analysis of critical situations, rapid-response, and long-term planning scenarios. However in the absence of mechanisms for managing the usage of resources, any mashup service in a DoD environment also opens up significant security vulnerabilities to insider threat and accidental leakage of confidential information, not to mention other security threats. In this paper we describe the development of a framework that will allow integration via mashups of content from various data sources in a secure manner. The framework is based on mathematical logic where addressable resources have formal usage terms applied to them, and these terms are used to specify and enforce usage policies over the resources. An advantage of this approach is it provides a formal means for securely managing the usage of resources that might exist within multilevel security environments. 0 0
The InfoAlbum image centric information collection Karlsen R.
Jakobsen B.
ACM International Conference Proceeding Series English 2011 This paper presents a prototype of an image centric information album, where the goal is to automatically provide the user with information about i) the object or event depicted in an image, and ii) the surrounding where the image was taken. The system, called InfoAlbum, aims at improving the image viewing experience by presenting supplementary information such as location names, tags, temperature at image capture time, placement on map, geographically nearby images, Wikipedia articles and web pages. The information is automatically collected from various sources on the Internet based on the image metadata gps coordinates, date/time of image capture and a category keyword provided by the user. Collected information is presented to the user, and some is also stored in the EXIF header of the image and can later be used during image retrieval. 0 0
Toward a semantic vocabulary for systems engineering Di Maio P. ACM International Conference Proceeding Series English 2011 The web can be the most efficient medium for sharing knowledge, provided appropriate technological artifacts such as controlled vocabularies and metadata are adopted. In our research we study the degree of such adoption applied to the systems engineering domain. This paper is a work in progress report discussing issues surrounding knowledge extraction and representation, proposing an integrated approach to tackle various challenges associated with the development of a shared vocabulary for the practice. 0 0
Trial integration of agricultural field sensing data Kiura T.
Katsumi Tanaka
Omine M.
Yoshida T.
Proceedings of the SICE Annual Conference English 2011 MetBroker, which virtually integrates meteorological data from different sources and access methods, was extended with a web ontology (metbroker.owl) to provide flexible data retrieval at the primary stage, and then to utilize the OWL for data integration itself. As a first trial, we used MetBroker to integrate the meteorological part of the field sensing data from Field Servers, and found that we could successfully integrate Field Server data with other meteorological data. We expected that we could integrate meteorological data from other field sensing data sources into MetBroker. However, we found that there is no OWL for the other data obtained from field sensing and observation. To solve this issue, we started a second trial integration, to identify the relationships between terms used in metadata and create an extended XML schema for data exchange based on existing standards. The details of our trials are described. 0 0
Wikipedia vandalism detection: Combining natural language, metadata, and reputation features Adler B.T.
Luca de Alfaro
Mola-Velasco S.M.
Paolo Rosso
West A.G.
Lecture Notes in Computer Science English 2011 Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith; introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves the state-of-the-art from all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism, and for the task of locating vandalism in the complete set of Wikipedia revisions. 0 1
DIY eBooks: Collaborative publishing made easy Battle S.
Fabio Vitali
Angelo Di Iorio
Bernius M.
Henderson T.
Choudhury M.
Proceedings of SPIE - The International Society for Optical Engineering English 2010 Print is undergoing a revolution as significant as the invention of the printing press. The emergence of ePaper is a major disruption for the printing industry; defining a new medium with the potential to redefine publishing in a way that is as different to today's Web, as the Web is to traditional print. In this new eBook ecosystem we don't just see users as consumers of eBooks, but as active prosumers able to collaboratively create, customize and publish their own eBooks. We describe a transclusive, collaborative publishing framework for the web. 0 0
Document expansion for text-based image retrieval at CLEF 2009 Min J.
Wilkins P.
Johannes Leveling
Jones G.J.F.
Lecture Notes in Computer Science English 2010 In this paper, we describe and analyze our participation in the WikipediaMM task at CLEF 2009. Our main efforts concern the expansion of the image metadata from the Wikipedia abstracts collection - DBpedia. In our experiments, we use the Okapi feedback algorithm for document expansion. Compared with our text retrieval baseline, our best document expansion RUN improves MAP by 17.89%. As one of our conclusions, document expansion from an external resource can be an effective factor in the image metadata retrieval task. 0 0
Efficient visualization of content and contextual information of an online multimedia digital library for effective browsing Mishra S.
Gorai A.
Oberoi T.
Ghosh H.
Proceedings - 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2010 English 2010 In this paper, we present a few innovative techniques for visualization of content and contextual information of a multimedia digital library for effective browsing. A traditional collection visualization portal often depicts some metadata or a short synopsis, which is quite inadequate for assessing the documents. We have designed a novel web portal that incorporates a few preview facilities to disclose an abstract of the contents. Moreover, we place the documents on Google Maps to make their geographical context explicit. A semantic network, created automatically around the collection, brings out other contextual information from external knowledge resources like Wikipedia, which is used for navigating the collection. This paper also reports economical hosting techniques using Amazon Cloud. 0 0
Electronic laboratory books in fusion experiments and engineering Landgraf B.
Kramer-Flecken A.
Fusion Engineering and Design English 2010 In this work we introduce eLaBo, an electronic laboratory book system. ELaBo is a tool that enables collaboration of distributed teams by using standard internet browsers. It provides several functions: Users can create books for specific purposes, e.g. an experimental session or for recording on diagnostics. A book contains pages and resources (e.g. binary files), which are created and manipulated by users of the book. A simple WIKI syntax is used to edit the contents of pages including formatted text, images, and LaTeX for expressing mathematical equations. ELaBo provides for different types of links, a full-text search for the WIKI pages and a version history. Access control is implemented using a key metaphor. Recently (since the last login) modified or created pages or books can be displayed on demand. © 2010 Elsevier B.V. All rights reserved. 0 0
Internet, archaeology on Richards J.D. Encyclopedia of Archaeology English 2010 Archaeological adoption of the Internet is considered in the context of broader trends in scholarly communication and e-commerce. The impact of electronic publication on archaeology and the growth of e-journals is discussed. The deep web and the rich content available from on-line databases and other web resources are also considered. The transience of the web and problems of digital preservation are addressed. The use of the internet for more community based interactions and the growth of internet discussion groups, web blogs, and news feeds are also discussed. The Internet has been seen as a great democratiser of archaeological knowledge, but others claim it creates a new technocratic elite, and restricts access to the developed world. Finally, the problem of resource discovery is raised, and the difficulties of finding authoritative information. Does the future lie in greater adoption of metadata standards and the development of the semantic web, or does Google have all the answers? Copyright © 2008 Elsevier Inc. All rights reserved. 0 0
Meta-metadata: A metadata semantics language for collection representation applications Kerne A.
Qu Y.
Webb A.M.
Damaraju S.
Lupfer N.
Mathur A.
International Conference on Information and Knowledge Management, Proceedings English 2010 Collecting, organizing, and thinking about diverse information resources is the keystone of meaningful digital information experiences, from research to education to leisure. Metadata semantics are crucial for organizing collections, yet their structural diversity exacerbates problems of obtaining and manipulating them, strewing end users and application developers amidst the shadows of a proverbial tower of Babel. We introduce meta-metadata, a language and software architecture addressing a metadata semantics lifecycle: (1) data structures for representation of metadata in programs; (2) metadata extraction from information resources; (3) semantic actions that connect metadata to collection representation applications; and (4) rules for presentation to users. The language enables power users to author metadata semantics wrappers that generalize template-based information sources. The architecture supports development of independent collection representation applications that reuse wrappers. The initial meta-metadata repository of information source wrappers includes Google, Flickr, Yahoo, IMDb, Wikipedia, and the ACM Portal. Case studies validate the approach. 0 0
Metadata for WICRI, a network of semantic wikis for communities in research and innovation Ducloy J.
Daunois T.
Foulonneau M.
Hermann A.
Lamirel J.-C.
Sire S.
Thomesse J.-P.
Vanoirbeek C.
Proceedings of the International Conference on Dublin Core and Metadata Applications English 2010 This paper introduces metadata issues in the framework of the WICRI project, a network of semantic wikis for communities in research and innovation, in which a wiki can be related to an institution, a research field or a regional entity. Metadata and semantic items play a strategic role in handling the quality and the consistency of the network, which must deal with the wiki way of working, in which a metadata specialist and a scientist can work together, at the same time, on the same pages. Some first experiments in designing metadata are presented. A wiki serving as an encyclopedia of metadata is proposed, and related technical issues are discussed. 0 0
Metadata repository management using the MediaWiki interoperability framework a case study: The keytonature project Veja C.F.M.
Gregor Hagedorn
Gisela Weber
Mircea Giurgiu
EChallenges e-2010 Conference English 2010 In the KeyToNature project a user-centred and collaborative approach for metadata repository management was developed. KeyToNature is an EU project to enhance the knowledge of biodiversity by improving the availability of digital and non-digital media along with digital tools for the identification of living organisms throughout Europe. To improve the ability to search and access information, metadata are provided and integrated into a metadata repository. This paper presents a method utilizing the web-based MediaWiki system as part of a low-tech interoperability and repository layer for data providers, end users, developers, and project partners. Because the level of technological expertise of the data providers varies greatly, a solution accessible for non-expert data providers was developed. The main features of this method are the automatic metadata repository management, and an ontological approach with ingestion workflows integrated into the MediaWiki collaborative framework. Extensive user testing shows performance advantages of the method and attests to its usefulness in the application area. This practice-oriented method can be adopted by other projects aiming at collaborative knowledge acquisition and automatic metadata repository management, regardless of the domain of discourse. Copyright 0 0
Not so creepy crawler: Easy crawler generation with standard XML queries Von Dem Bussche F.
Weiand K.
Linse B.
Furche T.
Bry F.
Proceedings of the 19th International Conference on World Wide Web, WWW '10 English 2010 Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far more uniformly structured than in the general Web and thus crawlers can use the structure of Web pages for more precise data extraction and more expressive analysis. In this demonstration, we present a focused, structure-based crawler generator, the "Not so Creepy Crawler" (nc2). What sets nc2 apart is that all analysis and decision tasks of the crawling process are delegated to an (arbitrary) XML query engine such as XQuery or Xcerpt. Customizing crawlers just means writing (declarative) XML queries that can access the currently crawled document as well as the metadata of the crawl process. We identify four types of queries that together suffice to realize a wide variety of focused crawlers. We demonstrate nc2 with two applications: The first extracts data about cities from Wikipedia with a customizable set of attributes for selecting and reporting these cities. It illustrates the power of nc2 where data extraction from Wiki-style, fairly homogeneous knowledge sites is required. In contrast, the second use case demonstrates how easy nc2 makes even complex analysis tasks on social networking sites, here exemplified by last.fm. 0 0
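The nc2 demo above hinges on delegating the crawler's "follow" and "extract" decisions to declarative queries. A minimal sketch of that pattern in Python, with lxml XPath standing in for XQuery/Xcerpt; the seed URL and both queries are illustrative assumptions, not nc2's actual configuration:
 # Sketch of a query-driven focused crawler in the spirit of nc2.
 # XPath via lxml stands in for XQuery/Xcerpt; the seed URL and the two
 # queries below are illustrative assumptions, not nc2's configuration.
 from collections import deque
 from urllib.parse import urljoin
 import requests
 from lxml import html
 
 SEED = "https://en.wikipedia.org/wiki/List_of_largest_cities"  # assumed seed
 FOLLOW_QUERY = "//a[contains(@href, '/wiki/') and not(contains(@href, ':'))]/@href"
 EXTRACT_QUERY = "//h1[@id='firstHeading']//text()"  # what to report per page
 
 def crawl(seed, max_pages=20):
     queue, seen, results = deque([seed]), {seed}, []
     while queue and len(results) < max_pages:
         url = queue.popleft()
         doc = html.fromstring(requests.get(url, timeout=10).content)
         # "Extract" query: which data to report for the current page.
         results.extend((url, value.strip()) for value in doc.xpath(EXTRACT_QUERY))
         # "Follow" query: which links the crawl should pursue next.
         for href in doc.xpath(FOLLOW_QUERY):
             nxt = urljoin(url, href)
             if nxt not in seen:
                 seen.add(nxt)
                 queue.append(nxt)
     return results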
Overview of VideoCLEF 2009: New perspectives on speech-based multimedia content enrichment Larson M.
Newman E.
Jones G.J.F.
Lecture Notes in Computer Science English 2010 VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided. The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the "Beeldenstorm" collection containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity and radical visual changes. The Linking Task, also called "Finding Related Resources Across Languages," involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language "Beeldenstorm" collection and were expected to return target pages drawn from English-language Wikipedia. The best performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation and methods that targeted proper names. 0 0
STiki: An anti-vandalism tool for wikipedia using spatio-temporal analysis of revision metadata West A.G.
Sampath Kannan
Insup Lee
WikiSym 2010 English 2010 STiki is an anti-vandalism tool for Wikipedia. Unlike similar tools, STiki does not rely on natural language processing (NLP) over the article or diff text to locate vandalism. Instead, STiki leverages spatio-temporal properties of revision metadata. The feasibility of utilizing such properties was demonstrated in our prior work, which found they perform comparably to NLP efforts while being more efficient, robust to evasion, and language independent. STiki is a real-time, on-Wikipedia implementation based on these properties. It consists of (1) a server-side processing engine that examines revisions, scoring the likelihood that each is vandalism, and (2) a client-side GUI that presents likely vandalism to end-users for definitive classification (and if necessary, reversion on Wikipedia). Our demonstration will provide an introduction to spatio-temporal properties, demonstrate the STiki software, and discuss alternative research uses for the open-source code. 0 0
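For readers unfamiliar with metadata-only vandalism detection, the sketch below scores a revision from registration, timing and revert-history signals alone. The features and weights are invented placeholders for illustration; they are not STiki's actual model.
 # Illustrative scoring of a Wikipedia revision from metadata alone (no
 # article or diff text). Features and weights are placeholder assumptions,
 # not the feature set or model actually used by STiki.
 from dataclasses import dataclass
 from datetime import datetime
 
 @dataclass
 class Revision:
     timestamp: datetime
     is_anonymous: bool
     editor_age_days: float      # account age at edit time (0 for IP editors)
     editor_prior_reverts: int   # how often this editor was reverted before
     comment_length: int         # length of the edit summary
 
 def vandalism_score(rev: Revision) -> float:
     """Higher score = more likely vandalism, judged from metadata only."""
     score = 0.0
     if rev.is_anonymous:
         score += 0.30
     if rev.editor_age_days < 7:
         score += 0.20
     score += min(rev.editor_prior_reverts, 10) * 0.04
     if rev.comment_length == 0:            # no edit summary given
         score += 0.15
     if rev.timestamp.hour < 5:             # off-peak editing hours
         score += 0.10
     return min(score, 1.0)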
Semantic MediaWiki interoperability framework from a semantic social software perspective Cornelia Veja
Mircea Giurgiu
Gregor Hagedorn
Gisela Weber
2010 9th International Symposium on Electronics and Telecommunications, ISETC'10 - Conference Proceedings English 2010 This paper presents two collaborative Social-Software-driven approaches for the interoperability of multimedia resources used in the KeyToNature project. The first approach, using MediaWiki as a low-level interoperability framework, was presented in our previous work. The second, a Semantic MediaWiki interoperability framework for multimedia resources, is presented in this paper and is still in progress. We argue that different approaches are needed, depending on the context and intention of multimedia resource use. 0 0
Semantic need: Guiding metadata annotations by questions people #ask Happel H.-J. Lecture Notes in Computer Science English 2010 At its core, the Semantic Web is about the creation, collection and interlinking of metadata on which agents can perform tasks for human users. While many tools and approaches support either the creation or usage of semantic metadata, there is neither a proper notion of metadata need, nor a related theory of guidance regarding which metadata should be created. In this paper, we propose to analyze structured queries to help identify missing metadata. We conduct a study on Semantic MediaWiki (SMW), one of the most popular Semantic Web applications to date, analyzing structured "ask"-queries in public SMW instances. Based on that, we describe Semantic Need, an extension for SMW which guides contributors to provide semantic annotations, and summarize feedback from an online survey among 30 experienced SMW users. 0 0
Spatio-temporal analysis of wikipedia metadata and the STiki anti-vandalism tool West A.G.
Sampath Kannan
Insup Lee
WikiSym 2010 English 2010 The bulk of Wikipedia anti-vandalism tools require natural language processing over the article or diff text. However, our prior work demonstrated the feasibility of using spatio-temporal properties to locate malicious edits. STiki is a real-time, on-Wikipedia tool leveraging this technique. The associated poster reviews STiki's methodology and performance. We find competing anti-vandalism tools inhibit maximal performance. However, the tool proves particularly adept at mitigating long-term embedded vandalism. Further, its robust and language-independent nature makes it well suited for use in less-patrolled Wiki installations. 0 0
UNIpedia: A unified ontological knowledge platform for semantic content tagging and search Kalender M.
Dang J.
Uskudarli S.
Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010 English 2010 The emergence of an ever-increasing number of documents makes it more and more difficult to locate them when desired. An approach for improving search results is to make use of user-generated tags. This approach has led to improvements. However, they are limited because tags are (1) free from context and form, (2) user generated, (3) used for purposes other than description, and (4) often ambiguous. As a formal, declarative knowledge representation model, ontologies provide a foundation upon which machine-understandable knowledge can be obtained and tagged, making semantic tagging and search possible. With an ontology, semantic web technologies can be utilized to automatically generate semantic tags. WordNet has been used for this purpose. However, this approach falls short in tagging documents that refer to new concepts and instances. To address this challenge, we present UNIpedia - a platform for unifying different ontological knowledge bases by reconciling their instances as WordNet concepts. Our mapping algorithms use rule-based heuristics extracted from ontological and statistical features of concepts and instances. UNIpedia is used to semantically tag contemporary documents. For this purpose, the Wikipedia and OpenCyc knowledge bases, which are known to contain up-to-date instances and reliable metadata about them, are selected. Experiments show that the accuracy of the mapping between WordNet and Wikipedia is 84% for the most relevant concept name and 90% for the appropriate sense. 0 0
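The WordNet side of such a mapping can be illustrated with NLTK. The gloss-overlap heuristic below is a generic baseline, not UNIpedia's rule-based algorithm, and the example call is hypothetical.
 # Map an instance name plus a short description (e.g. a Wikipedia abstract)
 # to a WordNet synset by simple gloss overlap. Generic baseline only; this
 # is not UNIpedia's rule-based mapping algorithm.
 from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')
 
 def map_to_wordnet(name: str, description: str):
     desc_tokens = set(description.lower().split())
     best, best_overlap = None, -1
     for synset in wn.synsets(name.replace(" ", "_")):
         gloss_tokens = set(synset.definition().lower().split())
         overlap = len(desc_tokens & gloss_tokens)
         if overlap > best_overlap:
             best, best_overlap = synset, overlap
     return best
 
 # Hypothetical call:
 # map_to_wordnet("jaguar", "large spotted cat native to the Americas")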
User-contributed descriptive metadata for libraries and cultural institutions Zarro M.A.
Allen R.B.
Lecture Notes in Computer Science English 2010 The Library of Congress and other cultural institutions are collecting highly informative user-contributed metadata as comments and notes expressing historical and factual information not previously identified with a resource. In this observational study we find a number of valuable annotations added to sets of images posted by the Library of Congress on the Flickr Commons. We propose a classification scheme to manage contributions and mitigate information overload issues. Implications for information retrieval and search are discussed. Additionally, the limits of a "collection" are becoming blurred as connections are being built via hyperlinks to related resources outside of the library collection, such as Wikipedia and locally relevant websites. Ideas are suggested for future projects, including interface design and institutional use of user-contributed information. 0 0
Using semantic Wikis as collaborative tools for geo-ontology Li Q.
Wang J.
Hua Li
2010 18th International Conference on Geoinformatics, Geoinformatics 2010 English 2010 As ontologies have become a convenient vehicle for domain knowledge and metadata, they are used for realizing information sharing at the semantic level in geoscience. Building a geo-ontology is a systematic engineering effort that requires collaboration, yet there is a lack of ontology editing tools that support collaborative work. Since wikis are cooperative tools for easy writing and sharing of content, and semantic wikis extend them with Semantic Web technology for representing semantic information, we propose to use semantic wikis as cooperative tools for building a geo-ontology. A semantic-wiki-like architecture for geo-ontology editing is presented, along with the semantic hierarchy of geo-information and the evolution mechanism of the geo-ontology within it. The usefulness of the approach is demonstrated by a small case study. 0 0
Wikipedia based semantic metadata annotation of audio transcripts Paci G.
Pedrazzi G.
Turra R.
11th International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 10 English 2010 A method to automatically annotate video items with semantic metadata is presented. The method has been developed in the context of the Papyrus project to annotate documentary-like broadcast videos with a set of relevant keywords using automatic speech recognition (ASR) transcripts as a primary complementary resource. The task is complicated by the high word error rate (WER) of the ASR for this kind of video. For this reason a novel relevance criterion based on domain information is proposed. Wikipedia is used both as a source of metadata and as a linguistic resource for disambiguating keywords and for eliminating out-of-topic/out-of-domain keywords. Documents are annotated with relevant links to Wikipedia pages, concept definitions, synonyms, translations and concept categories. 0 0
Wikipedia-based online celebrity recognition Lin D.
Jin J.
Xiong Y.
HP Laboratories Technical Report English 2010 In this paper, a Wikipedia-based online celebrity recognition scheme is presented. The celebrity base, which includes personal metadata and personal tags, is constructed from Wikipedia. Celebrity recognition service is provided to recognize celebrities in articles based on the celebrity base. Two simple demos are introduced to show the potential usage of celebrity recognition for personalized recommendation and smart browsing. 0 0
Writeslike.us: Linking people through OAI Metadata Tonkin E. ELPUB 2010 - Publishing in the Networked World: Transforming the Nature of Communication, 14th International Conference on Electronic Publishing English 2010 Informal scholarly communication is an important aspect of discourse both within research communities and in dissemination and reuse of data and findings. Various tools exist that are designed to facilitate informal communication between researchers, such as social networking software, including those dedicated specifically for academics. Others make use of existing information sources, in particular structured information such as social network data (e.g. FOAF) or bibliographic data, in order to identify links between individuals; co-authorship, membership of the same organisation, attendance at the same conferences, and so forth. Writeslike.us is a prototype designed to support the aim of establishing informal links between researchers. It makes use of data harvested from OAI repositories as an initial resource. This raises problems less evident in the use of more consistently structured data. The information extracted is filtered using a variety of processes to identify and benefit from systematic features in the data. Following this, the record is analysed for subject, author name, and full text link or source; this is spidered to extract full text, where available, to which is applied a formal metadata extraction package, extracting several relevant features ranging from document format to author email address/citations. The process is supported using data from Wikipedia. Once available, this information may be explored using both graph and matrix-based approaches; we present a method based on spreading activation energy, and a similar mechanism based on cosine similarity metrics. A number of prototype interfaces/data access methods are described, along with relevant use cases, in this paper. 0 0
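One of the graph-based explorations mentioned above is spreading activation. Below is a minimal sketch over a bipartite author-term graph built from harvested metadata; the graph contents and decay factor are invented placeholders, not the Writeslike.us data or implementation.
 # Spreading activation over a bipartite author-term graph built from
 # harvested OAI metadata. The graph contents and decay factor are invented
 # placeholders, not the Writeslike.us data or implementation.
 from collections import defaultdict
 
 author_terms = {                      # author -> terms from their records
     "author_a": ["digital libraries", "oai-pmh", "metadata"],
     "author_b": ["metadata", "dublin core"],
     "author_c": ["information retrieval", "oai-pmh"],
 }
 
 def related_authors(start, decay=0.5):
     # Push activation from the start author to shared terms, then on to the
     # other authors using those terms; rank authors by accumulated energy.
     term_to_authors = defaultdict(list)
     for author, terms in author_terms.items():
         for term in terms:
             term_to_authors[term].append(author)
     activation = defaultdict(float)
     for term in author_terms[start]:
         for author in term_to_authors[term]:
             if author != start:
                 activation[author] += decay / len(term_to_authors[term])
     return sorted(activation.items(), key=lambda kv: -kv[1])
 
 # related_authors("author_a") -> other authors ranked by shared metadata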
Xerox trails: A new web-based publishing technology Rao V.G.
Vandervort D.
Silverstein J.
Proceedings of SPIE - The International Society for Optical Engineering English 2010 Xerox Trails is a new digital publishing model developed at the Xerox Research Center, Webster. The primary purpose of the technology is to allow Web users and publishers to collect, organize and present information in the form of a useful annotated narrative (possibly non-sequential) with editorial content and metadata, that can be consumed both online and offline. The core concept is a trail: a digital object that improves online content production, consumption and navigation user experiences. When appropriate, trails can also be easily sequenced and transformed into printable documents, thereby bridging the gap between online and offline content experiences. The model is partly inspired by Vannevar Bush's influential idea of the "Memex" [1] which has inspired several generations of Web technology [2]. Xerox Trails is a realization of selected elements from the idea of the Memex, along with several original design ideas. It is based on a primitive data construct, the trail. In Xerox Trails, the idea of a trail is used to support the architecture of a Web 2.0 product suite called Trailmeme, that includes a destination Web site, plugins for major content management systems, and a browser toolbar. 0 0
Augmented social cognition: Using social web technology to enhance the ability of groups to remember, think, and reason Chi E.H. SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems English 2009 We are experiencing a new Social Web, where people share, communicate, commiserate, and conflict with each other. As evidenced by systems like Wikipedia, Twitter, and delicious.com, these environments are turning people into social information foragers and sharers. Groups interact to resolve conflicts and jointly make sense of topic areas from "Obama vs. Clinton" to "Islam." PARC's Augmented Social Cognition researchers -- who come from cognitive psychology, computer science, HCI, CSCW, and other disciplines -- focus on understanding how to "enhance a group of people's ability to remember, think, and reason". Through Social Web systems like social bookmarking sites, blogs, Wikis, and more, we can finally study, in detail, these types of enhancements on a very large scale. Here we summarize recent work and early findings such as: (1) how conflict and coordination have played out in Wikipedia, and how social transparency might affect reader trust; (2) how decreasing interaction costs might change participation in social tagging systems; and (3) how computation can help organize user-generated content and metadata. 0 0
Combining unstructured, fully structured and semi-structured information in semantic wikis Sint R.
Sebastian Schaffert
Stroka S.
Ferstl R.
CEUR Workshop Proceedings English 2009 The growing impact of semantic wikis underscores the importance of finding a strategy to store textual articles, semantic metadata and management data. Due to their different characteristics, each data type requires a specialized storage system, as inappropriate storage reduces performance, robustness, flexibility and scalability. Hence, it is important to identify a sophisticated strategy for storing and synchronizing different types of data structures in a way that provides the best mix of the previously mentioned properties. In this paper we compare fully structured, semi-structured and unstructured data and present their typical applications. Moreover, we discuss how all data structures can be combined and stored for one application and consider three synchronization design alternatives to keep the distributed data storages consistent. Furthermore, we present the semantic wiki KiWi, which uses an RDF triplestore in combination with a relational database as the basis for the persistence of data, and discuss its concrete implementation and design decisions. 0 0
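A minimal sketch of the storage combination described above - a relational table for article text and management data next to an RDF store for semantic metadata - using sqlite3 and rdflib. The schema, namespace and predicates are illustrative assumptions, not KiWi's actual persistence layer.
 # Keep article text and management data in a relational table and semantic
 # metadata in an RDF graph, joined on the page id. Schema, namespace and
 # predicates are illustrative assumptions, not KiWi's persistence layer.
 import sqlite3
 from rdflib import Graph, Literal, Namespace, URIRef
 
 EX = Namespace("http://example.org/wiki/")      # assumed namespace
 
 db = sqlite3.connect(":memory:")
 db.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT, body TEXT, author TEXT)")
 db.execute("INSERT INTO page VALUES (1, 'Metadata', 'Free text of the article...', 'alice')")
 
 g = Graph()
 page = URIRef(EX["page/1"])
 g.add((page, EX.tag, Literal("semantic-web")))       # semantic annotation
 g.add((page, EX.relatedTo, URIRef(EX["page/2"])))    # typed link
 
 # The relational row id doubles as the join key between the two stores.
 title, = db.execute("SELECT title FROM page WHERE id = 1").fetchone()
 tags = [str(o) for o in g.objects(page, EX.tag)]
 print(title, tags)                                   # -> Metadata ['semantic-web']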
Customized edit interfaces for wikis via semantic annotations Angelo Di Iorio
Duca S.
Alberto Musetti
Righini S.
Rossi D.
Fabio Vitali
CEUR Workshop Proceedings English 2009 Authoring support for semantic annotations represents the wiki way of the Semantic Web, ultimately leading to the wiki version of the Semantic Web's eternal dilemma: why should authors correctly annotate their content? The obvious solution is to make the ratio between the needed effort and the acquired advantages as small as possible. At least two specificities set wikis apart from other Web-accessible content in this respect: social aspects (wikis are often the expression of a community) and technical issues (wikis are edited "on-line"). Being related to a community, wikis are intrinsically associated with the model of knowledge of that community, making the relation between wiki content and ontologies the result of a natural process. Being edited on-line, wikis can benefit from a synergy of Web technologies that support the whole information-sharing process, from authoring to delivery. In this paper we present an approach to reduce the authoring effort by providing ontology-based tools to integrate models of knowledge with authoring-support technologies, using a functional approach to content fragment creation that plays nicely with the "wiki way" of managing information. 0 0
Deep thought;web based system for managing and presentation of research and student projects Gregar T.
Pospisilova R.
Pitner T.
CSEDU 2009 - Proceedings of the 1st International Conference on Computer Supported Education English 2009 Plenty of projects are carried out each day at academic venues - small in-term student projects without any real-world use, bachelor's and diploma theses, and large interdisciplinary or internationally supported projects. Each of them has its own set of requirements for how to manage it. The aim of our paper is to describe these requirements and to show how we tried to satisfy them. As a result of further analysis we designed and implemented the Deep Thought system (under development since autumn 2007), which unites the management of distinct categories of projects in one portal. The system is based on open-source technology; it is modular and hence capable of integrating heterogeneous tools such as a version control system, a wiki, and project presentation and management. This paper also introduces the aims of future development of the system, such as interoperability with other management systems or a closer connection with lecture content and the teaching process. 0 0
Fritz - Wiki technology for modeling and simulation (M&S) repositories Feinberg J.M.
Misch G.L.
Simulation Interoperability Standards Organization - Spring Simulation Interoperability Workshop 2009 English 2009 Alion Science and Technology recently funded the Wiki Consolidated Knowledge Engineering Development (WiCKED) internal research and development (IR&D) project for developing and applying wiki technology to implement modeling and simulation (M&S) knowledge repositories. Alion has operated, managed, maintained, and reviewed DoD M&S repositories for more than ten years, and its scientists and management recognized the potential competitive advantages that could be realized by a new architectural approach. "Fritz" wiki, our initial development based on WiCKED technology, is showing great promise for M&S knowledge management applications. This paper presents and discusses some lessons learned during the initial parts of this effort with wikis, including their basic usefulness, the measures of success, and the potential advantages of lower cost, shorter schedule, and reduced risk. The nature of a wiki is free-wheeling. The Wikipedia experience suggests that a wiki is most useful when large numbers of people can contribute to, as well as access, the information. While Wikipedia and similar sites have become excellent online tools, the authoritative nature of their entries can be difficult to ascertain, and the consistency of the metadata for entries is totally lacking. The first problem is compounded in the DoD M&S world by the need to restrict public access, restrict the types of entries, and restrict the contents of entries depending on the desired scope of access. The second problem indicates the need for, and clear benefits resulting from, using a consistent set of metadata in certain types of entries. This presentation utilizes a framework developed in a previous paper for evaluating a repository, including issues such as: what it will contain (scope), who will be allowed access to it, who populates it, who validates the information, and who manages (implements) it. These issues, which require serious thought about access and an evolving editorial policy, will be among those the paper details in light of our IR&D experience with Fritz. 0 0
Lightweight document semantics processing in e-learning Gregar T.
Pitner T.
Proceedings of I-KNOW 2009 - 9th International Conference on Knowledge Management and Knowledge Technologies and Proceedings of I-SEMANTICS 2009 - 5th International Conference on Semantic Systems English 2009 There are plenty of projects aimed at incorporating semantic information into present-day document processing. The main problem is their real-world usability. E-learning is one of the areas which can take advantage of semantically described documents. In this paper we introduce a framework of cooperating tools which can help extract, store, and visualize semantics in this area. 0 0
Metadata and multilinguality in video classification He J.
Xiaodan Zhang
Weerkamp W.
Larson M.
Lecture Notes in Computer Science English 2009 The VideoCLEF 2008 Vid2RSS task involves the assignment of thematic category labels to dual language (Dutch/English) television episode videos. The University of Amsterdam chose to focus on exploiting archival metadata and speech transcripts generated by both Dutch and English speech recognizers. A Support Vector Machine (SVM) classifier was trained on training data collected from Wikipedia. The results provide evidence that combining archival metadata with speech transcripts can improve classification performance, but that adding speech transcripts in an additional language does not yield performance gains. 0 0
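A minimal sketch of that classification setup (TF-IDF features plus a linear SVM trained on category-labelled text such as Wikipedia articles), using scikit-learn; the training snippets, labels and test transcript are invented placeholders, not the VideoCLEF data.
 # Train a linear SVM on category-labelled text (e.g. collected from
 # Wikipedia) and classify a speech transcript. Training snippets, labels
 # and the test transcript are invented placeholders.
 from sklearn.feature_extraction.text import TfidfVectorizer
 from sklearn.pipeline import make_pipeline
 from sklearn.svm import LinearSVC
 
 train_texts = [
     "painting sculpture museum exhibition artist",    # stand-in article text
     "goal match player tournament championship",
 ]
 train_labels = ["visual arts", "sports"]
 
 clf = make_pipeline(TfidfVectorizer(), LinearSVC())
 clf.fit(train_texts, train_labels)
 
 transcript = "the artist discusses her new exhibition at the museum"
 print(clf.predict([transcript])[0])                   # -> visual arts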
Methopedia - Pedagogical design community for European educators Ryberg T.
Niemczik C.
Brenstein E.
8th European Conference on eLearning 2009, ECEL 2009 English 2009 The paper will discuss theoretical, methodological and technical aspects of the community-based Methopedia wiki (www.methopedia.eu), which has been developed as a part of the EU-funded collaborative research project "Community of Integrated Blended Learning in Europe" (COMBLE; www.comble-project.eu). Methopedia is a wiki and social community aimed at facilitating knowledge transfer between trainers/educators from different institutions or countries through interactive peer-to-peer support, and sharing of learning practices. We describe how Methopedia has been developed through engaging practitioners in workshops with the aim of collecting known learning activities, designs and approaches, and how the models for sharing learning practices have been developed by drawing on practitioners' experiences, ideas and needs. We present and analyse the outcome of the workshops and discuss how practitioners have informed the practical design and theoretical issues regarding the design of Methopedia. The workshops have led to redesigns and also a number of important issues and problems have emerged. In the paper, we therefore present and discuss the socio-technical design of Methopedia, which is based on open source Wiki and Social Networking technologies. We describe the issues, functionalities and needs that have emerged from the workshops, such as metadata (taxonomy & tags), localised versions (multi-lingual) and the need for visual descriptions. Furthermore, we discuss the templates trainers/educators can use to describe and share their learning designs or learning activities, e.g. which categories would be helpful, how much metadata is relevant, and how standardised or flexible the templates should be. We also discuss the theoretical considerations underlying the descriptive model of the templates by drawing on research within learning design and the educational pattern design approach. In particular we focus on exploring designs and descriptions of single learning activities or sequences of them. Furthermore, we discuss some of the tools and concepts under development as part of the work on Methopedia, such as a flash-based tool to structure learning processes, a pictorial language for visualising learning activities/designs and how we aim to connect to existing networks for educators/trainers and initiatives similar to Methopedia. 0 0
NNexus: An automatic linker for collaborative web-based corpora Gardner J.
Krowne A.
Xiong L.
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT'09 English 2009 Collaborative online encyclopedias or knowledge bases such as Wikipedia and PlanetMath are becoming increasingly popular. We demonstrate NNexus, a generalization of the automatic linking engine of PlanetMath.org and the first system that automates the process of linking disparate "encyclopedia" entries into a fully-connected conceptual network. The main challenges of this problem space include: 1) linking quality (correctly identifying which terms to link and which entry to link to with minimal effort on the part of users), 2) efficiency and scalability, and 3) generalization to multiple knowledge bases and web-based information environments. We present NNexus, which utilizes subject classification and other metadata to address these challenges, and demonstrate its effectiveness and efficiency on multiple real-world corpora. Copyright 2009 ACM. 0 0
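The core linking step can be illustrated with a longest-match scan of an entry's text against an index of entry titles. The index below is an invented placeholder, and NNexus's classification-based disambiguation is deliberately omitted.
 # Link terms in an entry's text to other encyclopedia entries by longest
 # match against an index of entry titles. The index is a placeholder, and
 # NNexus's classification-based disambiguation is not reproduced here.
 import re
 
 entry_index = {                 # lowercase title -> target entry id (assumed)
     "metric space": "entry/42",
     "space": "entry/7",
     "topology": "entry/13",
 }
 
 def autolink(text: str) -> str:
     # Prefer longer titles so that "metric space" wins over "space".
     titles = sorted(entry_index, key=len, reverse=True)
     pattern = re.compile("|".join(re.escape(t) for t in titles), re.IGNORECASE)
     return pattern.sub(
         lambda m: "[%s](%s)" % (m.group(0), entry_index[m.group(0).lower()]),
         text,
     )
 
 print(autolink("Every metric space induces a topology."))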
Overview of videoCLEF 2008: Automatic generation of topic-based feeds for dual language audio-visual content Larson M.
Newman E.
Jones G.J.F.
Lecture Notes in Computer Science English 2009 The VideoCLEF track, introduced in 2008, aims to develop and evaluate tasks related to analysis of and access to multilingual multimedia content. In its first year, VideoCLEF piloted the Vid2RSS task, whose main subtask was the classification of dual language video (Dutch-language television content featuring English-speaking experts and studio guests). The task offered two additional discretionary subtasks: feed translation and automatic keyframe extraction. Task participants were supplied with Dutch archival metadata, Dutch speech transcripts, English speech transcripts and ten thematic category labels, which they were required to assign to the test set videos. The videos were grouped by class label into topic-based RSS-feeds, displaying title, description and keyframe for each video. Five groups participated in the 2008 VideoCLEF track. Participants were required to collect their own training data; both Wikipedia and general web content were used. Groups deployed various classifiers (SVM, Naive Bayes and k-NN) or treated the problem as an information retrieval task. Both the Dutch speech transcripts and the archival metadata performed well as sources of indexing features, but no group succeeded in exploiting combinations of feature sources to significantly enhance performance. A small scale fluency/adequacy evaluation of the translation task output revealed the translation to be of sufficient quality to make it valuable to a non-Dutch speaking English speaker. For keyframe extraction, the strategy chosen was to select the keyframe from the shot with the most representative speech transcript content. The automatically selected shots were shown, with a small user study, to be competitive with manually selected shots. Future years of VideoCLEF will aim to expand the corpus and the class label list, as well as to extend the track to additional tasks. 0 0
PRIMA: Archiving and querying historical data with evolving schemas Moon H.J.
Curino C.A.
Ham M.
Carlo Zaniolo
SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems English 2009 Schema evolution poses serious challenges in historical data management. Traditionally, historical data have been archived either by (i) migrating them into the current schema version that is well-understood by users but compromising archival quality, or (ii) by maintaining them under the original schema version in which the data was originally created, leading to perfect archival quality, but forcing users to formulate queries against complex histories of evolving schemas. In the PRIMA system, we achieve the best of both approaches, by (i) archiving historical data under the schema version under which they were originally created, and (ii) letting users express temporal queries using the current schema version. Thus, in PRIMA, the system rewrites the queries to the (potentially many) pertinent versions of the evolving schema. Moreover, the system offers automatic documentation of the schema history, and allows the users to pose temporal queries over the metadata history itself. The proposed demonstration highlights the system features exploiting both a synthetic-educational running example and the real-life evolution histories (schemas and data), which include hundreds of schema versions from Wikipedia and Ensembl. The demonstration offers a thorough walk-through of the system features and a hands-on system testing phase, where the audiences are invited to directly interact with the advanced query interface of PRIMA. 0 0
RadSem: Semantic annotation and retrieval for medical images Moller M.
Regel S.
Sintek M.
Lecture Notes in Computer Science English 2009 We present a tool for semantic medical image annotation and retrieval. It leverages the MEDICO ontology which covers formal background information from various biomedical ontologies such as the Foundational Model of Anatomy (FMA), terminologies like ICD-10 and RadLex and covers various aspects of clinical procedures. This ontology is used during several steps of annotation and retrieval: (1) We developed an ontology-driven metadata extractor for the medical image format DICOM. Its output contains, e. g., person name, age, image acquisition parameters, body region, etc. (2) The output from (1) is used to simplify the manual annotation by providing intuitive visualizations and to provide a preselected subset of annotation concepts. Furthermore, the extracted metadata is linked together with anatomical annotations and clinical findings to generate a unified view of a patient's medical history. (3) On the search side we perform query expansion based on the structure of the medical ontologies. (4) Our ontology for clinical data management allows us to link and combine patients, medical images and annotations together in a comprehensive result list. (5) The medical annotations are further extended by links to external sources like Wikipedia to provide additional information. 0 0
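Step (1) above, ontology-independent extraction of DICOM header metadata, can be sketched with pydicom; the selection of tags is an illustrative assumption, and the mapping of these values to FMA/RadLex concepts that RadSem performs is not reproduced.
 # Extract a few descriptive DICOM header fields from a medical image file.
 # The tag selection is illustrative; RadSem's ontology-driven mapping of
 # these values to FMA/RadLex concepts is not reproduced here.
 import pydicom
 
 def extract_dicom_metadata(path: str) -> dict:
     ds = pydicom.dcmread(path, stop_before_pixels=True)
     wanted = ["PatientName", "PatientAge", "Modality",
               "BodyPartExamined", "StudyDescription"]
     return {tag: str(ds.get(tag, "")) for tag in wanted}
 
 # Hypothetical call: extract_dicom_metadata("study/series/image.dcm")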
Survey of wikis as a design support tool Walthall C.
Sauter C.
Deigendesch T.
Devanathan S.
Albers A.
Ramani K.
DS 58-6: Proceedings of ICED 09, the 17th International Conference on Engineering Design English 2009 The use of design notebooks has long been common practice for engineers and designers. Wikis, freely editable collections of web sites, are becoming increasingly popular as flexible documentation and communication tools for collaborative design tasks. The main goals of this work are to better understand & improve wiki support for early design collaboration, and to give students hands-on experience in using wikis as a design tool. For this study, a wiki was provided for 500 engineering students (5 students per team) who worked to solve a challenging design problem. Surveys and interactive feedback sessions were used to analyze the wiki use upon completion of the design project. The results confirm that wikis are a useful and easy to use tool, but certain improvements would increase the utility of wikis for design projects. More features such as easier integration of graphics, metadata, and management options would improve the usefulness of wikis in design thereby improving shared understanding, allowing faster design iterations and better collaboration. 0 0
The effect of using a semantic wiki for metadata management: A controlled experiment Huner K.M.
Boris Otto
Proceedings of the 42nd Annual Hawaii International Conference on System Sciences, HICSS English 2009 A coherent and consistent understanding of corporate data is an important factor for effective management of diversified companies and implies a need for company-wide unambiguous data definitions. Inspired by the success of Wikipedia, wiki software has become a broadly discussed alternative for corporate metadata management. However, in contrast to the performance and sustainability of wikis in general, benefits of using semantic wikis have not been investigated sufficiently. The paper at hand presents results of a controlled experiment that investigates effects of using a semantic wiki for metadata management in comparison to a classical wiki. Considering threats to validity, the analysis (i.e. 74 subjects using both a classical and a semantic wiki) shows that the semantic wiki is superior to the classical variant regarding information retrieval tasks. At the same time, the results indicate that more effort is needed to build up the semantically annotated wiki content in the semantic wiki. 0 0
VideoCLEF 2008: ASR classification with Wikipedia categories Kürsten J.
Richter D.
Eibl M.
Lecture Notes in Computer Science English 2009 This article describes our participation at the VideoCLEF track. We designed and implemented a prototype for the classification of the Video ASR data. Our approach was to regard the task as text classification problem. We used terms from Wikipedia categories as training data for our text classifiers. For the text classification the Naive-Bayes and kNN classifier from the WEKA toolkit were used. We submitted experiments for classification task 1 and 2. For the translation of the feeds to English (translation task) Google's AJAX language API was used. Although our experiments achieved only low precision of 10 to 15 percent, we assume those results will be useful in a combined setting with the retrieval approach that was widely used. Interestingly, we could not improve the quality of the classification by using the provided metadata. 0 0
Employing a domain specific ontology to perform semantic search Morneau M.
Mineau G.W.
Lecture Notes in Computer Science English 2008 Increasing the relevancy of Web search results has been a major concern in research over the last years. Boolean search, metadata, natural language based processing and various other techniques have been applied to improve the quality of search results sent to a user. Ontology-based methods were proposed to refine the information extraction process but they have not yet achieved wide adoption by search engines. This is mainly due to the fact that the ontology building process is time consuming. An all inclusive ontology for the entire World Wide Web might be difficult if not impossible to construct, but a specific domain ontology can be automatically built using statistical and machine learning techniques, as done with our tool: SeseiOnto. In this paper, we describe how we adapted the SeseiOnto software to perform Web search on the Wikipedia page on climate change. SeseiOnto, by using conceptual graphs to represent natural language and an ontology to extract links between concepts, manages to properly answer natural language queries about climate change. Our tests show that SeseiOnto has the potential to be used in domain specific Web search as well as in corporate intranets. 0 0
Experiment management system-A way towards a transparent Tokamak Kramer-Flecken A.
Landgraf B.
Krom J.G.
Fusion Engineering and Design English 2008 At TEXTOR extensive collaborations with foreign institutes have been established. For a successful collaboration, the participating scientists must be able to plan the experiment in advance. Therefore they need tools to submit and track their experiment proposals and pulse plans. During and after the experiment, access to raw and evaluated data of a TEXTOR discharge is needed to analyze TEXTOR data remotely from the home labs. A goal-oriented data analysis additionally needs various items of data not recorded by conventional data logging. These discharge-specific comments and settings are stored in a logbook. At TEXTOR a logbook with different views is implemented for the scientific experiment and the machine operation, respectively. In a further development towards a TEXTOR experiment management system the information of auxiliary heating devices is stored. The electronic logbook is realized by a client/server approach, which makes access independent of the hardware and software on the client side and allows the data and data supplements (metadata) to be accessed from any common web browser. All information from TEXTOR is accessible for everybody via a uniform resource locator (URL). The concept of the electronic logbook, first experiences and its possibilities for data analysis will be discussed in this paper. © 2007 Elsevier B.V. All rights reserved. 0 0
Instanced-based mapping between thesauri and folksonomies Wartena C.
Brussee R.
Lecture Notes in Computer Science English 2008 The emergence of web based systems in which users can annotate items, raises the question of the semantic interoperability between vocabularies originating from collaborative annotation processes, often called folksonomies, and keywords assigned in a more traditional way. If collections are annotated according to two systems, e.g. with tags and keywords, the annotated data can be used for instance based mapping between the vocabularies. The basis for this kind of matching is an appropriate similarity measure between concepts, based on their distribution as annotations. In this paper we propose a new similarity measure that can take advantage of some special properties of user generated metadata. We have evaluated this measure with a set of articles from Wikipedia which are both classified according to the topic structure of Wikipedia and annotated by users of the bookmarking service del.icio.us. The results using the new measure are significantly better than those obtained using standard similarity measures proposed for this task in the literature, i.e., it correlates better with human judgments. We argue that the measure also has benefits for instance based mapping of more traditionally developed vocabularies. 0 0
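Instance-based mapping rests on comparing the sets of items each keyword or tag annotates. The sketch below uses plain cosine over annotation sets - the standard baseline the paper improves upon, not its proposed measure - with invented annotation data.
 # Instance-based mapping between two vocabularies: a thesaurus keyword and
 # a folksonomy tag are similar when they annotate the same items. Cosine
 # over annotation sets is the standard baseline, not the paper's improved
 # measure; the annotation data is an invented placeholder.
 from math import sqrt
 
 keyword_items = {"Ontologies": {"a1", "a2", "a3"}, "Databases": {"a4", "a5"}}
 tag_items = {"semanticweb": {"a1", "a2", "a6"}, "sql": {"a4", "a7"}}
 
 def cosine(items_a: set, items_b: set) -> float:
     if not items_a or not items_b:
         return 0.0
     return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))
 
 for keyword, items in keyword_items.items():
     best = max(tag_items, key=lambda tag: cosine(items, tag_items[tag]))
     print(keyword, "->", best, round(cosine(items, tag_items[best]), 2))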
Learning to tag and tagging to learn: A case study on wikipedia Peter Mika
Massimiliano Ciaramita
Hugo Zaragoza
Jordi Atserias
IEEE Intelligent Systems English 2008 Information technology experts suggest that natural language technologies will play an important role in the Web's future. The latest Web developments, such as the huge success of Web 2.0, demonstrate annotated data's significant potential. The problem of semantically annotating Wikipedia inspires a novel method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available. One main approach to tagging for acquiring knowledge from Wikipedia involves self-training that adds automatically annotated data from the target domain to the original training data. Another key approach involves structural correspondence learning, which tries to build a shared feature representation of the data. 0 0
Managing the history of metadata in support for DB archiving and schema evolution Curino C.A.
Moon H.J.
Carlo Zaniolo
Lecture Notes in Computer Science English 2008 Modern information systems, and web information systems in particular, are faced with frequent database schema changes, which generate the necessity to manage them and preserve the schema evolution history. In this paper, we describe the Panta Rhei Framework designed to provide powerful tools that: (i) facilitate schema evolution and guide the Database Administrator in planning and evaluating changes, (ii) support automatic rewriting of legacy queries against the current schema version, (iii) enable efficient archiving of the histories of data and metadata, and (iv) support complex temporal queries over such histories. We then introduce the Historical Metadata Manager (HMM), a tool designed to facilitate the process of documenting and querying the schema evolution itself. We use the schema history of the Wikipedia database as a telling example of the many uses and benefits of HMM. 0 0
Qualitative geocoding of persistent web pages Angel A.
Lontou C.
Pfoser D.
Efentakis A.
GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems English 2008 Information and specifically Web pages may be organized, indexed, searched, and navigated using various metadata aspects, such as keywords, categories (themes), and also space. While categories and keywords are up for interpretation, space represents an unambiguous aspect to structure information. The basic problem of providing spatial references to content is solved by geocoding; a task that relates identifiers in texts to geographic co-ordinates. This work presents a methodology for the semiautomatic geocoding of persistent Web pages in the form of collaborative human intervention to improve on automatic geocoding results. While focusing on the Greek language and related Web pages, the developed techniques are universally applicable. The specific contributions of this work are (i) automatic geocoding algorithms for phone numbers, addresses and place name identifiers and (ii) a Web browser extension providing a map-based interface for manual geocoding and updating the automatically generated results. With the geocoding of a Web page being stored as respective annotations in a central repository, this overall mechanism is especially suited for persistent Web pages such as Wikipedia. To illustrate the applicability and usefulness of the overall approach, specific geocoding examples of Greek Web pages are presented. 0 0
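Contribution (i), rule-based extraction of geocodable identifiers from page text, can be sketched with regular expressions. The patterns below are simplified illustrations and not the Greek-specific rules developed in the paper.
 # Pull candidate phone numbers and street addresses out of page text so
 # they can be geocoded. The regular expressions are simplified illustrations,
 # not the Greek-language rules developed in the paper.
 import re
 
 PHONE = re.compile(r"\+?\d{2,4}[\s-]?\d{3}[\s-]?\d{4}")
 ADDRESS = re.compile(r"\b\d{1,4}\s+[A-Z][a-z]+\s+(?:Street|Avenue|Road)\b")
 
 def extract_geocodable(text: str) -> dict:
     return {
         "phones": PHONE.findall(text),
         "addresses": ADDRESS.findall(text),
     }
 
 sample = "Visit us at 12 Main Street or call +30 210 1234567."
 print(extract_geocodable(sample))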
Remote sensing ontology development for data interoperability Nagai M.
Ono M.
Shibasaki R.
29th Asian Conference on Remote Sensing 2008, ACRS 2008 English 2008 A remote sensing ontology is developed not only for integrating earth observation data, but also for knowledge sharing and information transfer. Ontological information is used for data sharing services such as support of metadata design, structuring of data contents, and support of text mining. The remote sensing ontology is constructed on the basis of Semantic MediaWiki. Ontological information is added to the dictionary by digitizing text-based dictionaries, developing a "knowledge writing tool" for experts, and extracting semantic relations from authoritative documents using natural language processing techniques. The ontology system containing the dictionary is developed as a lexicographic ontology. The constructed ontological information is also used for a reverse dictionary. 0 0
Semantic keyword-based retrieval of photos taken with mobile devices Viana W.
Hammiche S.
Moisuc B.
Villanova-Oliver M.
Gensel J.
Martin H.
MoMM2008 - The 6th International Conference on Advances in Mobile Computing and Multimedia English 2008 This paper presents an approach for incorporating contextual metadata in a keyword-based photo retrieval process. We use our mobile annotation system PhotoMap in order to create metadata describing the photo shoot context (e.g., street address, nearby objects, season, lighting, nearby people). These metadata are then used to generate a set of stamped words for indexing each photo. We adapt the Vector Space Model (VSM) in order to transform these shoot context words into document-vector terms. Furthermore, spatial reasoning is used for inferring new potential indexing terms. We define methods for weighting those terms and for handling query matching. We also detail retrieval experiments carried out using PhotoMap and Flickr geotagged photos. We illustrate the advantages of using Wikipedia georeferenced objects for indexing photos. 0 0
Taking up the mop: Identifying future wikipedia administrators Moira Burke
Kraut R.
Conference on Human Factors in Computing Systems - Proceedings English 2008 As Wikipedia grows, so do the messy byproducts of collaboration. Backlogs of administrative work are increasing, suggesting the need for more users with privileged admin status. This paper presents a model of editors who have successfully passed the peer review process to become admins. The lightweight model is based on behavioral metadata and comments, and does not require any page text. It demonstrates that the Wikipedia community has shifted in the last two years to prioritizing policymaking and organization experience over simple article-level coordination, and mere edit count does not lead to adminship. The model can be applied as an AdminFinderBot to automatically search all editors' histories and pick out likely future admins, as a self-evaluation tool, or as a dashboard of relevant statistics for voters evaluating admin candidates. 0 1
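A minimal sketch of fitting such a model from per-editor behavioral counts with scikit-learn; the features and data are toy placeholders, not the paper's feature set (which also draws on comment text) or its reported weights.
 # Fit a simple probabilistic model of "will pass the admin review" from
 # per-editor behavioral counts. Features and data are toy placeholders,
 # not the paper's model (which also uses comment text) or its weights.
 from sklearn.linear_model import LogisticRegression
 
 # columns: article edits, policy-page edits, wikiproject edits, months active
 X = [[5000, 40, 10, 18],
      [12000, 5, 1, 30],
      [3000, 120, 60, 14],
      [800, 2, 0, 6]]
 y = [1, 0, 1, 0]        # 1 = promoted to admin, 0 = not promoted
 
 model = LogisticRegression(max_iter=1000).fit(X, y)
 candidate = [[4000, 90, 30, 12]]
 print(model.predict_proba(candidate)[0][1])   # estimated promotion probability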
Using attention and context information for annotations in a semantic wiki Malte Kiesel
Sven Schwarz
Van Elst L.
Georg Buscher
CEUR Workshop Proceedings English 2008 For document-centric work, meta-information in the form of annotations has proven useful to enhance search and other retrieval tasks. Since creating annotations manually is a lot of work, it is desirable to also tap less obtrusive sources of meta-information such as the user's context (projects the user is working on, currently relevant topics, etc.) and attention information (what text passages did the user read?). The Mymory project uses the semantic wiki Kaukolu, which allows storing attention and context information in addition to standard semantic wiki metadata. Attention annotations are generated automatically using an eyetracker. All types of annotations are enriched with contextual information gathered by a context elicitation component. In this paper, an overview of the Mymory system is presented. 0 0
Using community-generated contents as a substitute corpus for metadata generation Meyer M.
Rensing C.
Steinmetz R.
International Journal of Advanced Media and Communication English 2008 Metadata is crucial for reuse of Learning Resources. However, in the area of e-Learning, suitable training corpora for automatic classification methods are hardly available. This paper proposes the use of community-generated substitute corpora for classification methods. As an example of such a substitute corpus, the free online encyclopaedia Wikipedia is used as a training corpus for domain-independent classification and keyword extraction of Learning Resources. 0 0
Using semantic Wikis to support software reuse Shiva S.G.
Shala L.A.
Journal of Software English 2008 It has been almost four decades since the idea of software reuse was proposed. Many success stories have been told, yet it is believed that software reuse is still in the development phase and has not reached its full potential. How far are we with software reuse research? What have we learned from previous software reuse efforts? This paper is an attempt to answer these questions and propose a software reuse repository system based on semantic Wikis. In addition to supporting general collaboration among users offered by regular wikis, semantic Wikis provide means of adding metadata about the concepts and relations that are contained within the Wiki. This underlying model of domain knowledge enhances the software repository navigation and search performance and results in a system that is easy to use for non-expert users while being powerful in the way in which new artifacts can be created and located. 0 0
Wikiful thinking Doyle B. EContent English 2008 The advantages and weaknesses of using a wiki as a content and knowledge management tool are discussed. A wiki is economical, as some tools are open source and free, and it collects knowledge, explicit and tacit, very quickly. Wikipedia, one of the 10 busiest sites on the web, has been a great success with about 5 million registered editors and about 8 million articles in different languages. Wikis do not follow standards-based technology and content management best practices such as content reuse, modularity, structured writing, and information typing, resulting in a lack of interoperability, poor metadata management, and little reusability within the wiki. Methods of wiki navigation include built-in and web-based search engines. Standardization of wikis includes the use of XHTML, a WYSIWYG editor interface for unsophisticated content contributors, and hidden structure to facilitate information retrieval. 0 1
Categorizing Learning Objects Based On Wikipedia as Substitute Corpus Marek Meyer
Christoph Rensing
Ralf Steinmetz
First International Workshop on Learning Object Discovery & Exchange (LODE'07), September 18, 2007, Crete, Greece 2007 As metadata is often not sufficiently provided by authors of Learning Resources, automatic metadata generation methods are used to create metadata afterwards. One kind of metadata is categorization, particularly the partition of Learning Resources into distinct subject categories. A disadvantage of state-of-the-art categorization methods is that they require corpora of sample Learning Resources. Unfortunately, large corpora of well-labeled Learning Resources are rare. This paper presents a new approach for the task of subject categorization of Learning Resources. Instead of using typical Learning Resources, the free encyclopedia Wikipedia is applied as training corpus. The approach presented in this paper is to apply the k-Nearest-Neighbors method for comparing a Learning Resource to Wikipedia articles. Different parameters have been evaluated regarding their impact on the categorization performance. 0 1
Categorizing learning objects based on wikipedia as substitute corpus Meyer M.
Rensing C.
Steinmetz R.
CEUR Workshop Proceedings English 2007 As metadata is often not sufficiently provided by authors of Learning Resources, automatic metadata generation methods are used to create metadata afterwards. One kind of metadata is categorization, particularly the partition of Learning Resources into distinct subject categories. A disadvantage of state-of-the-art categorization methods is that they require corpora of sample Learning Resources. Unfortunately, large corpora of well-labeled Learning Resources are rare. This paper presents a new approach for the task of subject categorization of Learning Resources. Instead of using typical Learning Resources, the free encyclopedia Wikipedia is applied as training corpus. The approach presented in this paper is to apply the k-Nearest-Neighbors method for comparing a Learning Resource to Wikipedia articles. Different parameters have been evaluated regarding their impact on the categorization performance. 0 1
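A minimal sketch of the kNN setup described in the two entries above: Wikipedia article texts labelled with subject categories serve as the training corpus, and a learning resource receives the category of its nearest article. The snippets and labels are invented placeholders.
 # k-Nearest-Neighbors categorization of a learning resource, with Wikipedia
 # article texts as the substitute training corpus. Article snippets and
 # category labels are invented placeholders.
 from sklearn.feature_extraction.text import TfidfVectorizer
 from sklearn.neighbors import KNeighborsClassifier
 from sklearn.pipeline import make_pipeline
 
 wiki_texts = [
     "algebra equation polynomial variable",        # stand-in article text
     "cell organism dna protein biology",
     "integral derivative calculus function",
     "species evolution genetics population",
 ]
 wiki_categories = ["mathematics", "biology", "mathematics", "biology"]
 
 knn = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
 knn.fit(wiki_texts, wiki_categories)
 
 resource = "this course covers the integral and the derivative of a function"
 print(knn.predict([resource])[0])                   # -> mathematics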
Chapter 7 Achieving a Holistic Web in the Chemistry Curriculum Rzepa H.S. Annual Reports in Computational Chemistry English 2007 [No abstract available] 0 0
Collaborative classification of growing collections with evolving facets Wu H.
Zubair M.
Maly K.
ACM Conference on Hypertext and Hypermedia English 2007 There is a lack of tools for exploring large non-textual collections. One challenge is the manual effort required to add metadata to these collections. In this paper, we propose an architecture that enables users to collaboratively build a faceted classification for a large, growing collection. Besides a novel wiki-like classification interface, the proposed architecture includes automated document classification and facet schema enrichment techniques. We have implemented a prototype for the American Political History multimedia collection from usa.gov. Copyright 2007 ACM. 0 0
Community tools for repurposing learning objects Chao Wang
Dickens K.
Davis H.C.
Gary Wills
Lecture Notes in Computer Science English 2007 A critical success factor for the reuse of learning objects is the ease with which they may be repurposed in order to enable reusability in a different teaching context from the one for which they were originally designed. The current generation of tools for creating, storing, describing and locating learning objects is best suited for users with technical expertise. Such tools are an obstacle to teachers who might wish to perform alterations to learning objects in order to make them suitable for their context. In this paper we describe a simple set of tools to enable practitioners to adapt the content of existing learning objects and to store and modify metadata describing the intended teaching context of these learning objects. We are deploying and evaluating these tools within the UK language teaching community. 0 0
Medical Librarian 2.0 Connor E. Medical Reference Services Quarterly English 2007 Web 2.0 refers to an emerging social environment that uses various tools to create, aggregate, and share dynamic content in ways that are more creative and interactive than transactions previously conducted on the Internet. The extension of this social environment to libraries, sometimes called Library 2.0, has profound implications for how librarians will work, collaborate, and deliver content. Medical librarians can connect with present and future generations of users by learning more about the social dynamics of Web 2.0's vast ecosystem, and incorporating some of its interactive tools and technologies (tagging, peer production, and syndication) into routine library practice. © 2007 by The Haworth Press, Inc. All rights reserved. 0 0
Building a design engineering digital library: The workflow issues Grierson H.
Wodehouse A.
Breslin C.
Ion W.
Juster N.
DS 38: Proceedings of E and DPE 2006, the 8th International Conference on Engineering and Product Design Education English 2006 Over the past 2 years the Design Manufacturing and Engineering Management Department at the University of Strathclyde has been developing a digital library to support student design learning in global team-based design engineering projects through the DIDET project [1]. Previous studies in the classroom have identified the need for the development of two parallel systems - a shared workspace, the LauLima Learning Environment (LLE) and a digital library, the LauLima Digital Library (LDL) [2]. These two elements are encapsulated within LauLima, developed from the open-sourced groupware Tikiwiki. This paper will look at the workflow in relation to populating the digital library, discuss the issues as they are experienced by staff and students, e.g. the application of metadata (keywords and descriptions); harvesting of resources; reuse in classes; granularity; intellectual property rights and digital rights management (IPR and DRM), and make suggestions for improvement. 0 0
Development of a wiki-based, expert community-driven nanosystem vocabulary Laura M. Bartolo
Cathy S. Lowe
Sharon C. Glotzer
Christopher Iacovella
DCMI English 2006 0 0
Harvesting Wiki Consensus - Using Wikipedia Entries as Ontology Elements Martin Hepp
Daniel Bachlechner
Katharina Siorpaes
CEUR Workshop Proceedings English 2006 One major obstacle to adding machine-readable annotation to existing Web content is the lack of domain ontologies. While FOAF and Dublin Core are popular means for expressing relationships between Web resources and between Web resources and literal values, we widely lack unique identifiers for common concepts and instances. Also, most available ontologies have very weak community grounding in the sense that they are designed by single individuals or small groups, while the majority of potential users are not involved in the process of proposing new ontology elements or achieving consensus. This is in sharp contrast to natural language, where the evolution of the vocabulary is under the control of the user community. At the same time, we can observe that, within Wiki communities, especially Wikipedia, a large number of users are able to create comprehensive domain representations in the sense of unique, machine-feasible identifiers and concept definitions which are sufficient for humans to grasp the intension of the concepts. The English version of Wikipedia now contains more than one million entries, and thus the same number of URIs, each paired with a human-readable description. While this collection is at the lower end of ontology expressiveness, it is likely the largest living ontology available today. In this paper, we (1) show that standard Wiki technology can easily be used as an ontology development environment for named classes, reducing entry barriers for the participation of users in the creation and maintenance of lightweight ontologies, (2) prove that the URIs of Wikipedia entries are surprisingly reliable identifiers for ontology concepts, and (3) demonstrate the applicability of our approach in a use case. 0 0
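The central mechanism in the abstract above, reusing Wikipedia entry URIs as ready-made identifiers for ontology concepts, can be illustrated with a short Python sketch. This is not code from the paper; it is a minimal example assuming rdflib and a hypothetical local namespace.

from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Hypothetical illustration: reuse Wikipedia entry URIs as concept
# identifiers instead of minting new ontology URIs, as the paper proposes.
WP = Namespace("http://en.wikipedia.org/wiki/")
EX = Namespace("http://example.org/catalogue/")  # hypothetical local namespace

g = Graph()
g.bind("wp", WP)

item = EX["item-42"]  # hypothetical catalogue record
g.add((item, RDF.type, WP["Semantic_wiki"]))  # Wikipedia URI used as a class
g.add((item, RDFS.label, Literal("A semantic wiki installation")))

print(g.serialize(format="turtle"))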
Integration of Wikipedia and a geography digital library Lim E.-P.
Zhe Wang
Sadeli D.
Yanyan Li
Chang C.-H.
Kalyani Chatterjea
Goh D.H.-L.
Theng Y.-L.
Jinghua Zhang
Aixin Sun
Lecture Notes in Computer Science English 2006 In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications that are of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geo-political regions namely cities and countries. Some empirical performance results will be presented. The paper will also describe the adaptations of G-Portal and Wikipedia to meet the integration requirements. 0 0
On integrating a semantic wiki in a knowledge management system De Paoli F.
Loregian M.
CEUR Workshop Proceedings English 2006 The use of knowledge management systems is often hampered by the heavy overhead of publishing information. In particular, uploading a document and then profiling it with a set of metadata and keywords is a tedious and time-consuming activity. Therefore, one of the main goals for such systems should be to make the publishing of explicit knowledge as natural as possible. In the project described in this paper, we exploit a semantic wiki editor to support document publishing by means of textual descriptions augmented with ontology-defined annotations. Such annotations are then managed as entries in metadata profiles. Moreover, we can publish semantic-wiki-based documents that are self-describing and therefore require no further activity to be profiled and included in a knowledge base. The semantic wiki project is part of a collaborative knowledge management system that has been developed to support project teams and communities of interest. 0 0
Semantic Wiki as a lightweight knowledge management system Hendry Muljadi
Hideaki Takeda
Aman Shakya
Shoko Kawamoto
Satoshi Kobayashi
Asao Fujiyama
Koichi Ando
Lecture Notes in Computer Science English 2006 Since its birth in 1995, the Wiki has become more and more popular. This paper presents a Semantic Wiki, a Wiki extended to include the ideas of the Semantic Web. The proposed Semantic Wiki uses a simple Wiki syntax to write labeled links which represent RDF triples. By enabling the writing of labeled links, the Semantic Wiki may provide an easy-to-use and flexible environment for the integrated management of content and metadata, so that it may be used as a lightweight knowledge management system. 0 0
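The abstract above describes writing labeled links in wiki syntax so that each labeled link denotes an RDF triple. The paper's exact syntax is not reproduced here; the sketch below assumes a hypothetical Semantic-MediaWiki-like notation ([[property::Target]]) and shows, with rdflib, how such links could be mapped to triples.

import re
from rdflib import Graph, Namespace

WIKI = Namespace("http://example.org/wiki/")  # hypothetical wiki namespace
# Assumed labeled-link syntax; the paper's own syntax may differ.
LINK = re.compile(r"\[\[(?P<prop>[^:\]]+)::(?P<target>[^\]]+)\]\]")

def labeled_links_to_triples(page_name: str, wikitext: str) -> Graph:
    """Turn every labeled link on a page into the triple (page, property, target)."""
    g = Graph()
    subject = WIKI[page_name.replace(" ", "_")]
    for m in LINK.finditer(wikitext):
        prop = WIKI[m.group("prop").strip().replace(" ", "_")]
        obj = WIKI[m.group("target").strip().replace(" ", "_")]
        g.add((subject, prop, obj))
    return g

g = labeled_links_to_triples("Tokyo", "Tokyo is the [[capital of::Japan]].")
print(g.serialize(format="turtle"))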
Semantic wiki as a lightweight knowledge management system Hendry Muljadi
Hideaki Takeda
Aman Shakya
Shoko Kawamoto
Satoshi Kobayashi
Asao Fujiyama
Koichi Ando
ASWC English 2006 0 0
SweetWiki : Semantic WEb enabled technologies in wiki Michel Buffa
Crova G.
Fabien Gandon
Lecompte C.
Passeron J.
CEUR Workshop Proceedings English 2006 Wikis are social web sites enabling a potentially large number of participants to modify any page or create a new page using their web browser. As they grow, wikis suffer from a number of problems (anarchical structure, large number of pages, aging navigation paths, etc.). We believe that semantic wikis can improve navigation and search. In SweetWiki we investigate the use of semantic web technologies to support and ease the lifecycle of the wiki. The very model of wikis was declaratively described: an OWL schema captures concepts such as WikiWord, wiki page, forward and backward link, author, etc. This ontology is then exploited by an embedded semantic search engine (Corese). In addition, SweetWiki integrates a standard WYSIWYG editor (Kupu) that we extended to support semantic annotation following the "social tagging" approach made popular by web sites such as flickr.com. When editing a page, the user can freely enter some keywords in an AJAX-powered text field, and an auto-completion mechanism proposes existing keywords by issuing SPARQL queries to identify existing concepts with compatible labels. Thus tagging is both easy (keyword-like) and motivating (real-time display of the number of related pages), and concepts are collected as in folksonomies. To maintain and reengineer the folksonomy, we reused a web-based editor available in the underlying semantic web server to edit semantic web ontologies and annotations. Unlike in other wikis, pages are stored directly in XHTML, ready to be served, and semantic annotations are embedded in the pages themselves using RDF/A. If someone sends or copies a page, the annotations follow it, and if an application crawls the wiki site it can extract the metadata and reuse them. 0 0
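The auto-completion step described above (issuing SPARQL queries to find existing concepts whose labels match what the user has typed) can be sketched as follows. This is an illustration using rdflib and an in-memory graph, not SweetWiki's actual Corese-based implementation; the tag namespace and labels are hypothetical.

from rdflib import Graph, Literal, Namespace, RDFS

EX = Namespace("http://example.org/tags/")  # hypothetical tag namespace

# A toy annotation store standing in for the wiki's folksonomy.
g = Graph()
g.add((EX["Metadata"], RDFS.label, Literal("metadata")))
g.add((EX["MetadataHarvesting"], RDFS.label, Literal("metadata harvesting")))
g.add((EX["Multimedia"], RDFS.label, Literal("multimedia")))

SUGGEST = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?concept ?label WHERE {
        ?concept rdfs:label ?label .
        FILTER(STRSTARTS(LCASE(STR(?label)), LCASE(STR(?prefix))))
    }
"""

def suggest(prefix: str):
    """Return (concept, label) pairs whose label starts with the typed prefix."""
    rows = g.query(SUGGEST, initBindings={"prefix": Literal(prefix)})
    return [(str(row.concept), str(row.label)) for row in rows]

print(suggest("meta"))  # matches the two concepts whose labels start with "meta"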
SweetWiki: Semantic Web enabled technologies in Wiki Michel Buffa
Fabien Gandon
Proceedings of WikiSym'06 - 2006 International Symposium on Wikis English 2006 Wikis are social web sites enabling a potentially large number of participants to modify any page or create a new page using their web browser. As they grow, wikis may suffer from a number of problems (anarchical structure, aging navigation paths, etc.). We believe that semantic wikis can improve navigation and search. In SweetWiki we investigate the use of semantic web technologies to support and ease the lifecycle of the wiki. The very model of wikis was declaratively described: an OWL schema captures concepts such as wiki word, wiki page, forward and backward link, author, etc. This ontology is then exploited by an embedded semantic search engine (Corese). In addition, SweetWiki integrates a standard WYSIWYG editor (Kupu) that we extended to support semantic annotation following the "social tagging": when editing a page, the user can freely enter some keywords and an auto-completion mechanism proposes existing keywords by issuing queries to identify existing concepts with compatible labels. Thus tagging is both easy (keyword-like) and motivating (real time display of the number of related pages) and concepts are collected as in folksonomies. To maintain and reengineer the folksonomy, we reused a web-based editor available in the underlying semantic web server to edit semantic web ontologies and annotations. Unlike in other wikis, pages are stored directly in XHTML ready to be served and semantic annotations are embedded in the pages themselves using RDFa. If someone sends or copies a page, the annotations follow it, and if an application crawls the wiki site it can extract the metadata and reuse them. In this paper we motivate our approach and explain each one of these design choices. 0 0
Towards a wiki interchange format (WIF) opening semantic wiki content and metadata Volkel M.
Eyal Oren
CEUR Workshop Proceedings English 2006 Wikis are used more and more in world-wide, intranet, and increasingly even personal settings. Current wikis are data islands. They are open for everyone to contribute to, but closed to machines and automation. In this paper we define a wiki interchange format (WIF) that allows data exchange between wikis and related tools. Unlike other approaches, we also tackle the problem of page content and annotations. The linking from formal annotations to parts of a structured text is analysed and described. 0 0
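The abstract above does not reproduce WIF's actual schema, so the sketch below is only a hypothetical illustration of the kind of record such an interchange format has to carry: the page text plus annotations anchored to spans of that text (the linking from formal annotations to parts of a structured text mentioned above). All names and fields are assumptions, not the format defined in the paper.

from dataclasses import dataclass, field

@dataclass
class Annotation:
    # Hypothetical fields; the real WIF schema is defined in the paper.
    property: str  # e.g. "dc:subject"
    value: str     # e.g. "wiki interoperability"
    start: int     # character offset where the annotated span begins
    end: int       # character offset where the annotated span ends

@dataclass
class WikiPage:
    title: str
    text: str
    annotations: list = field(default_factory=list)

page = WikiPage(
    title="Interchange example",
    text="Wikis are data islands.",
    annotations=[Annotation("dc:subject", "wiki interoperability", 0, 5)],
)
print(page)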
Ylvi - Multimedia-izing the semantic wiki Niko Popitsch
Schandl B.
Amiri A.
Leitich S.
Jochum W.
CEUR Workshop Proceedings English 2006 Semantic and semi-structured wiki implementations, which extend traditional, purely string-based wikis by adding machine-processable metadata, suffer from a lack of support for media management. Currently, it is difficult to maintain semantically rich metadata for both wiki pages and associated media assets; media management functionalities are cumbersome or missing. With Ylvi, a semantic wiki based on the METIS multimedia framework, we combine the advantages of structured, type-/attribute-based media management and the open, relatively unstructured wiki approach. By representing wiki pages as METIS objects, we can apply sophisticated media management features to the wiki domain and provide an extensible, multimedia-enabled semantic wiki. 0 0