Digital libraries

From WikiPapers

Digital libraries is included as a keyword or extra keyword in 1 dataset, 0 tools and 82 publications.


Dataset | Size | Language | Description
Repos-2012-dataset | 5 MB | Spanish | Metadata about links to digital repositories from the Spanish and Catalan Wikipedias.


There are no tools for this keyword.


Title Author(s) Published in Language Date Abstract R C
Capturing scholar's knowledge from heterogeneous resources for profiling in recommender systems Amini B.
Ibrahim R.
Othman M.S.
Selamat A.
Expert Systems with Applications 2014 In scholars' recommender systems, acquiring knowledge to construct profiles is crucial because profiles provide fundamental information for accurate recommendation. Despite the availability of various knowledge resources, identifying and collecting extensive knowledge in an unobtrusive manner is not straightforward. In order to capture scholars' knowledge, some questions must be answered: what knowledge resource is appropriate for profiling, how knowledge items can be unobtrusively captured, and how heterogeneity among different knowledge resources should be resolved. To address these issues, we first model the scholars' academic behavior and extract different knowledge items diffused over the Web, including mediated profiles in digital libraries, and then integrate those heterogeneous knowledge items using Wikipedia. Additionally, we analyze the correlation between knowledge items and partition the scholars' research areas for multi-disciplinary profiling. Compared to the state of the art, the empirical evaluation shows the efficiency of our approach in terms of completeness and accuracy. © 2014 Elsevier Ltd. All rights reserved. 0 0
Exploiting Wikipedia for Evaluating Semantic Relatedness Mechanisms Ferrara F.
Tasso C.
Communications in Computer and Information Science English 2014 The semantic relatedness between two concepts is a measure that quantifies the extent to which two concepts are semantically related. In the area of digital libraries, several mechanisms based on semantic relatedness methods have been proposed. Visualization interfaces, information extraction mechanisms, and classification approaches are just some examples of mechanisms where semantic relatedness methods can play a significant role and have been successfully integrated. Due to the growing interest of researchers in areas like Digital Libraries, Semantic Web, Information Retrieval, and NLP, various approaches have been proposed for automatically computing semantic relatedness. However, despite the growing number of proposed approaches, there are still significant difficulties in evaluating the results returned by different methods. The limitations of evaluation mechanisms prevent an effective evaluation, and several works in the literature emphasize that the exploited approaches are rather inconsistent. In order to overcome this limitation, we propose a new evaluation methodology where people provide feedback about the semantic relatedness between concepts explicitly defined in digital encyclopedias. In this paper, we specifically exploit Wikipedia for generating a reliable dataset. 0 0
Identifying the topic of queries based on domain specify ontology ChienTa D.C.
Thi T.P.
WIT Transactions on Information and Communication Technologies English 2014 In order to identify the topic of queries, a large number of past studies have relied on lexico-syntactic and handcrafted knowledge sources in Machine Learning and Natural Language Processing (NLP). Conversely, in this paper, we introduce an application system that detects the topic of queries based on a domain-specific ontology. In this system, we focus on building the domain-specific ontology, which is composed of instances automatically extracted from available resources such as Wikipedia, WordNet, and ACM Digital Library. The experimental evaluation with many queries related to the information technology area shows that this system considerably outperforms a matching and identifying approach. 0 0
Opportunities for using Wiki technologies in building digital library models Mammadov E.C.O. Library Hi Tech News English 2014 Purpose: The purpose of this article is to research the open access and encyclopedia-structured methodology of building digital libraries. In Azerbaijan's libraries, one of the most challenging topics is organizing digital resources (books, audio-video materials, etc.). Wiki technologies offer easy, collaborative and open tools that can be applied in building digital libraries. Design/methodology/approach: This paper looks at current practices and the ways of organizing information resources to make them more systematized, open and accessible. These activities are valuable for rural libraries, which are smaller and less well funded than main and central libraries in cities. Findings: The main finding of this article is how to organize digital resource management in libraries using the Wiki ideology. Originality/value: Wiki technologies determine ways of building digital library network models that are structurally different from already known models, as well as new directions in forming an information society and solving the problems encountered. 0 0
Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools Lopuszynski M.
Bolikowski L.
Communications in Computer and Information Science English 2014 In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. Wikipedia is employed as the first source of labels; the second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the arXiv preprint collection. We believe that the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.). 0 0
Towards automatic building of learning pathways Siehndel P.
Kawase R.
Nunes B.P.
Herder E.
WEBIST 2014 - Proceedings of the 10th International Conference on Web Information Systems and Technologies English 2014 Learning material usually has a logical structure, with a beginning and an end, and lectures or sections that build upon one another. However, in informal Web-based learning this may not be the case. In this paper, we present a method for automatically calculating a tentative order in which objects should be learned based on the estimated complexity of their contents. Thus, the proposed method is based on a process that enriches textual objects with links to Wikipedia articles, which are used to calculate a complexity score for each object. We evaluated our method with two different datasets: Wikipedia articles and online learning courses. For Wikipedia data we achieved correlations between the ground truth and the predicted order of up to 0.57 while for subtopics inside the online learning courses we achieved correlations of 0.793. 0 0
A comparative study of academic and wikipedia ranking Shuai X.
Jiang Z.
Xiaojiang Liu
Bollen J.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2013 In addition to its broad popularity, Wikipedia is also widely used for scholarly purposes. Many Wikipedia pages pertain to academic papers, scholars and topics, providing a rich ecology for scholarly uses. Scholarly references and mentions on Wikipedia may thus shape the "societal impact" of a certain scholarly communication item, but it is not clear whether they shape actual "academic impact". In this paper we compare the impact of papers, scholars, and topics according to two different measures, namely scholarly citations and Wikipedia mentions. Our results show that academic and Wikipedia impact are positively correlated. Papers, authors, and topics that are mentioned on Wikipedia have higher academic impact than those that are not mentioned. Our findings validate the hypothesis that Wikipedia can help assess the impact of scholarly publications and underpin relevance indicators for scholarly retrieval or recommendation systems. Copyright © 2013 by the Association for Computing Machinery, Inc. (ACM). 0 0
Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms Joorabchi A.
Mahdi A.E.
Journal of Information Science English 2013 Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents' content. We have devised a set of 20 statistical, positional and semantical features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods. 0 0
Disambiguation to Wikipedia: A language and domain independent approach Nguyen T.-V.T. Lecture Notes in Computer Science English 2013 Disambiguation to Wikipedia (D2W) is the task of linking mentions of concepts in text to their corresponding Wikipedia articles. Traditional approaches to D2W have focused either on only one language (e.g. English) or on formal texts (e.g. news articles). In this paper, we present a multilingual framework with a set of new features that can be obtained purely from the online encyclopedia, without the need for any language-specific tool. We analyze these features with different languages and different domains. The approach proves fully language-independent and has been applied successfully to English, Italian and Polish, with a consistent improvement. We show that only a sufficient number of Wikipedia articles is needed for training. When trained on real-world data sets for English, our new features yield substantial improvement compared to current local and global disambiguation algorithms. Finally, the adaptation to the Bridgeman query logs in digital libraries shows the robustness of our approach even in the absence of disambiguation context. Also, as no language-specific tool is needed, the method can be applied to other languages in a similar manner with little adaptation. 0 0
Escaping the trap of too precise topic queries Libbrecht P. Lecture Notes in Computer Science English 2013 At the very center of digital mathematics libraries lie controlled vocabularies which qualify the topic of the documents. These topics are used when submitting a document to a digital mathematics library and when performing searches in a library. The latter are refined by the use of these topics, as they allow a precise classification of the mathematics area a document addresses. However, there is a major risk that users employ too precise topics to specify their queries: they may be employing a topic that is only "close-by" and thus miss the right resource. We call this the topic trap. Indeed, since 2009, this issue has appeared frequently on the platform. Other mathematics portals experience the same phenomenon. An approach to solve this issue is to introduce tolerance in the way queries are understood by the user. In particular, one can include fuzzy matches, but this introduces noise which may prevent the user from understanding the function of the search engine. In this paper, we propose a way to escape the topic trap by employing the navigation between related topics and the count of search results for each topic. This supports the user in that a search for close-by topics is a click away from a previous search. This approach was realized with the i2geo search engine and is described in detail; the relation of being related is computed by employing textual analysis of the definitions of the concepts fetched from the Wikipedia encyclopedia. 0 0
PATHSenrich: A web service prototype for automatic cultural heritage item enrichment Eneko Agirre
Barrena A.
Fernandez K.
Miranda E.
Otegi A.
Aitor Soroa
Lecture Notes in Computer Science English 2013 Large amounts of cultural heritage material are nowadays available through online digital library portals. Most of these cultural items have short descriptions and lack rich contextual information. The PATHS project has developed experimental enrichment services. As a proof of concept, this paper presents a web service prototype which allows independent content providers to enrich cultural heritage items with a subset of the full functionality: links to related items in the collection and links to related Wikipedia articles. In the future we plan to provide more advanced functionality, as available offline for PATHS. 0 0
WikiBilim (Wiki Knowledge): Salvation of the Kazakh language on the internet Sapargaliyev D. 2013 International Conference on Interactive Collaborative Learning, ICL 2013 English 2013 The Kazakh language was under an unofficial ban in the Soviet period. In recent years, the Kazakh-language segment of the Internet has been developing rapidly. The development of the Kazakh language has become a top priority of the government. The most successful and large-scale project on the revival of the Kazakh language was the creation of the WikiBilim fund. The main objective of the fund is to increase the number of articles in the Kazakh language on Wikipedia. The fund is also actively assisting in the creation of a digital library, an online translator and a dictionary. Will WikiBilim be the salvation of the Kazakh language on the Internet? 0 0
A technique for suggesting related Wikipedia articles using link analysis Markson C.
Song M.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2012 With more than 3.7 million articles, Wikipedia has become an important social medium for sharing knowledge. However, with this enormous repository of information, it can often be difficult to locate fundamental topics that support lower-level articles. By exploiting the information stored in the links between articles, we propose that related companion articles can be automatically generated to help further the reader's understanding of a given topic. This approach to a recommendation system uses tested link analysis techniques to present users with a clear path to related high-level articles, furthering the understanding of low-level topics. 0 0
Analysis on construction of information commons of Wiki-based Olympic Library Ma Q. Proceedings - 2012 International Conference on Computer Science and Information Processing, CSIP 2012 English 2012 As one of the Web 2.0 technologies, wiki technology emerged in the early 21st century after the Beijing Olympics. This study explores how to apply such technology by effectively using a legacy of the Beijing Olympic Games, the Olympic Library. It uses literature research methods combined with practical work experience. First, it introduces an overview of the Library of Capital Institute of Physical Education in the Olympic Library Project and the development of the Olympic Library in the post-Olympic period; then, it collates the basic concepts of information commons (IC) within the industry, including the concept of IC, its composing elements and IC construction profiles in national libraries; finally, based on the existing conditions and the Olympic Library's advantages, and combined with the rapid development of the digital environment, it discusses the application of wiki technology and the principles and ideas for achieving the innovative development of the Olympic Library through the construction of information commons. 0 0
Automatic subject metadata generation for scientific documents using wikipedia and genetic algorithms Joorabchi A.
Mahdi A.E.
Lecture Notes in Computer Science English 2012 Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents. However, scientific documents that are manually annotated with keyphrases are in the minority. This paper describes a machine learning-based automatic keyphrase annotation method for scientific documents, which utilizes Wikipedia as a thesaurus for candidate selection from documents' content and deploys genetic algorithms to learn a model for ranking and filtering the most probable keyphrases. Reported experimental results show that the performance of our method, evaluated in terms of inter-consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised methods. 0 0
Automatic vandalism detection in Wikipedia with active associative classification Maria Sumbana
Goncalves M.A.
Rodrigo Silva
Jussara Almeida
Adriano Veloso
Lecture Notes in Computer Science English 2012 Wikipedia and other free editing services for collaboratively generated content have quickly grown in popularity. However, the lack of editing control has made these services vulnerable to various types of malicious actions such as vandalism. State-of-the-art vandalism detection methods are based on supervised techniques, thus relying on the availability of large and representative training collections. Building such collections, often with the help of crowdsourcing, is very costly due to a natural skew towards very few vandalism examples in the available data as well as dynamic patterns. Aiming at reducing the cost of building such collections, we present a new active sampling technique coupled with an on-demand associative classification algorithm for Wikipedia vandalism detection. We show that our classifier, enhanced with a simple undersampling technique for building the training set, outperforms state-of-the-art classifiers such as SVMs and kNNs. Furthermore, by applying active sampling, we are able to reduce the need for training by almost 96% with only a small impact on detection results. 0 0
Catching the drift - Indexing implicit knowledge in chemical digital libraries Kohncke B.
Tonnies S.
Balke W.-T.
Lecture Notes in Computer Science English 2012 In the domain of chemistry the information gathering process is highly focused on chemical entities. But due to synonyms and different entity representations the indexing of chemical documents is a challenging process. Considering the field of drug design, the task is even more complex. Domain experts from this field are usually not interested in any chemical entity itself, but in representatives of some chemical class showing a specific reaction behavior. For describing such a reaction behavior of chemical entities the most interesting parts are their functional groups. The restriction of each chemical class is somehow also related to the entities' reaction behavior, but further based on the chemist's implicit knowledge. In this paper we present an approach dealing with this implicit knowledge by clustering chemical entities based on their functional groups. However, since such clusters are generally too unspecific, containing chemical entities from different chemical classes, we further divide them into sub-clusters using fingerprint based similarity measures. We analyze several uncorrelated fingerprint/similarity measure combinations and show that the most similar entities with respect to a query entity can be found in the respective sub-cluster. Furthermore, we use our approach for document retrieval introducing a new similarity measure based on Wikipedia categories. Our evaluation shows that the sub-clustering leads to suitable results enabling sophisticated document retrieval in chemical digital libraries. 0 0
Cognitive biases - Improving project team's work Kraus W.E. AACE International Transactions English 2012 Wow. I learned a little bit about cognitive biases and thought they must play a role in estimators not using the best judgment in their work. I performed a search in the AACE virtual library for papers containing the word bias and found I'm not the only one who thinks so, and also that biases affect much more in project controls than just estimates. In doing more reading, I found that it is a much bigger factor in our lives than I realized. That is truer in estimating than I want to think about. Wikipedia, the free online encyclopedia, in the article List of Cognitive Biases, lists a total of 113 variations on biases. How do these cognitive biases affect our estimates and how can we negate the effects? Understanding these points will hopefully allow estimators to produce estimates better meeting the needs of their projects. 0 0
Event-centric search and exploration in document collections Strotgen J.
Gertz M.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2012 Textual data ranging from corpora of digitized historic documents to large collections of news feeds provide a rich source for temporal and geographic information. Such types of information have recently gained a lot of interest in support of different search and exploration tasks, e.g., by organizing news along a timeline or placing the origin of documents on a map. However, for this, temporal and geographic information embedded in documents is often considered in isolation. We claim that through combining such information into (chronologically ordered) event-like features interesting and meaningful search and exploration tasks are possible. In this paper, we present a framework for the extraction, exploration, and visualization of event information in document collections. For this, one has to identify and combine temporal and geographic expressions from documents, thus enriching a document collection by a set of normalized events. Traditional search queries then can be enriched by conditions on the events relevant to the search subject. Most important for our event-centric approach is that a search result consists of a sequence of events relevant to the search terms and not just a document hit-list. Such events can originate from different documents and can be further explored, in particular events relevant to a search query can be ordered chronologically. We demonstrate the utility of our framework by different (multilingual) search and exploration scenarios using a Wikipedia corpus. 0 0
Exploiting scholar's background knowledge to improve recommender system for digital libraries Amini B.
Ibrahim R.
Othman M.S.
International Journal of Digital Content Technology and its Applications English 2012 Recommender systems for digital libraries have received increasing attention since they assist scholars in finding the most appropriate articles for research purposes. Many research studies have recently been conducted to model user interests in order to suggest scientific articles based on the scholar's preferences. However, a major problem of such systems is that they do not incorporate the user's background knowledge into the recommendation process, and scholars typically have to manually sift through irrelevant articles retrieved from digital libraries. Therefore, a challenging task is how to collect sufficient academic knowledge about scholars and exploit it in the personalization process in order to improve recommendation accuracy. To address this problem, a recommender framework that consolidates scholars' background knowledge based on ontological modeling is proposed. The framework exploits Wikipedia as a lexicographic database for concept disambiguation and semantic concept mapping. A practical evaluation by a group of scholars over the CiteSeerX digital library indicates an improvement in the accuracy of the recommendation task. 0 0
Investigation of the english version of the wikipedia categories graph Shkotin A. CEUR Workshop Proceedings 2012 Wikipedia is an outstanding knowledge-accumulation project. Its knowledge covers both general use and various specialized domains. Quality checking of this knowledge, especially automatic checking, is very important. In this paper, results of a study of the structure of the English version of the WCG (Wikipedia Categories Graph) as a whole are presented. The WCG is a system that supports the structure of knowledge, and we are interested in WCG content and its arrangement. It is shown that the graph contains unacceptable logical violations; organizational and technical methods for their elimination are discussed. 0 0
Linking folksonomies to knowledge organization systems Jakob Voss Communications in Computer and Information Science English 2012 This paper demonstrates enrichment of set-model folksonomies with hierarchical links and mappings to other knowledge organization systems. The process is exemplified with social tagging practice in Wikipedia and in Stack Exchange. The extended folksonomies are created by crowdsourcing tag names and descriptions to translate them to linked data in SKOS. 0 0
Malleable finding aids Anderson S.R.
Allen R.B.
Lecture Notes in Computer Science English 2012 We show a prototype implementation of a Wiki-based Malleable Finding Aid that provides features to support user engagement and we discuss the contribution of individual features such as graphical representations, a table of contents, interactive sorting of entries, and the possibility for user tagging. Finally, we explore the implications of Malleable Finding Aids for collections which are richly inter-linked and which support a fully social Archival Commons. 0 0
Scientific cyberlearning resources referential metadata creation via information retrieval Xiaojiang Liu
Jia H.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2012 The goal of this research is to describe an innovative method of creating scientific referential metadata for a cyberinfrastructure-enabled learning environment to enhance student and scholar learning experiences. By using information retrieval and meta-search approaches, different types of referential metadata, such as related Wikipedia Pages, Datasets, Source Code, Video Lectures, Presentation Slides, and (online) Tutorials, for an assortment of publications and scientific topics will be automatically retrieved, associated, and ranked. 0 0
Taxonomy-based query-dependent schemes for profile similarity measurement Tuarob S.
Mitra P.
Giles C.L.
ACM International Conference Proceeding Series English 2012 Semantic search techniques have increasingly gained attention in information retrieval literature. Authors are great sources of semantic interpretation for documents, especially in scholarly domains where articles mostly reflect the research interests of the authors. Being able to interpret semantic meanings of documents from their authors would give rise to many interesting applications, especially in academic digital library literature. In this paper, we present taxonomy-based query-dependent schemes for computing author profile similarity. Our schemes have the capability to capture partial similarities, as opposed to traditional topic-overlap-based approaches. We generalize our schemes so that they can be easily adapted to other application domains. We acquire resources from multiple places such as Wikipedia, CiteseerX, ArnetMiner, and WikipediaMiner as part of our work. We provide encouraging anecdotal results along with suggestions on potential applications of the proposed schemes. 0 0
A survey on web archiving initiatives Gomes D.
Miranda J.
Costa M.
Lecture Notes in Computer Science English 2011 Web archiving has been gaining interest and recognized importance for modern societies around the world. However, for web archivists it is frequently difficult to demonstrate this fact, for instance, to funders. This study provides an updated and global overview of web archiving. The obtained results showed that the number of web archiving initiatives grew significantly after 2003 and that they are concentrated in developed countries. We statistically analyzed metrics such as the volume of archived data, archive file formats, and the number of people engaged. Web archives all together must process more data than any web search engine. Considering the complexity and large amounts of data involved in web archiving, the results showed that the assigned resources are scarce. A Wikipedia page was created to complement the presented work and to be collaboratively kept up to date by the community. 3 0
Automatic assessment of document quality in web collaborative digital libraries Dalip D.H.
Goncalves M.A.
Marco Cristo
Pável Calado
Journal of Data and Information Quality English 2011 The old dream of a universal repository containing all of human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and open editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its quality. In this work, we explore a significant number of quality indicators and study their capability to assess the quality of articles from three Web collaborative digital libraries. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment. Through experiments, we show that the most important quality indicators are those which are also the easiest to extract, namely, the textual features related to the structure of the article. Moreover, to the best of our knowledge, this work is the first to show an empirical comparison between Web collaborative digital libraries regarding the task of assessing article quality. 0 0
Digital libraries and social web: Insights from wikipedia users' activities Zelenkauskaite A.
Paolo Massa
Proceedings of the IADIS International Conferences - Web Based Communities and Social Media 2011, Social Media 2011, Internet Applications and Research 2011, Part of the IADIS, MCCSIS 2011 English 2011 A growing importance of the social aspects within large scale knowledge depositories as digital libraries was discerned since the last decade for its ever increasing number of digital depositories and users. Despite the fact that this digital trend influenced multiple users, yet little is known about how users navigate in these online platforms. In this study Wikipedia is considered as a lens to analyze user activities within a large scale online environment, in order to achieve a better understanding regarding user needs in online knowledge depositories. This study analyzed user activities in real setting where editing activities of 686,332 active contributors of English Wikipedia have been studied within a period of ten years. Their editing behaviors were compared based on different periods of permanence (longevity) within Wikipedia's content-oriented versus social-oriented namespaces. The results show that users with less than 21 days of longevity were more likely to interact in namespaces that were designated for social purposes, compared to the users who remained from two to ten years who were more likely to exploit functionalities related to content discussion. The implications of these findings were positioned within the collaborative learning framework which postulates that users with different expertise levels have different exigencies. Since social functionalities were more frequently used by users who stayed for short periods of time, inclusion of such functionalities in online platforms can provide support to this segment of users. 
This study aims to contribute to the design of online collaborative environments such as digital libraries, where social-oriented design would allow the creation of more sustainable environments built around the specific needs of diverse users. 0 0
GreenWiki: A tool to support users' assessment of the quality of Wikipedia articles Dalip D.H.
Santos R.L.
Oliveira D.R.
Amaral V.F.
Goncalves M.A.
Prates R.O.
Minardi R.C.M.
De Almeida J.M.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2011 In this work, we present GreenWiki, which is a wiki with a panel of quality indicators to assist the reader of a Wikipedia article in assessing its quality. 0 0
How to teach digital library data to swim into research Schindler C.
Cornelia Veja
Marc Rittberger
Vrandecic D.
ACM International Conference Proceeding Series English 2011 Virtual research environments (VREs) aim to enhance research practice and have been identified as drivers for changes in libraries. This paper argues that VREs in combination with Semantic Web technologies offer a range of possibilities to align research with library practices. The main claim of the article is exemplified by a metadata integration process of bibliographic data from libraries to a VRE which is based on Semantic MediaWiki. The integration process rests on three pillars: MediaWiki as a web-based repository, Semantic MediaWiki annotation mechanisms, and semi-automatic workflow management for the integration of digital resources. Thereby, needs of scholarly research practices and capacities for interactions are taken into account. The integration process is part of the design of Semantic MediaWiki for Collaborative Corpora Analysis (SMW-CorA) which uses a concrete research project in the history of education as a reference point for an infrastructural distribution. Semantic MediaWiki thus provides a light-weight environment offering a framework for re-using heterogeneous resources and a flexible collaborative way of conducting research. 0 0
Hybrid and interactive domain-specific translation for multilingual access to digital libraries Jones G.J.F.
Fuller M.
Newman E.
YanChun Zhang
Lecture Notes in Computer Science English 2011 Accurate high-coverage translation is a vital component of reliable cross language information retrieval (CLIR) systems. This is particularly true for retrieval from archives such as Digital Libraries which are often specific to certain domains. While general machine translation (MT) has been shown to be effective for CLIR tasks in laboratory information retrieval evaluation tasks, it is generally not well suited to specialized situations where domain-specific translations are required. We demonstrate that effective query translation in the domain of cultural heritage (CH) can be achieved using a hybrid translation method which augments a standard MT system with domain-specific phrase dictionaries automatically mined from Wikipedia. We further describe the use of these components in a domain-specific interactive query translation service. The interactive system selects the hybrid translation by default, with other possible translations being offered to the user interactively to enable them to select alternative or additional translation(s). The objective of this interactive service is to provide user control of translation while maximising translation accuracy and minimizing the translation effort of the user. Experiments using our hybrid translation system with sample query logs from users of CH websites demonstrate a large improvement in the accuracy of domain-specific phrase detection and translation. 0 0
IULib: Where UDL and Wikipedia could meet Tian Y.
Huang T.
Gao W.
Proceedings of SPIE - The International Society for Optical Engineering English 2011 Empowering the group collaboration and knowledge-sharing capabilities of the Universal Digital Library (UDL) is an important task now that more than 1.5 million digitized books have been made openly accessible online. One motivation of developing such a platform is the emergence of Web 2.0 in recent years, especially with the rapidly increased popularity of Wikipedia. This paper presents our vision, which we call iULib, about where and how UDL and Wikipedia could meet. In the first phase, we directly apply the Wiki architecture and software in UDL to upgrade the digital library as an interactive platform that facilitates community and collaboration. Preliminary implementation shows the feasibility and reliability of our design. Furthermore, as a free encyclopedia that assembles contributions from different users, Wikipedia may also be used as a knowledge base for UDL. As a result, UDL can be upgraded as an intelligent platform for information retrieval and knowledge sharing. Our practice at the WikipediaMM task in ImageCLEF 2008 shows that the knowledge network constructed from Wikipedia can be used to effectively expand the query semantics of image retrieval. It is expected that Wikipedia and digital libraries can integrate each other's valuable results and best practices to benefit each other. 0 0
Metadata enrichment via topic models for author name disambiguation Bernardi R.
Le D.-T.
Lecture Notes in Computer Science English 2011 This paper tackles the well-known problem of Author Name Disambiguation (AND) in Digital Libraries (DL). Following [14,13], we assume that an individual tends to create a distinctively coherent body of work that can hence form a single cluster containing all of his/her articles yet distinguishing them from those of everyone else with the same name. Still, we believe the information contained in a DL may not be sufficient to allow automatic detection of such clusters; this lack of information becomes even more evident in federated digital libraries, where the labels assigned by librarians may belong to different controlled vocabularies or different classification systems, and in digital libraries on the web, where records may be assigned neither subject headings nor classification numbers. Hence, we exploit Topic Models, extracted from Wikipedia, to enhance record metadata and use Agglomerative Clustering to disambiguate ambiguous author names by clustering together similar records; records in different clusters are supposed to have been written by different people. We investigate the following two research questions: (a) Are the Classification Systems and Subject Heading labels manually assigned by librarians general and informative enough to disambiguate Author Names via clustering techniques? (b) Do Topic Models induce from large corpora the conceptual information necessary for automatically labelling DL metadata and grasping topic similarities of the records? To answer these questions, we will use the Library Catalogue of the Bolzano University Library as a case study. 0 0
Retrieving attributes using web tables Kopliku A.
Pinel-Sauvagnat K.
Boughanem M.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2011 In this paper we propose an attribute retrieval approach which extracts and ranks attributes from Web tables. We combine simple heuristics to filter out improbable attributes and we rank attributes based on frequencies and a table match score. Ranking is reinforced with external evidence from Web search, DBPedia and Wikipedia. Our approach can be applied to any instance (e.g. Canada) to retrieve its attributes (capital, GDP). It is shown to have a much higher recall than DBPedia and Wikipedia and to work better than lexico-syntactic rules for the same purpose. 0 0
System description: EgoMath2 as a tool for mathematical searching on Misutka J.
Galambos L.
Lecture Notes in Computer Science English 2011 EgoMath is a full text search engine focused on digital mathematical content with little semantic information available. Recently, we have decided that another step towards making mathematics in digital form more accessible was to enable mathematical searching in one of the world's largest digital libraries - Wikipedia. The library is an excellent candidate for our mathematical search engine because the mathematical notation is represented by fragments which do not contain semantic information. 0 0
Word order matters: Measuring topic coherence with lexical argument structure Spagnola S.
Lagoze C.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2011 Topic models are emerging tools for improved browsing and searching within digital libraries. These techniques collapse words within documents into unordered "bags of words," ignoring word order. In this paper, we present a method that examines syntactic dependency parse trees from Wikipedia article titles to learn expected patterns between relative lexical arguments. This process is highly dependent on the global word ordering of a sentence, modeling how each word interacts with other words to gain an aggregate perspective on how words interact over all 3.2 million titles. Using this information, we analyze how coherent a given topic is by comparing the relative usage vectors between the top 5 words in a topic. Results suggest that this technique can identify poor topics based on how well the relative usages align with each other within a topic, potentially aiding digital library indexing. 0 0
Rich Texts: Wikisource as an Open Access Repository for Law and the Humanities Timothy K. Armstrong University of Cincinnati College of Law Public Law & Legal Theory Research Paper Series English 15 May 2010 Open access to research and scholarship, although well established in the sciences, remains an emerging phenomenon in the legal academy. In recent years, a number of open access repositories have been created to permit self-archiving of legal scholarship (either within or across institutional boundaries), and faculties at some leading research institutions have adopted policies supporting open access to their work. Although existing repositories for legal scholarship represent a clear improvement over proprietary, subscription-based repositories in some ways, their architecture, and the narrowly defined missions they have elected to pursue, limit their ability to illuminate the ongoing dialogue among texts that is a defining characteristic of scholarly discourse in law and the humanities. One of the wiki-based projects operated by the nonprofit Wikimedia Foundation - the Wikisource digital library - improves upon the shortcomings of existing open access repositories by bringing source texts and commentary together in a single place, with additional contextual materials hosted on other Wikimedia Foundation sites just a click away. These features of Wikisource, if more widely adopted, may improve academic discourse by highlighting conceptual interconnections among works, fostering interdisciplinary collaboration, and reducing the competitive advantages of proprietary, closed-access legal information services. 3 0
Digital library educational module development strategies and sustainable enhancement by the community Yang S.
Kanan T.
Fox E.
Lecture Notes in Computer Science English 2010 The Digital Library Curriculum Development Project (http://curric.dlib.vt.edu) team has been developing educational modules and conducting field-tests internationally since January 2006. There have been three approaches to module development in the past. The first approach was that the project team members created draft modules (total of 9) and then those modules were reviewed by the experts in the field as well as by other members of the team. The second approach was that graduate student teams developed modules under the supervision of an instructor and the project team. Four members in each team collaborated for a single module. In total four modules were produced in this way. The last approach was that five graduate students developed a total of five modules, each module reviewed by two students. The completed modules were posted in for wider distribution and collaborative improvements by the community. The entire list of modules in the Digital Library Educational Framework also can be found in that location. 0 0
Do wikipedians follow domain experts?: A domain-specific study on wikipedia knowledge building YanChun Zhang
Aixin Sun
Anwitaman Datta
Kuiyu Chang
Lim E.-P.
Proceedings of the ACM International Conference on Digital Libraries English 2010 Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest for a better understanding of the collaborative knowledge building process. In this paper, we performed a (terrorism) domain-specific case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records, and went ahead to study them from three aspects: creation, revision, and link evolution. We found that the knowledge building in Wikipedia had largely been independent, and did not follow TKB - despite the open and online availability of the latter, as well as awareness of at least some of the Wikipedia contributors about the TKB source. In an attempt to identify possible reasons, we conducted a detailed analysis of contribution behavior demonstrated by Wikipedians. It was found that most Wikipedians contribute to a relatively small set of articles each. Their contribution was biased towards one or very few article(s). At the same time, each article's contributions are often championed by very few active contributors including the article's creator. We finally arrive at a conjecture that the contributions in Wikipedia are more to cover knowledge at the article level rather than at the domain level. 0 1
Efficient visualization of content and contextual information of an online multimedia digital library for effective browsing Mishra S.
Gorai A.
Oberoi T.
Ghosh H.
Proceedings - 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2010 English 2010 In this paper, we present a few innovative techniques for visualization of content and contextual information of a multimedia digital library for effective browsing. A traditional collection visualization portal often depicts some metadata or a short synopsis, which is quite inadequate for assessing the documents. We have designed a novel web portal that incorporates a few preview facilities to disclose an abstract of the contents. Moreover, we place the documents on Google Maps to make their geographical context explicit. A semantic network, created automatically around the collection, brings out other contextual information from external knowledge resources like Wikipedia, which is used for navigating the collection. This paper also reports economical hosting techniques using Amazon Cloud. 0 0
Evaluating topic models for digital libraries Newman D.
Noh Y.
Talley E.
Karimi S.
Baldwin T.
Proceedings of the ACM International Conference on Digital Libraries English 2010 Topic models could have a huge impact on improving the ways users find and discover content in digital libraries and search interfaces, through their ability to automatically learn and apply subject tags to each and every item in a collection, and their ability to dynamically create virtual collections on the fly. However, much remains to be done to tap this potential, and empirically evaluate the true value of a given topic model to humans. In this work, we sketch out some sub-tasks that we suggest pave the way towards this goal, and present methods for assessing the coherence and interpretability of topics learned by topic models. Our large-scale user study includes over 70 human subjects evaluating and scoring almost 500 topics learned from collections from a wide range of genres and domains. We show how a scoring model - based on pointwise mutual information of word-pairs using Wikipedia, Google and MEDLINE as external data sources - performs well at predicting human scores. This automated scoring of topics is an important first step to integrating topic modeling into digital libraries. 0 0
Exploiting time-based synonyms in searching document archives Kanhabua N.
Norvag K.
Proceedings of the ACM International Conference on Digital Libraries English 2010 Query expansion of named entities can be employed in order to increase the retrieval effectiveness. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extracting synonyms of named entities over time from the whole history of Wikipedia. In addition, we will use their temporal patterns as a feature in ranking and classifying them into two types, i.e., time-independent or time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relationships change over time. Further, we describe how to make use of both types of synonyms to increase the retrieval effectiveness, i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search wrt. temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how retrieval performance of queries consisting of named entities can be improved using our approach. 0 0
Jump-starting a body-of-knowledge with a semantic wiki on a discipline ontology Codocedo V.
Lopez C.
Astudillo H.
CEUR Workshop Proceedings English 2010 Several communities have engaged recently in assembling a Body of Knowledge (BOK) to organize the discipline knowledge for learning and sharing. BOK ideally represents the domain, contextualizes assets (e.g. literature), and exploits the Social Web potential to maintain and improve it. Semantic wikis are excellent tools to handle domain (ontological) representations, to relate items, and to enable collaboration. Unfortunately, creating a whole BOK (structure, content and relations) from scratch may fall prey to the "white page syndrome", given the size and complexity of the domain information. This article presents an approach to jump-start a BOK, by implementing it as a semantic wiki organized around a domain ontology. Domain representation (structure and content) is initialized by automatically creating wiki pages for each ontology concept and digital asset; the ontology itself is semi-automatically built using natural language processing (NLP) techniques. Contextualization is initialized by automatically linking concept- and asset-pages. The proposal's feasibility is shown with a prototype for a Software Architecture BOK, built from 1,000 articles indexed by a well-known scientific digital library and completed by volunteers. The proposed approach separates the issues of domain representation, resources contextualization, and social elaboration, allowing communities to try on alternate solutions for each issue. 0 0
Measuring peculiarity of text using relation between words on the web Nakabayashi T.
Yumoto T.
Nii M.
Yuku Takahashi
Sumiya K.
Lecture Notes in Computer Science English 2010 We define the peculiarity of text as a metric of information credibility. Higher peculiarity means lower credibility. We extract the theme word and the characteristic words from text and check whether there is a subject-description relation between them. The peculiarity is defined using the ratio of the subject-description relation between a theme word and characteristic words. We evaluate the extent to which peculiarity can be used to judge credibility by classifying text from Wikipedia and Uncyclopedia in terms of the peculiarity. 0 0
Meta-metadata: A metadata semantics language for collection representation applications Kerne A.
Qu Y.
Webb A.M.
Damaraju S.
Lupfer N.
Mathur A.
International Conference on Information and Knowledge Management, Proceedings English 2010 Collecting, organizing, and thinking about diverse information resources is the keystone of meaningful digital information experiences, from research to education to leisure. Metadata semantics are crucial for organizing collections, yet their structural diversity exacerbates problems of obtaining and manipulating them, strewing end users and application developers amidst the shadows of a proverbial tower of Babel. We introduce meta-metadata, a language and software architecture addressing a metadata semantics lifecycle: (1) data structures for representation of metadata in programs; (2) metadata extraction from information resources; (3) semantic actions that connect metadata to collection representation applications; and (4) rules for presentation to users. The language enables power users to author metadata semantics wrappers that generalize template-based information sources. The architecture supports development of independent collection representation applications that reuse wrappers. The initial meta-metadata repository of information source wrappers includes Google, Flickr, Yahoo, IMDb, Wikipedia, and the ACM Portal. Case studies validate the approach. 0 0
Semantic relatedness approach for named entity disambiguation Gentile A.L.
Zhang Z.
Linsi Xia
Iria J.
Communications in Computer and Information Science English 2010 Natural Language is a means to express and discuss concepts, objects, and events, i.e., it carries semantic content. One of the ultimate aims of Natural Language Processing techniques is to identify the meaning of the text, providing effective ways to make a proper linkage between textual references and their referents, that is, real world objects. This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their referents. The proposed methodology for Named Entity Disambiguation is based on Semantic Relatedness Scores obtained with a graph based model over Wikipedia. We show that, without building a Bag of Words representation of the text, but only considering named entities within the text, the proposed paradigm achieves results competitive with the state of the art on two different datasets. 0 0
The implications of information democracy and digital socialism for public libraries Oguz E.S.
Kajberg L.
Communications in Computer and Information Science English 2010 In these times, public libraries in many countries have increasingly come under pressure from developments within the information landscape. Thus, not least because of the massive digitization of information resources, the proliferation and popularity of search engines, in particular Google, and the booming technologies of Web 2.0, public libraries find themselves in a very complex situation. In fact, the easy-to-use technologies of Web 2.0 challenge the basic principles of information services provision undertaken by libraries. The new digital information environment and social software tools such as blogs, wikis and social networking sites have fuelled a discussion of the future of public libraries as information providers. After all there seems to be a need for public libraries to reorient their aims and objectives and to redefine their service identity. At the same time search engines, and especially Google, are increasingly coming under scrutiny. Thus, analysis results referred to show that the conception of information and the underlying purpose of Google differ from those of public libraries. Further, an increasing amount of criticism is being directed at collaborative spaces (typically Wikipedia) and social networks (e.g. MySpace) and it is pointed out that these social media are not that innocent and unproblematic. In discussing the survival of public libraries and devising an updated role for libraries in the age of Google and social media, attention should be given to fleshing out a new vision for the public library as a provider of alternative information and as an institution supporting information democracy. 0 0
User-contributed descriptive metadata for libraries and cultural institutions Zarro M.A.
Allen R.B.
Lecture Notes in Computer Science English 2010 The Library of Congress and other cultural institutions are collecting highly informative user-contributed metadata as comments and notes expressing historical and factual information not previously identified with a resource. In this observational study we find a number of valuable annotations added to sets of images posted by the Library of Congress on the Flickr Commons. We propose a classification scheme to manage contributions and mitigate information overload issues. Implications for information retrieval and search are discussed. Additionally, the limits of a "collection" are becoming blurred as connections are being built via hyperlinks to related resources outside of the library collection, such as Wikipedia and locally relevant websites. Ideas are suggested for future projects, including interface design and institutional use of user-contributed information. 0 0
Wiki-based digital libraries information services in China and abroad 2010 6th International Conference on Wireless Communications, Networking and Mobile Computing, WiCOM 2010 English 2010 0 0
An ontology-based approach for key phrase extraction Nguyen C.Q.
Phan T.T.
ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. English 2009 Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques and a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extracting of Vietnamese text that exploits the Vietnamese Wikipedia as an ontology and exploits specific characteristics of the Vietnamese language for the key phrase selection stage. We also explore NLP techniques that we propose for the analysis of Vietnamese texts, focusing on the advanced candidate phrases recognition phase as well as part-of-speech (POS) tagging. Finally, we review the results of several experiments that have examined the impacts of strategies chosen for Vietnamese key phrase extracting. 0 0
Automatic quality assessment of content created collaboratively by web communities: a case study of Wikipedia Daniel H. Dalip
Marcos A. Gonçalves
Marco Cristo
Pável Calado
English 2009 The old dream of a universal repository containing all the human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction. 0 3
Exploitation of the Wikipedia category system for enhancing the value of LCSH Yoji Kiyota
Hiroshi Nakagawa
Satoshi Sakai
Tatsuya Mori
Hidetaka Masuda
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries English 2009 This paper addresses an approach that integrates two different types of information resources: the Web and libraries. Our method begins from any keywords in Wikipedia, and induces related subject headings of LCSH through the Wikipedia category system. 0 0
Key phrase extraction: A hybrid assignment and extraction approach Nguyen C.Q.
Phan T.T.
IiWAS2009 - The 11th International Conference on Information Integration and Web-based Applications and Services English 2009 Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques and a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extracting of Vietnamese text that combines assignment and extraction approaches. We also explore NLP techniques that we propose for the analysis of Vietnamese texts, focusing on the advanced candidate phrases recognition phase as well as part-of-speech (POS) tagging. Then we propose a method that exploits specific characteristics of the Vietnamese language and exploits the Vietnamese Wikipedia as an ontology for key phrase ambiguity resolution. Finally, we show the results of several experiments that have examined the impacts of strategies chosen for Vietnamese key phrase extracting. 0 0
Article recommendation based on a topic model for Wikipedia Selection for Schools Choochart Haruechaiyasak
Chaianun Damrongrat
Lecture Notes in Computer Science English 2008 The 2007 Wikipedia Selection for Schools is a collection of 4,625 articles selected from Wikipedia as educational material for children. Users can currently access articles within the collection via two different methods: (1) by browsing on either a subject index or a title index sorted alphabetically, and (2) by following hyperlinks embedded within article pages. These two retrieval methods are considered static and subject to human editors. In this paper, we apply the Latent Dirichlet Allocation (LDA) algorithm to generate a topic model from articles in the collection. Each article can be expressed by a probability distribution on the topic model. We can recommend related articles by calculating the similarity measures among the articles' topic distribution profiles. Our initial experimental results showed that the proposed approach could generate many highly relevant articles, some of which are not covered by the hyperlinks in a given article. 0 0
Gazetiki: Automatic creation of a geographical gazetteer Adrian Popescu
Gregory Grefenstette
Moellic P.-A.
Proceedings of the ACM International Conference on Digital Libraries English 2008 Geolocalized databases are becoming necessary in a wide variety of application domains. Thus far, the creation of such databases has been a costly, manual process. This drawback has stimulated interest in automating their construction, for example, by mining geographical information from the Web. Here we present and evaluate a new automated technique for creating and enriching a geographical gazetteer, called Gazetiki. Our technique merges disparate information from Wikipedia, Panoramio, and web search engines in order to identify geographical names, categorize these names, find their geographical coordinates and rank them. The information produced in Gazetiki enhances and complements the Geonames database, using a similar domain model. We show that our method provides a richer structure and an improved coverage compared to another known attempt at automatically building a geographic database and, where possible, we compare our Gazetiki to Geonames. Copyright 2008 ACM. 0 0
Harnessing social networks to connect with audiences: If you build it, will they come 2.0? Belden D. Internet Reference Services Quarterly English 2008 Digital libraries offer users a wealth of online resources, but most of these materials remain hidden from potential users. Established strategies for outreach and promotion bring limited success when trying to connect with users accustomed to Googling their way through research. Social networks provide an opportunity for connecting with audiences in the places where they habitually seek information. The University of North Texas Libraries' Portal to Texas History has experienced dramatic increases in Web usage and reference requests by harnessing the power of social networks such as Wikipedia and MySpace. 0 0
On visualizing heterogeneous semantic networks from multiple data sources Maureen
Aixin Sun
Lim E.-P.
Anwitaman Datta
Kuiyu Chang
Lecture Notes in Computer Science English 2008 In this paper, we focus on the visualization of heterogeneous semantic networks obtained from multiple data sources. A semantic network comprising a set of entities and relationships is often used for representing knowledge derived from textual data or database records. Although the semantic networks created for the same domain at different data sources may cover a similar set of entities, these networks could also be very different because of naming conventions, coverage, viewpoints, and other reasons. Since digital libraries often contain data from multiple sources, we propose a visualization tool to integrate and analyze the differences among multiple social networks. Through a case study on two terrorism-related semantic networks derived from Wikipedia and the Terrorism Knowledge Base (TKB) respectively, the effectiveness of our proposed visualization tool is demonstrated. 0 0
Qualitative geocoding of persistent web pages Angel A.
Lontou C.
Pfoser D.
Efentakis A.
GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems English 2008 Information and specifically Web pages may be organized, indexed, searched, and navigated using various metadata aspects, such as keywords, categories (themes), and also space. While categories and keywords are up for interpretation, space represents an unambiguous aspect to structure information. The basic problem of providing spatial references to content is solved by geocoding; a task that relates identifiers in texts to geographic co-ordinates. This work presents a methodology for the semiautomatic geocoding of persistent Web pages in the form of collaborative human intervention to improve on automatic geocoding results. While focusing on the Greek language and related Web pages, the developed techniques are universally applicable. The specific contributions of this work are (i) automatic geocoding algorithms for phone numbers, addresses and place name identifiers and (ii) a Web browser extension providing a map-based interface for manual geocoding and updating the automatically generated results. With the geocoding of a Web page being stored as respective annotations in a central repository, this overall mechanism is especially suited for persistent Web pages such as Wikipedia. To illustrate the applicability and usefulness of the overall approach, specific geocoding examples of Greek Web pages are presented. 0 0
Smarter, better, stronger, together: A closer look at how collaboration is transforming the enterprise No author name available EContent English 2008 Various aspects of the project We are smarter than me: How to unleash the power of crowds in your business, undertaken and completed by Barry Libert and Jon Spector with the support of two American business schools, are discussed. The book was written using a wiki-based community that invited 1 million people, including students, faculty, and alumni from the fields of technology and management, to contribute ideas. The authors posed questions regarding the success of community approaches for marketing, business development, distribution, and other business practices. The implementation of web-based social communities and services in enterprises has been made possible by Web 2.0 technologies. Web 2.0 has many potential benefits and, in spite of some limitations, will continue to gain ground in business as more and more companies adopt it. 0 0
The importance of link evidence in Wikipedia Jaap Kamps
Marijn Koolen
Lecture Notes in Computer Science English 2008 Wikipedia is one of the most popular information sources on the Web. The free encyclopedia is densely linked. The link structure in Wikipedia differs from the Web at large: internal links in Wikipedia are typically based on words naturally occurring in a page, and link to another semantically related entry. Our main aim is to find out if Wikipedia's link structure can be exploited to improve ad hoc information retrieval. We first analyse the relation between Wikipedia links and the relevance of pages. We then experiment with use of link evidence in the focused retrieval of Wikipedia content, based on the test collection of INEX 2006. Our main findings are: First, our analysis of the link structure reveals that the Wikipedia link structure is a (possibly weak) indicator of relevance. Second, our experiments on INEX ad hoc retrieval tasks reveal that if the link evidence is made sensitive to the local context we see a significant improvement of retrieval effectiveness. Hence, in contrast with earlier TREC experiments using crawled Web data, we have shown that Wikipedia's link structure can help improve the effectiveness of ad hoc retrieval. 0 0
Towards automatic content tagging - Enhanced web services in digital libraries using lexical chaining Ulli Waltinger
Alexander Mehler
Heyer G.
WEBIST 2008 - 4th International Conference on Web Information Systems and Technologies, Proceedings English 2008 This paper proposes a web-based application which combines social tagging, enhanced visual representation of a document and alignment to an open-ended social ontology. More precisely, we introduce, on the one hand, an approach for automatically extracting document-related keywords for indexing and representing document content as an alternative to social tagging. On the other hand, we propose an automatic classification within a social ontology based on the German Wikipedia category taxonomy. This paper has two main goals: to describe the method of automatic tagging of digital documents and to provide an overview of the algorithmic patterns of lexical chaining that can be applied to topic tracking and labelling of digital documents. 0 0
What types of translations hide in Wikipedia? Sjobergh J.
Sjobergh O.
Araki K.
Lecture Notes in Computer Science English 2008 We extend an automatically generated bilingual Japanese-Swedish dictionary with new translations, automatically discovered from the multi-lingual online encyclopedia Wikipedia. Over 50,000 translations, most of which are not present in the original dictionary, are generated, with very high translation quality. We analyze what types of translations can be generated by this simple method. The majority of the words are proper nouns, and other types of (usually) uninteresting translations are also generated. Not counting the less interesting words, about 15,000 new translations are still found. Checking against logs of search queries from the old dictionary shows that the new translations would significantly reduce the number of searches with no matching translation. 0 0
Wikiful thinking Doyle B. EContent English 2008 The advantages and weaknesses of using a wiki as a content and knowledge management tool are discussed. Wikis are economical, as some tools are open source and free, and they collect knowledge, both explicit and tacit, very quickly. Wikipedia, one of the 10 busiest sites on the web, has been a great success, with about 5 million registered editors and about 8 million articles in different languages. Wikis do not operate through standards-based technology and content management best practices such as content reuse, modularity, structured writing, and information typing, resulting in a lack of interoperability, poor metadata management, and little reusability within the wiki. Methods of wiki navigation include built-in and web-based search engines. Standardization of wikis includes the use of XHTML and a WYSIWYG editor interface for unsophisticated content contributors, with hidden structure to facilitate information retrieval. 0 1
Applying Wikipedia's multilingual knowledge to cross-lingual question answering Ferrandez S.
Antonio Toral
Oscar Ferrandez
Antonio Ferrandez
Munoz R.
Lecture Notes in Computer Science English 2007 The application of the multilingual knowledge encoded in Wikipedia to an open-domain Cross-Lingual Question Answering system based on the Inter Lingual Index (ILI) module of EuroWordNet is proposed and evaluated. This strategy overcomes the problems due to ILI's low coverage of proper nouns (Named Entities). Moreover, as these are open-class words (highly changing), using a community-based, up-to-date resource avoids the tedious maintenance of hand-coded bilingual dictionaries. A study reveals the importance of translating Named Entities in CL-QA and the advantages of relying on Wikipedia over ILI for doing this. Tests on questions from the Cross-Language Evaluation Forum (CLEF) justify our approach (20% of these are correctly answered thanks to Wikipedia's multilingual knowledge). 0 0
Cracking software reuse Spinellis D. IEEE Software English 2007 The Unix system and its pipelines are a model of software reuse. Although many subsequent developments weren't similarly successful, by looking at Wikipedia and its MediaWiki engine, we find many levels of successful reuse. It seems that software repositories, package-management systems, shared-library technologies, and language platforms have increased reuse's return on investment. The Internet has also catalyzed software reuse by bringing both developer groups and development efforts closer to their users. 0 0
Exploit semantic information for category annotation recommendation in Wikipedia Yafang Wang
Haofen Wang
Haiping Zhu
Yiqin Yu
Lecture Notes in Computer Science English 2007 Compared with plain-text resources, those in "semi-semantic" web sites such as Wikipedia contain high-level semantic information, which benefits various automatic annotation tasks on these sites themselves. In this paper, we propose a "collaborative annotating" approach to automatically recommend categories for a Wikipedia article by reusing category annotations from its most similar articles and ranking these annotations by their confidence. In this approach, four typical semantic features in Wikipedia, namely incoming links, outgoing links, section headings and template items, are investigated and exploited as the representation of articles to feed the similarity calculation. The experimental results not only show that these semantic features improve the performance of category annotation compared with the plain-text feature, but also demonstrate the strength of our approach in discovering missing annotations and annotations at the proper level for Wikipedia articles. 0 0
Freebase: A shared database of structured general human knowledge Bollacker K.
Cook R.
Tufts P.
Proceedings of the National Conference on Artificial Intelligence English 2007 Freebase is a practical, scalable, graph-shaped database of structured general human knowledge, inspired by Semantic Web research and collaborative data communities such as Wikipedia. Freebase allows public read and write access through an HTTP-based graph-query API for research, the creation and maintenance of structured data, and application building. Access is free and all data in Freebase has a very open (e.g. Creative Commons, GFDL) license. Copyright © 2007, Association for the Advancement of Artificial Intelligence. All rights reserved. 0 0
Impact of digital information resources in the toxicology literature Robinson L. Aslib Proceedings: New Information Perspectives English 2007 Purpose - The purpose of the study reported here was to assess the degree to which new forms of web-based information and communication resources impact on the formal toxicology literature, and the extent of any change between 2000 and 2005. Design/methodology/approach - The paper takes the form of an empirical examination of the full content of four toxicology journals for the year 2000 and for the year 2005, with analysis of the results, comparison with similar studies in other subject areas, and with a small survey of the information behaviour of practising toxicologists. Findings - Scholarly communication in toxicology has been relatively little affected by new forms of information resource (weblogs, wikis, discussion lists, etc.). Citations in journal articles are still largely to "traditional" resources, though a significant increase in the proportion of web-based material being cited in the toxicology literature has occurred between 2000 and 2005, from a mean of 3 per cent to a mean of 19 per cent. Research limitations/implications - The empirical research is limited to an examination of four journals in two samples of one year each. Originality/value - This is the only recent study of the impact of new ICTs on toxicology communication. It adds to the literature on the citation of digital resources in scholarly publications. 0 0
MediaWiki open-source software as infrastructure for electronic resources outreach Jackson M.
Blackburn J.D.
McDonald R.H.
Reference Librarian English 2007 This article describes the bundling of MediaWiki into the electronic resource access strategy to enable custom content that supports online training and course-based information literacy objectives. © 2007 by The Haworth Press, Inc. All rights reserved. 0 0
NSDL MatDL: Adding context to bridge materials e-research and e-education Bartolo L.
Lowe C.
Krafft D.
Tandy R.
Lecture Notes in Computer Science English 2007 The National Science Digital Library (NSDL) Materials Digital Library Pathway (MatDL) has implemented an information infrastructure to disseminate government-funded research results and to provide content as well as services to support the integration of research and education in materials. This poster describes how we are integrating a digital repository into open-source collaborative tools, such as wikis, to support users in materials research and education as well as interactions between the two areas. A search results plug-in for MediaWiki has been developed to display relevant search results from the MatDL repository in the Soft Matter Wiki established and developed by MatDL and its partners. Collaborative work with the NSDL Core Integration team at Cornell University is also in progress to enable information transfer in the opposite direction, from a wiki to a repository. 0 0
Temporal analysis of the wikigraph Buriol L.S.
Carlos Castillo
Debora Donato
Leonardi S.
Millozzi S.
Proceedings - 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings), WI'06 English 2007 Wikipedia is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc, we have a "Wikigraph", a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature in that there are explicit timestamps associated with each node's events. This allows us to do a detailed analysis of Wikipedia's evolution over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part, we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraph can be applied to large Web graphs for which temporal data is usually not available. 0 4
Wikipedia in materials education Powell IV A.C.
Morris A.E.
TMS Annual Meeting English 2007 Wikipedia has become a vast storehouse of human knowledge, and a first point of reference for millions of people from all walks of life, including many materials science and engineering (MSE) students. Its characteristics of open authorship and instant publication lead to both its main strength of broad, timely coverage and also its weakness of non-uniform quality. This talk will discuss the status and potential of this medium as a delivery mechanism for materials education content, some experiences with its use in the classroom, and its fit with other media from textbooks to digital libraries. 0 0
Wikipedia in the pocket: Indexing technology for near-duplicate detection and high similarity search Martin Potthast Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 English 2007 We develop and implement a new indexing technology which allows us to use complete (and possibly very large) documents as queries, while having a retrieval performance comparable to a standard term query. Our approach aims at retrieval tasks such as near-duplicate detection and high similarity search. To demonstrate the performance of our technology we have compiled the search index "Wikipedia in the Pocket", which contains about 2 million English and German Wikipedia articles. This index, along with a search interface, fits on a conventional CD (0.7 gigabyte). The ingredients of our indexing technology are similarity hashing and minimal perfect hashing. 0 0
Building a design engineering digital library: The workflow issues Grierson H.
Wodehouse A.
Breslin C.
Ion W.
Juster N.
DS 38: Proceedings of E and DPE 2006, the 8th International Conference on Engineering and Product Design Education English 2006 Over the past 2 years the Design Manufacturing and Engineering Management Department at the University of Strathclyde has been developing a digital library to support student design learning in global team-based design engineering projects through the DIDET project [1]. Previous studies in the classroom have identified the need for the development of two parallel systems - a shared workspace, the LauLima Learning Environment (LLE) and a digital library, the LauLima Digital Library (LDL) [2]. These two elements are encapsulated within LauLima, developed from the open-sourced groupware Tikiwiki. This paper will look at the workflow in relation to populating the digital library, discuss the issues as they are experienced by staff and students, e.g. the application of metadata (keywords and descriptions); harvesting of resources; reuse in classes; granularity; intellectual property rights and digital rights management (IPR and DRM), and make suggestions for improvement. 0 0
Community Building around Encyclopaedic Knowledge Maurer H.
Kolbitsch J.
Journal of Computing and Information Technology 14 2006 (Note: despite not mentioning Wikipedia in its title or abstract, the paper discusses it as one of the main examples.) This paper gives a brief overview of current technologies in systems handling encyclopaedic knowledge. Since most of the electronic encyclopaedias currently available are rather static and inflexible, greatly enhanced functionality is introduced that enables users to work more effectively and collaboratively. Users have the ability, for instance, to add annotations to every kind of object and can have private and shared workspaces. The techniques described employ user profiles in order to adapt to different users and involve statistical analysis to improve search results. Moreover, a tracking and navigation mechanism based on trails is presented. The second part of the paper details community building around encyclopaedic knowledge with the aim to involve “plain” users and experts in environments with largely editorial content. The foundations for building a user community are specified along with significant facets such as retaining the high quality of content, rating mechanisms and social aspects. A system that implements large portions of the community-related concepts in a heterogeneous environment of several largely independent data sources is proposed. Apart from online and DVD-based encyclopaedias, potential application areas are e-Learning, corporate documentation and knowledge management systems. 0 1
Integration of Wikipedia and a geography digital library Lim E.-P.
Zhe Wang
Sadeli D.
Yanyan Li
Chang C.-H.
Kalyani Chatterjea
Goh D.H.-L.
Theng Y.-L.
Jinghua Zhang
Aixin Sun
Lecture Notes in Computer Science English 2006 In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications that are of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geo-political regions namely cities and countries. Some empirical performance results will be presented. The paper will also describe the adaptations of G-Portal and Wikipedia to meet the integration requirements. 0 0
Understanding user perceptions on usefulness and usability of an integrated Wiki-G-Portal Theng Y.-L.
Yanyan Li
Lim E.-P.
Zhe Wang
Goh D.H.-L.
Chang C.-H.
Kalyani Chatterjea
Jinghua Zhang
Lecture Notes in Computer Science English 2006 This paper describes a pilot study on Wiki-G-Portal, a project integrating Wikipedia, an online encyclopedia, into G-Portal, a Web-based digital library of geography resources. Initial findings from the pilot study seem to suggest positive perceptions of the usefulness and usability of Wiki-G-Portal, as well as of subjects' attitude and intention to use it. 0 0
What would you do if you received a "cease and desist" letter? Starr J. Searcher:Magazine for Database Professionals English 2006 The 'Cease and Desist' letters in Wikipedia, which have chilling effects on users of Web sites, are discussed. Students from the project's team of law school clinics then annotate the letters with meaningful links to legal information that can help readers understand the legitimacy of the author's claims. The Chilling Effects team reported a disturbing number of legal flaws in so-called DMCA notices, which result in online materials being pulled from the Internet. The Chilling Effects project is helping people to understand their rights regarding cease and desist letters. 0 0
Wikipedia and Britannica Berinstein P. Searcher:Magazine for Database Professionals English 2006 Wikipedia, a dynamic web-based encyclopedia, continues to gain popularity and size. The community pages of Wikipedia assert that its goal is to create a free, democratic, reliable encyclopedia. The community pages also explain that what constitutes notability is always under debate. Wikipedia is open to everyone, not only to read but also to create and maintain, and is governed primarily by community consensus. 0 0
Analyzing and Visualizing the Semantic Coverage of Wikipedia and Its Authors Holloway
Miran Bozicevic
Katy Börner cs.IR/0512085 / Submitted to Complexity, Special issue on Understanding Complex Systems. 2005 This paper presents a novel analysis and visualization of English Wikipedia data. Our specific interest is the analysis of basic statistics, the identification of the semantic structure and age of the categories in this free online encyclopedia, and the content coverage of its highly productive authors. The paper starts with an introduction to Wikipedia and a review of related work. We then introduce a suite of measures and approaches to analyze and map the semantic structure of Wikipedia. The results show that co-occurrences of categories within individual articles have a power-law distribution and, when mapped, reveal the nicely clustered semantic structure of Wikipedia. The results also reveal the content coverage of the article's authors, although the roles these authors play are as varied as the authors themselves. We conclude with a discussion of major results and planned future work. 0 9
Eyes of a Wiki: Automated navigation map Han H.-S.
Hyeoncheol Kim
Lecture Notes in Computer Science English 2005 There are many potential uses of a Wiki within a community-based digital library. Users share individual ideas to build up community knowledge through the efficient and effective collaborative authoring and communication that a Wiki provides. In our study, we investigated how community knowledge is organized into a knowledge structure that users can access and modify efficiently. Since a Wiki gives users the freedom to edit any page, a Wiki site grows and changes dynamically. We also developed a tool that helps users navigate easily in the dynamically changing link structure. Our experiment shows that the navigation tool helps Wiki users figure out the complex site structure more easily and thus build a better-structured community knowledge base. We also show that a Wiki with the navigation tool improves collaborative learning in a web-based e-learning environment. 0 0
Saying "I do" to podcasting: Another "next big thing" for librarians? Gordan-Murane L. Searcher:Magazine for Database Professionals English 2005 The technological advancement of the podcasting phenomenon, which can be received using an MP3 player, is discussed. Podcasts can range from home-grown audio diaries shared between family, friends, and colleagues to professionally produced radio shows. Online Programming for All Libraries (OPAL), a collaborative effort by libraries to provide cooperative Web-based programming and training for library users and library staff members, has made its audio content available as a podcast. The interesting work on tagging, folksonomies, metadata, and classification should be applied to podcasting as well. 0 0