User:Moudy83/conference papers2

Authors Title Conference / published in Year Online Notes Abstract Keywords
Choi, Key-Sun IT Ontology and Semantic Technology International Conference on Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. 2007 IT (information technology) ontology is to be used for analyzing information technology as well as for enhancing it. Semantic technology is compared with the syntactic one. Ontology provides a backbone for the meaning-centered reconfiguration of syntactic structure, which is one aspect of semantic technology. The purpose of IT ontology can be categorized into two things: to capture the right information and services for user requests, and to give insights into the future of IT and its possible paths by interlinking relations on component classes and instances. Consider question answering based on ontology to improve the performance of QA. Each question type (e.g., 5W1H) will seek its specific relation from the ontology that has already been acquired from the relevant information resources (e.g., Wikipedia or news articles). The question is whether such relations and related classes are neutral, independent of domain, or whether they are affected by each specific domain. The first step of ontology learning for a question-answering application is to find such a neutral relation discovery mechanism and to take care of the distorted relation-instance mappings that arise when populating from the domain resources. Then we consider domain ontology acquisition in a top-down manner, from already-made similar resources (e.g., a domain-specific thesaurus), and in a bottom-up manner, from the relevant resources. But the already-made resources should be checked against the currently available resources for their coverage. The problem is that a thesaurus is composed of classes, not the instances of terms that appear in corpora. They have little coverage over the resources, and even the mapping between classes and instances has not been established at this stage. Clustering technology can filter out the irrelevant mappings, and clustering features can be made more accurate by using the more semantic features accumulated during these steps. For example, a pattern-based discovery process can be evolved by putting the discovered semantic features into the patterns. Keeping ontology use for question answering in mind, we ask how much of the resources used in the acquisition process the acquired ontology can represent. The derived questions are summarized into two: (1) how such an ideal, complete ontology could be generated for each specification of use, and (2) how much the ontology contributes to the intended problem solving. The ideal case is to convert all of the resources into their corresponding ontology. But if we presuppose a gap between the meaning of the resources and the acquired ontology, a set of raw chunks in the resources may still be effective for answering given questions, with some help from the acquired ontology or even without resort to it. Definitions of classes and relations in the ontology would be manifested through a dual structure to supplement the complementary factors between the idealized, complete, noise-free ontology and incomplete, error-prone knowledge. As a result, we confront two problems: how to measure ontology effectiveness for each situation, and how to compare uses of the ontology across applications and transform it into another shape of ontology depending on the application, which can be helped by granularity control and even extended to the reconfiguration of knowledge structure. In the end, the intended IT ontology is modularized enough to be composed later for each purpose of use, in efficient and effective ways. We still have to solve definition questions and their translation into ontology forms.
Paci, Giulio; Pedrazzi, Giorgio & Turra, Roberta Wikipedia based semantic metadata annotation of audio transcripts International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 2010 11th 2010 A method to automatically annotate video items with semantic metadata is presented. The method has been developed in the context of the Papyrus project to annotate documentary-like broadcast videos with a set of relevant keywords using automatic speech recognition (ASR) transcripts as a primary complementary resource. The task is complicated by the high word error rate (WER) of the ASR for this kind of video. For this reason a novel relevance criterion based on domain information is proposed. Wikipedia is used both as a source of metadata and as a linguistic resource for disambiguating keywords and for eliminating the out-of-topic/out-of-domain keywords. Documents are annotated with relevant links to Wikipedia pages, concept definitions, synonyms, translations and concept categories.
Shachaf, P.; Hara, N.; Herring, S.; Callahan, E.; Solomon, P.; Stvilia, B. & Matei, S. Global perspective on Wikipedia research Proceedings of the American Society for Information Science and Technology 2008 [1] This panel will provide a global perspective on Wikipedia research. The literature on Wikipedia is mostly anecdotal, and most of the research has focused attention primarily on the English Wikipedia examining the accuracy of entries compared to established online encyclopedias (Emigh & Herring, 2005; Giles, 2005; Rosenzweig, 2006) and analyzing the evolution of articles over time (Viégas, Wattenberg, & Dave, 2004; Viégas, Wattenberg, Kriss, & van Ham, 2007). Others have examined the quality of contribution (Stvilia et al., 2005). However, only a few studies have conducted comparative analyses across languages or analyzed Wikipedia in languages other than English (e.g., Pfeil, Zaphiris, & Ang, 2006). There is a need for international, cross-cultural understanding of Wikipedia. In an effort to address this gap, this panel will present a range of international and cross-cultural research of Wikipedia. The presenters will contribute different perspectives of Wikipedia as an international sociocultural institution and will describe similarities and differences across various national/language versions of Wikipedia. Shachaf and Hara will present variation of norms and behaviors on talk pages in various languages of Wikipedia. Herring and Callahan will share results from a cross-language comparison of biographical entries that exhibit variations in content of entries in the English and Polish versions of Wikipedia and will explain how they are influenced by the culture and history of the US and Poland. Stvilia will discuss some of the commonalities and variability of quality models used by different Wikipedias, and the problems of cross-language quality measurement aggregation and reasoning. Matei will describe the social structuration and distribution of roles and efforts in wiki teaching environments. Solomon's comments, as a discussant, will focus on how these comparative insights provide evidence of the ways in which an evolving institution, such as Wikipedia, may be a force for supporting cultural identity (or not).
Schumann, E. T.; Brunner, L.; Schulz, K. U. & Ringlstetter, C. A semantic interface for post secondary education programs Proceedings of the American Society for Information Science and Technology 2008 [2] We describe a prototype for a multilingual semantic interface to the academic programs of a university. Navigating within a graph model of the academic disciplines and fields, the users are led to course and program documents. For core academic concepts, informational support is provided by language specific links to Wikipedia. The web-based prototype is currently evaluated in a user study.
Ueda, H. & Murakami, H. Suggesting Japanese subject headings using web information resources Proceedings of the American Society for Information Science and Technology 2006 [3] We propose a method that suggests BSH4 (Japan Library Association, 1999) subject headings according to user queries when pattern matching algorithms fail to produce a hit. As user queries are diverse and unpredictable, we explore a method that makes a suggestion even when the query is a new word. We investigate the use of information obtained from Wikipedia ("Wikipedia," n.d.), the Amazon Web Service (AWS), and Google. We implemented the method, and our system suggests ten BSH4 subject headings according to user queries.
Gazan, R.; Shachaf, P.; Barzilai-Nahon, K.; Shankar, K. & Bardzell, S. Social computing as co-created experience Proceedings of the American Society for Information Science and Technology 2007 [4] One of the most interesting effects of social computing is that the line between users and designers has become increasingly uncertain. Examples abound: user-generated content, rating and recommendation systems, social networking sites, open source software and easy personalization and sharing have effectively allowed users to become design partners in the creation of online experience. This panel will discuss four examples of social computing in practice, including the exercise of virtual social capital by members of the Answerbag online question-answering community, the thriving yet understudied user interactions on Wikipedia talk pages, self-regulation mechanisms of gatekeeping in virtual communities, and collaborative design practices within Second Life, a Massively Multiplayer Online Game (MMOG) that is also an interactive design environment. The aim of this panel is to challenge traditional understanding of users' role in the creation and evolution of information systems, and work toward a more realistic conceptualization of Web 2.0 users as both a source of, and a solution to, the overabundance of information created via social computing.
Buzydlowski, J. W. Exploring co-citation chains Proceedings of the American Society for Information Science and Technology 2006 [5] The game "Six Degrees of Kevin Bacon" is played by naming an actor and then, by thinking of other actors in movies such that a chain of connections can be made, linking the named actor with Kevin Bacon. The number of different movies that are used to link the actor to Bacon indicates the degree with which the two are linked. For example, using John Travolta as the named actor, he appeared in the movie Look Who's Talking with Kirstie Alley, who was in She's Having a Baby with Kevin Bacon. So, John Travolta has a Bacon number or degree of two, as connected via Kirstie Alley. (For a more thorough discussion, see http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon. The example is taken from http://www.geocities.com/theeac/bacon.html.) Based on the above, perhaps another title for this paper could be the "Six Degrees of Sir Francis Bacon," as it indicates the framework for this paper by relating it to the above technique but placing it in an academic domain through the use of a scholarly bibliographic database. Additionally, the bibliometric technique of author co-citation analysis (ACA) will be used to help by automating the process of finding the connections.
Shachaf, P.; Hara, N.; Bonk, C.; Mackey, T. P.; Hemminger, B.; Stvilia, B. & Rosenbaum, H. Wiki a la carte: Understanding participation behaviors Proceedings of the American Society for Information Science and Technology 2007 [6] This panel focuses on trends in research on Wikis. Wikis have become prevalent in our society and are used for multiple purposes, such as education, knowledge sharing, collaboration, and coordination. Similar to other popular social computing tools, they raise new research questions and have attracted the attention of researchers in information science. While some focus on the semantic web, the automatic processing of data accumulated by users, and tool improvements, others discuss social implications of Wikis. This panel presents five studies that address the social uses of Wikis that support information sharing. In their studies, the panelists use a variety of novel applications of research methods, such as action research, online ethnography, site observation, surveys, and interviews. The panelists will present their findings: Shachaf and Hara will discuss Wikipedians' norms and behaviors; Bonk will present collaborative writing on Wikibook; Mackey will discuss authorship and collaboration in PBwiki.com; Hemminger will share results from the early use of wikis for conference communications; and Stvilia will outline the community mechanism of information quality assurance in Wikipedia.
Shachaf, P.; Hara; Eschenfelder, K.; Goodrum, A.; Scott, L. C.; Shankar, K.; Ozakca, M. & Robbin Anarchists, pirates, ideologists, and disasters: New digital trends and their impacts Proceedings of the American Society for Information Science and Technology 2006 [7] This panel will address both online disasters created by anarchists and pirates and disaster relief efforts aided by information and communication technologies (ICTs). An increasing number of people use ICTs to mobilize their resources and enhance their activities. This mobilization has unpredictable consequences for society: On one hand, use of ICT has allowed for the mobilization of millions of people for disaster relief efforts and peace movements. On the other hand, it has also helped hackers and pirates to carry out destructive activities. In many cases it is hard to judge the moral consequences of the use of ICT by marginalized groups. The panel will present five studies, of which three will focus on online disobedience and two will focus on ICT use for disaster. Together these presentations illustrate both positive and negative consequences of the new digital trends. Goodrum deliberates on an ethic of hacktivism in the context of online activism. Eschenfelder discusses user modification of or resistance to technological protection measures. Shachaf and Hara present a study of anarchists who attack information posted on Wikipedia and modify the content by deleting, renaming, reinterpreting, and recreating information according to their ideologies. Scott examines consumer media behaviors after the Hurricane Katrina and Rita disasters. Shankar and Ozakca discuss volunteer efforts in the aftermath of Hurricane Katrina.
Ayers, P. Researching wikipedia - current approaches and new directions Proceedings of the American Society for Information Science and Technology 2006 [8] Wikipedia, an international, multi-lingual and collaboratively produced free online encyclopedia, has experienced massive growth since its inception in 2001. The site has become the world's single largest encyclopedia as well as one of the world's most diverse online communities. Because of these factors, the site provides a unique view into the processes of collaborative work and the factors that go into producing encyclopedic content. To date, there has been no unified review of the current research that is taking place on and about Wikipedia, and indeed there have been few formal studies of the site, despite its growing importance. This project is a review of social science and information science studies of the site, focusing on research methods and categorizing the areas of the site that have been studied so far. Studies of Wikipedia have focused primarily on the social dynamics of contributors (such as how disputes are resolved and why contributors participate), and the content of Wikipedia (such as whether it is an accurate source), but due to the unique collaborative processes on Wikipedia these two areas are deeply intertwined.
Sundin, O. & Haider, J. Debating information control in web 2.0: The case of Wikipedia vs. Citizendium Proceedings of the American Society for Information Science and Technology 2007 [9] Wikipedia is continually being scrutinised for the quality of its content. The question addressed in this paper concerns which notions of information, of collaborative knowledge creation, of authority and of the role of the expert are drawn on when information control in Wikipedia is discussed. This is done by focusing on the arguments made in the debates surrounding the launch of Citizendium, a proposed new collaborative online encyclopaedia. While Wikipedia claims not to attribute special status to any of its contributors, Citizendium intends to assign a decision-making role to subject experts. The empirical material for the present study consists of two online threads available from Slashdot. One, "A Look inside Citizendium", dates from September, the second one, "Co-Founder Forks Wikipedia", from October 2006. The textual analysis of these documents was carried out through close interpretative reading. Five themes, related to different aspects of information control, emerged: 1. information types, 2. information responsibility, 3. information perspectives, 4. information organisation, 5. information provenance & creation. Each theme contains a number of different positions. It was found that these positions do not necessarily correspond with the different sides of the argument. Instead, at times the fault lines run through the two camps.
Kimmerle, Joachim; Moskaliuk, Johannes & Cress, Ulrike Individual Learning and Collaborative Knowledge Building with Shared Digital Artifacts. Proceedings of World Academy of Science: Engineering & Technology 2008 The development of Internet technology in recent years has led to a more active role of users in creating Web content. This has significant effects both on individual learning and collaborative knowledge building. This paper will present an integrative framework model to describe and explain learning and knowledge building with shared digital artifacts on the basis of Luhmann's systems theory and Piaget's model of equilibration. In this model, knowledge progress is based on cognitive conflicts resulting from incongruities between an individual's prior knowledge and the information which is contained in a digital artifact. Empirical support for the model will be provided by 1) applying it descriptively to texts from Wikipedia, 2) examining knowledge-building processes using a social network analysis, and 3) presenting a survey of a series of experimental laboratory studies.
Yang, Kai-Hsiang; Chen, Chun-Yu; Lee, Hahn-Ming & Ho, Jan-Ming EFS: Expert Finding System based on Wikipedia link pattern analysis IEEE International Conference on Systems, Man and Cybernetics, 2008. SMC 2008. 2008 Building an expert finding system is very important for many applications, especially in the academic environment. Previous work uses e-mails or Web pages as the corpus to analyze the expertise of each expert. In this paper, we present an Expert Finding System, abbreviated as EFS, to build experts' profiles by using their journal publications. For a given proposal, the EFS first looks up the Wikipedia Web site to get related link information, and then lists and ranks all associated experts by using that information. In our experiments, we use a real-world dataset which comprises 882 people and 13,654 papers categorized into 9 expertise domains. Our experimental results show that the EFS works well on several expertise domains such as "Artificial Intelligence" and "Image & Pattern Recognition".
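A minimal Python sketch of the general idea described in this abstract, i.e. ranking experts by the overlap between a proposal's Wikipedia-linked concepts and the experts' publication profiles (the link map, profiles and terms below are toy assumptions, not data from the paper):
 # Hypothetical sketch: Wikipedia link expansion + profile overlap for expert ranking.
 from collections import Counter
 wiki_links = {  # concept -> concepts linked from its Wikipedia page (toy data)
     "neural network": {"machine learning", "artificial intelligence"},
     "image segmentation": {"computer vision", "pattern recognition"},
 }
 expert_profiles = {  # expert -> terms extracted from their journal publications (toy data)
     "expert_a": {"machine learning", "artificial intelligence", "data mining"},
     "expert_b": {"computer vision", "pattern recognition"},
 }
 def rank_experts(proposal_terms):
     expanded = set(proposal_terms)
     for term in proposal_terms:
         expanded |= wiki_links.get(term, set())  # expand with Wikipedia link neighbours
     scores = Counter({e: len(expanded & profile) for e, profile in expert_profiles.items()})
     return scores.most_common()
 print(rank_experts({"neural network"}))  # expert_a ranked first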
Mullins, Matt & Fizzano, Perry Treelicious: A System for Semantically Navigating Tagged Web Pages IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) 2010 Collaborative tagging has emerged as a popular and effective method for organizing and describing pages on the Web. We present Treelicious, a system that allows hierarchical navigation of tagged web pages. Our system enriches the navigational capabilities of standard tagging systems, which typically exploit only popularity and co-occurrence data. We describe a prototype that leverages the Wikipedia category structure to allow a user to semantically navigate pages from the Delicious social bookmarking service. In our system a user can perform an ordinary keyword search and browse relevant pages but is also given the ability to broaden the search to more general topics and narrow it to more specific topics. We show that Treelicious indeed provides an intuitive framework that allows for improved and effective discovery of knowledge.
Achananuparp, Palakorn; Han, Hyoil; Nasraoui, Olfa & Johnson, Roberta Semantically enhanced user modeling Proceedings of the ACM Symposium on Applied Computing 2007 [10] Content-based implicit user modeling techniques usually employ a traditional term vector as a representation of the user's interest. However, due to the problem of dimensionality in the vector space model, a simple term vector is not a sufficient representation of the user model as it ignores the semantic relations between terms. In this paper, we present a novel method to enhance a traditional term-based user model with WordNet-based semantic similarity techniques. To achieve this, we use word definitions and relationship hierarchies in WordNet to perform word sense disambiguation and employ domain-specific concepts as category labels for the derived user models. We tested our method on Windows to the Universe, a public educational website covering subjects in the Earth and Space Sciences, and performed an evaluation of our semantically enhanced user models against human judgment. Our approach is distinguishable from existing work because we automatically narrow down the set of domain specific concepts from initial domain concepts obtained from Wikipedia and because we automatically create semantically enhanced user models.
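The abstract's core step, measuring semantic relatedness between a new term and terms already in a user model via WordNet, can be sketched roughly as follows (this assumes NLTK with the WordNet corpus installed and is not the authors' pipeline):
 # Rough sketch: WordNet-based similarity instead of exact term matching.
 # Requires: pip install nltk; python -c "import nltk; nltk.download('wordnet')"
 from nltk.corpus import wordnet as wn
 def max_similarity(term_a, term_b):
     # highest path similarity over all noun-sense pairs (crude sense handling)
     best = 0.0
     for s1 in wn.synsets(term_a, pos=wn.NOUN):
         for s2 in wn.synsets(term_b, pos=wn.NOUN):
             sim = s1.path_similarity(s2)
             if sim is not None and sim > best:
                 best = sim
     return best
 user_model_terms = ["planet", "telescope"]
 print(max(max_similarity("asteroid", t) for t in user_model_terms))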
Adafre, Sisay Fissaha; Jijkoun, Valentin & Rijke, Maarten De Fact discovery in Wikipedia IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, Nov 2 - 5 2007 Silicon Valley, CA, United States 2007 [11] We address the task of extracting focused salient information items, relevant and important for a given topic, from a large encyclopedic resource. Specifically, for a given topic (a Wikipedia article) we identify snippets from other articles in Wikipedia that contain important information for the topic of the original article, without duplicates. We compare several methods for addressing the task, and find that a mixture of content-based, link-based, and layout-based features outperforms other methods, especially in combination with the use of so-called reference corpora that capture the key properties of entities of a common type.
Adafre, Sisay Fissaha; Jijkoun, Valentin & Rijke, Maarten De Link-based vs. content-based retrieval for question answering using Wikipedia 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, September 20, 2006 - September 22, 2006 Alicante, Spain 2007 We describe our participation in the WiQA 2006 pilot on question answering using Wikipedia, with a focus on comparing link-based vs. content-based retrieval. Our system currently works for Dutch and English. Springer-Verlag Berlin Heidelberg 2007.
Adar, Eytan; Skinner, Michael & Weld, Daniel S. Information arbitrage across multi-lingual Wikipedia 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain 2009 [12] The rapid globalization of Wikipedia is generating a parallel, multi-lingual corpus of unprecedented scale. Pages for the same topic in many different languages emerge both as a result of manual translation and independent development. Unfortunately, these pages may appear at different times, vary in size, scope, and quality. Furthermore, differential growth rates cause the conceptual mapping between articles in different languages to be both complex and dynamic. These disparities provide the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language domains (English, Spanish, French, and German), we present Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. Our method uses self-supervised learning and our experiments demonstrate the method's feasibility, even in the absence of dictionaries.
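The infobox-alignment step can be pictured with a small sketch that matches attributes across two language editions by value similarity (toy dictionaries and threshold are assumptions; Ziggurat itself uses self-supervised learning):
 # Toy sketch of cross-language infobox field alignment by value similarity.
 from difflib import SequenceMatcher
 en_infobox = {"population": "3,600,000", "area_km2": "891.8"}
 de_infobox = {"einwohner": "3.600.000", "flaeche_km2": "891,8"}
 def align(a, b, threshold=0.5):
     pairs = []
     for key_a, val_a in a.items():
         key_b, score = max(((k, SequenceMatcher(None, val_a, v).ratio())
                             for k, v in b.items()), key=lambda x: x[1])
         if score >= threshold:
             pairs.append((key_a, key_b, round(score, 2)))
     return pairs
 print(align(en_infobox, de_infobox))  # pairs population/einwohner, area_km2/flaeche_km2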
Alencar, Rafael Odon De; Davis Jr., Clodoveu Augusto & Goncalves, Marcos Andre Geographical classification of documents using evidence from Wikipedia 6th Workshop on Geographic Information Retrieval, GIR'10, February 18, 2010 - February 19, 2010 Zurich, Switzerland 2010 [13] Obtaining or approximating a geographic location for search results often motivates users to include place names and other geography-related terms in their queries. Previous work shows that queries that include geography-related terms correspond to a significant share of the users' demand. Therefore, it is important to recognize the association of documents to places in order to adequately respond to such queries. This paper describes strategies for text classification into geography-related categories, using evidence extracted from Wikipedia. We use terms that correspond to entry titles and the connections between entries in Wikipedia's graph to establish a semantic network from which classification features are generated. Results of experiments using a news dataset, classified over Brazilian states, show that such terms constitute valid evidence for the geographical classification of documents, and demonstrate the potential of this technique for text classification.
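The classification idea can be illustrated with a toy sketch that treats Wikipedia entry titles found in a text as votes for geographic classes (the title-to-state map is a hypothetical stand-in for the semantic network described above):
 # Toy sketch: Wikipedia entry titles as evidence for geographic classification.
 from collections import Counter
 title_to_state = {  # hypothetical evidence extracted from Wikipedia
     "Copacabana": "Rio de Janeiro",
     "Pelourinho": "Bahia",
     "Paulista Avenue": "Sao Paulo",
 }
 def classify(text):
     votes = Counter(state for title, state in title_to_state.items() if title in text)
     return votes.most_common(1)[0][0] if votes else None
 print(classify("The march started on Paulista Avenue downtown."))  # Sao Paulo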
Amaral, Carlos; Cassan, Adan; Figueira, Helena; Martins, Andre; Mendes, Afonso; Mendes, Pedro; Pinto, Claudia & Vidal, Daniel Priberam's question answering system in QA@CLEF 2007 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [14] This paper accounts for Priberam's participation in the monolingual question answering (QA) track of CLEF 2007. In previous participations, Priberam's QA system obtained encouraging results both in monolingual and cross-language tasks. This year we endowed the system with syntactical processing, in order to capture the syntactic structure of the question. The main goal was to obtain a more tuned question categorisation and consequently a more precise answer extraction. Besides this, we provided our system with the ability to handle topic-related questions and to use encyclopaedic sources like Wikipedia. The paper provides a description of the improvements made in the system, followed by the discussion of the results obtained in Portuguese and Spanish monolingual runs. 2008 Springer-Verlag Berlin Heidelberg.
Arribillaga, Esnaola Active knowledge generation by university students through cooperative learning 2008 ITI 6th International Conference on Information and Communications Technology, ICICT 2008, December 16, 2008 - December 18, 2008 Cairo, Egypt 2008 [15] Social and cultural transformations caused by globalisation have fostered changes in current universities, institutions which, doing an intensive and responsible use of technologies, have to create a continuous improvement-based pedagogical model consisting of Communities. To this end, we propose here the adoption of the so-called hacker ethic, which highlights the importance of collaborative, passionate, creative as well as socially valuable work. Applying this ethic to higher education, current universities may become Net-Academy-based Universities. Therefore, these institutions require a new digital culture that allows the transmission of the hacker ethic's values and, in turn, a Net-Academy-based learning model that enables students to transform into knowledge generators. In this way, wiki-technology-based systems may help universities to achieve the transformation they need. We present here an experiment to check whether these kinds of resources transmit to the students the values of the hacker ethic, allowing them to become active knowledge generators. This experiment revealed the problems of such technologies with the limits of the scope of the community created and the not-so-active knowledge-generator role of the students. Against these shortcomings, we address here a Wikipedia-based methodology and discuss the possibilities of this alternative to help current universities upgrade into Net-Academy-based universities.
Ashoori, Elham & Lalmas, Mounia Using topic shifts in XML retrieval at INEX 2006 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007 This paper describes the retrieval approaches used by Queen Mary, University of London in the INEX 2006 ad hoc track. In our participation, we mainly investigate an element-specific smoothing method within the language modelling framework. We adjust the amount of smoothing required for each XML element depending on its number of topic shifts to provide focused access to XML elements in the Wikipedia collection. We also investigate whether using non-uniform priors is beneficial for the ad hoc tasks. Springer-Verlag Berlin Heidelberg 2007.
Auer, Soren; Bizer, Christian; Kobilarov, Georgi; Lehmann, Jens; Cyganiak, Richard & Ives, Zachary DBpedia: A nucleus for a Web of open data 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of 2007 [16] DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data. 2008 Springer-Verlag Berlin Heidelberg.
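As a concrete illustration of the kind of structured query DBpedia enables, the sketch below sends a SPARQL query to the public endpoint using only the standard library (the endpoint URL, prefixes and properties are assumptions about the current DBpedia deployment, not part of the paper):
 # Sketch: querying the DBpedia SPARQL endpoint (network access required).
 import json, urllib.parse, urllib.request
 query = """
 PREFIX dbo: <http://dbpedia.org/ontology/>
 PREFIX dbr: <http://dbpedia.org/resource/>
 SELECT ?city ?population WHERE {
   ?city a dbo:City ; dbo:country dbr:Germany ; dbo:populationTotal ?population .
 } ORDER BY DESC(?population) LIMIT 5
 """
 url = "https://dbpedia.org/sparql?" + urllib.parse.urlencode(
     {"query": query, "format": "application/sparql-results+json"})
 with urllib.request.urlopen(url) as resp:
     for row in json.load(resp)["results"]["bindings"]:
         print(row["city"]["value"], row["population"]["value"])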
Augello, Agnese; Vassallo, Giorgio; Gaglio, Salvatore & Pilato, Giovanni A semantic layer on semi-structured data sources for intuitive chatbots International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2009, March 16, 2009 - March 19, 2009 Fukuoka, Japan 2009 [17] The main limits of chatbot technology are related to the building of their knowledge representation and to their rigid information retrieval and dialogue capabilities, usually based on simple pattern-matching rules. The analysis of distributional properties of words in a text corpus allows the creation of semantic spaces in which natural language elements can be represented and compared. This space can be interpreted as a "conceptual" space where the axes represent the latent primitive concepts of the analyzed corpus. The presented work aims at exploiting the properties of a data-driven semantic/conceptual space built using semi-structured data sources freely available on the web, such as Wikipedia. This coding is equivalent to adding a conceptual similarity relationship layer into the Wikipedia graph. The chatbot can exploit this layer in order to simulate an "intuitive" behavior, attempting to retrieve semantic relations between Wikipedia resources also through associative sub-symbolic paths.
Ayu, Media A.; Taylor, Ken & Mantoro, Teddy Active learning: Engaging students in the classroom using mobile phones 2009 IEEE Symposium on Industrial Electronics and Applications, ISIEA 2009, October 4, 2009 - October 6, 2009 Kuala Lumpur, Malaysia 2009 [18] Audience Response Systems (ARS) are used to achieve active learning in lectures and large group environments by facilitating interaction between the presenter and the audience. However, their use is discouraged by the requirement for specialist infrastructure in the lecture theatre and management of the expensive clickers they use. We improve the ARS by removing the need for specialist infrastructure, by using mobile phones instead of clickers, and by providing a web-based interface in the familiar Wikipedia style. Responders usually vote by dialing, and this has been configured to be cost free in most cases. The desirability of this approach is shown by the use the demonstration system has had, with 21,000 voters voting 92,000 times in 14,000 surveys to date.
Babu, T. Lenin; Ramaiah, M. Seetha; Prabhakar, T.V. & Rambabu, D. ArchVoc - Towards an ontology for software architecture ICSE 2007 Workshops: Second Workshop on SHAring and Reusing architectural Knowledge Architecture, Rationale, and Design Intent, SHARK-ADI'07, May 20, 2007 - May 26, 2007 Minneapolis, MN, United States 2007 [19] Knowledge management of any domain requires controlled vocabularies, taxonomies, thesauri, ontologies, concept maps and other such artifacts. This paper describes an effort to identify the major concepts in software architecture that can go into such meta-knowledge. The concept terms are identified through two different techniques: (1) manually, through the back-of-the-book indexes of some of the major texts in software architecture, and (2) through a semi-automatic technique by parsing Wikipedia pages. Only generic architecture knowledge is considered. Apart from identifying the important concepts of software architecture, we could also see gaps in the software architecture content in Wikipedia.
Baeza-Yates, Ricardo Keynote talk: Mining the web 2.0 for improved image search 4th International Conference on Semantic and Digital Media Technologies, SAMT 2009, December 2, 2009 - December 4, 2009 Graz, Austria 2009 [20] There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user-generated content (UGC) or what is called today the Web 2.0. In this talk we show how to use these sources of evidence in Flickr, such as tags, visual annotations or clicks, which represent the wisdom of crowds behind UGC, to improve image search. These results are the work of the multimedia retrieval team at Yahoo! Research Barcelona and they are already being used in Yahoo! image search. This work is part of a larger effort to produce a virtuous data feedback circuit based on the right combination of many different technologies to leverage the Web itself.
Banerjee, Somnath Boosting inductive transfer for text classification using Wikipedia 6th International Conference on Machine Learning and Applications, ICMLA 2007, December 13, 2007 - December 15, 2007 Cincinnati, OH, United States 2007 [21] Inductive transfer is applying knowledge learned on one set of tasks to improve the performance of learning a new task. Inductive transfer is being applied in improving the generalization performance on a classification task using the models learned on some related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the text documents of the different tasks to a feature space created using Wikipedia, thereby providing some background knowledge of the contents of the documents. It has been observed here that when the classifiers are built using the features generated from Wikipedia they become more effective in transferring knowledge. An evaluation on the daily classification task on the Reuters RCV1 corpus shows that our method can significantly improve the performance of inductive transfer. Our method was also able to successfully overcome a major obstacle observed in a recent work on a similar setting.
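The feature-mapping step, representing documents from different tasks in a shared Wikipedia concept space, can be sketched as follows (the term-to-concept map is a hypothetical stand-in for features generated from Wikipedia; classifier training is omitted):
 # Toy sketch: mapping documents into a shared Wikipedia concept feature space.
 term_to_concepts = {  # hypothetical Wikipedia-derived features
     "striker": ["Association football"],
     "goalkeeper": ["Association football"],
     "serve": ["Tennis"],
 }
 def wikipedia_features(document):
     features = set()
     for token in document.lower().split():
         features.update(term_to_concepts.get(token, []))
     return features
 # documents from a source task and a target task now share concept-level features
 print(wikipedia_features("The striker beat the goalkeeper"))
 print(wikipedia_features("A strong serve won the point"))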
Baoyao, Zhou; Ping, Luo; Yuhong, Xiong & Wei, Liu Wikipedia-graph based key concept extraction towards news analysis 2009 IEEE Conference on Commerce and Enterprise Computing, CEC 2009, July 20, 2009 - July 23, 2009 Vienna, Austria 2009 [22] The well-known Wikipedia can serve as a comprehensive knowledge repository to facilitate textual content analysis, due to its abundance, high quality and good structure. In this paper, we propose WikiRank - a Wikipedia-graph based ranking model, which can be used to extract key Wikipedia concepts from a document. These key concepts can be regarded as the most salient terms to represent the theme of the document. Different from other existing graph-based ranking algorithms, the concept graph used for ranking in this model is constructed by leveraging not only the co-occurrence relations within the local context of a document but also the preprocessed hyperlink structure of Wikipedia. We have applied the proposed WikiRank model with the Support Propagation ranking algorithm to analyze news articles, especially enterprise news. Promising applications include Wikipedia Concept Linking and Enterprise Concept Cloud Generation.
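A rough sketch of graph-based concept ranking in the spirit of this abstract, combining co-occurrence edges with Wikipedia hyperlink edges and ranking nodes with PageRank (networkx PageRank is used here as a stand-in; the paper's Support Propagation algorithm and its graph construction details are not reproduced):
 # Sketch: concept graph from co-occurrence + Wikipedia links, ranked by PageRank.
 import networkx as nx
 G = nx.Graph()
 # edges from term co-occurrence within the document (toy data)
 G.add_edges_from([("merger", "acquisition"), ("acquisition", "antitrust")])
 # edges from the preprocessed Wikipedia hyperlink structure (toy data)
 G.add_edges_from([("merger", "antitrust"), ("antitrust", "competition law")])
 scores = nx.pagerank(G)
 for concept, score in sorted(scores.items(), key=lambda x: -x[1]):
     print(f"{concept}: {score:.3f}")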
Bautin, Mikhail & Skiena, Steven Concordance-based entity-oriented search IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, November 2, 2007 - November 5, 2007 Silicon Valley, CA, United States 2007 [23] We consider the problem of finding the relevant named entities in response to a search query over a given text corpus. Entity search can readily be used to augment conventional web search engines for a variety of applications. To assess the significance of entity search, we analyzed the AOL dataset of 36 million web search queries with respect to two different sets of entities: namely (a) 2.3 million distinct entities extracted from a news text corpus and (b) 2.9 million Wikipedia article titles. The results clearly indicate that search engines should be aware of entities, for under various criteria of matching between 18-39% of all web search queries can be recognized as specifically searching for entities, while 73-87% of all queries contain entities. Our entity search engine creates a concordance document for each entity, consisting of all the sentences in the corpus containing that entity. We then index and search these documents using open-source search software. This gives a ranked list of entities as the result of search. Visit http://www.textmap.com for a demonstration of our entity search engine over a large news corpus. We evaluate our system by comparing the results of each query to the list of entities that have highest statistical juxtaposition scores with the queried entity. Juxtaposition score is a measure of how strongly two entities are related in terms of a probabilistic upper bound. The results show excellent performance, particularly over well-characterized classes of entities such as people.
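The concordance idea can be sketched in a few lines: one "document" per entity containing every sentence that mentions it, which is then searched and ranked (toy corpus; a real deployment would use proper search software as the abstract notes):
 # Toy sketch: per-entity concordance documents and a simple keyword ranking.
 import re
 from collections import defaultdict
 sentences = [
     "Marie Curie won the Nobel Prize in Physics.",
     "The Nobel Prize was also awarded to Marie Curie in Chemistry.",
     "Albert Einstein developed the theory of relativity.",
 ]
 entities = ["Marie Curie", "Albert Einstein", "Nobel Prize"]
 concordance = defaultdict(list)
 for sentence in sentences:
     for entity in entities:
         if entity in sentence:
             concordance[entity].append(sentence)
 def search(query):
     terms = query.lower().split()
     scored = []
     for entity, sents in concordance.items():
         text = " ".join(sents).lower()
         score = sum(len(re.findall(re.escape(t), text)) for t in terms)
         if score:
             scored.append((score, entity))
     return [entity for _, entity in sorted(scored, reverse=True)]
 print(search("physics prize"))  # entities ranked by query-term frequency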
Beigbeder, Michel Focused retrieval with proximity scoring 25th Annual ACM Symposium on Applied Computing, SAC 2010, March 22, 2010 - March 26, 2010 Sierre, Switzerland 2010 [24] We present in this paper a scoring method for information retrieval based on the proximity of the query terms in the documents. The idea of the method first is to assign to each position in the document a fuzzy proximity value depending on its closeness to the surrounding keywords. These proximity values can then be summed on any range of text - including any passage or any element - and after normalization this sum is used as the relevance score for the extent. Some experiments on the Wikipedia collection used in the INEX 2008 evaluation campaign are presented and discussed.
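A simplified sketch of the proximity idea follows: each position receives a fuzzy proximity value that decays linearly with distance to the nearest occurrence of each query term, and the per-position values are combined and summed over the extent (the influence width k and the min-combination are illustrative choices, not necessarily the paper's exact configuration):
 # Simplified fuzzy-proximity scoring over a token sequence.
 def proximity_score(tokens, query_terms, k=5):
     positions = {t: [i for i, w in enumerate(tokens) if w == t] for t in query_terms}
     score = 0.0
     for i in range(len(tokens)):
         per_term = []
         for t in query_terms:
             occ = positions[t]
             # 1.0 at an occurrence of t, decaying linearly to 0 at distance k
             per_term.append(max((k - min(abs(i - p) for p in occ)) / k, 0.0) if occ else 0.0)
         score += min(per_term)  # fuzzy AND over the query terms
     return score / len(tokens)  # normalize by extent length
 doc = "the quick brown fox jumps over the lazy dog near the quick river".split()
 print(proximity_score(doc, ["quick", "fox"]))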
Beigbeder, Michel; Imafouo, Amelie & Mercier, Annabelle ENSM-SE at INEX 2009: Scoring with proximity and semantic tag information 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [25] We present in this paper some experiments on the Wikipedia collection used in the INEX 2009 evaluation campaign with an information retrieval method based on proximity. The idea of the method is to assign to each position in the document a fuzzy proximity value depending on its closeness to the surrounding keywords. These proximity values can then be summed on any range of text - including any passage or any element - and after normalization this sum is used as the relevance score for the extent. To take into account the semantic tags, we define a contextual operator which allows considering, at query time, only the occurrences of terms that appear in a given semantic context. 2010 Springer-Verlag Berlin Heidelberg.
Bekavac, Bozo & Tadic, Marko A generic method for multi word extraction from wikipedia ITI 2008 30th International Conference on Information Technology Interfaces, June 23, 2008 - June 26, 2008 Cavtat/Dubrovnik, Croatia 2008 [26] This paper presents a generic method for multiword expression extraction from Wikipedia. The method uses the properties of this specific encyclopedic genre in its HTML format and relies on the intention of the authors of articles to link to other articles. The relevant links were processed by applying local regular grammars within the NooJ development environment. We tested the method on the Croatian version of Wikipedia and present the results obtained.
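The intuition behind harvesting multi-word expressions from article links can be shown with a small regex sketch over wiki markup (the paper itself applies NooJ local regular grammars to the HTML version of the articles):
 # Rough sketch: multi-word candidates from wiki link targets and anchor texts.
 import re
 wikitext = "The [[Adriatic Sea]] borders [[Croatia]] and the [[Istrian Peninsula|Istria]] region."
 candidates = set()
 for target, label in re.findall(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", wikitext):
     candidates.add(target)
     if label:
         candidates.add(label)
 multiword = sorted(c for c in candidates if len(c.split()) > 1)
 print(multiword)  # ['Adriatic Sea', 'Istrian Peninsula']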
Berkner, Kathrin WikiPrints - Rendering enterprise wiki content for printing Imaging and Printing in a Web 2.0 World; and Multimedia Content Access: Algorithms and Systems IV, January 19, 2010 - January 21, 2010 San Jose, CA, United States 2010 [27] Wikis have become a tool of choice for collaborative, informative communication. In contrast to the immense Wikipedia, which serves as a reference web site and typically covers only one topic per web page, enterprise wikis are often used as project management tools and contain several closely related pages authored by members of one project. In that scenario it is useful to print closely related content for review or teaching purposes. In this paper we propose a novel technique for rendering enterprise wiki content for printing, called WikiPrints, which creates a linearized version of wiki content formatted as a mixture between web layout and conventional document layout suitable for printing. Compared to existing print options for wiki content, WikiPrints automatically selects content from different wiki pages given user preferences and usage scenarios. Metadata such as content authors or time of content editing are considered. A preview of the linearized content is shown to the user, and an interface for making manual formatting changes is provided. 2010 Copyright SPIE - The International Society for Optical Engineering.
Bøhn, Christian & Nørvåg, Kjetil Extracting named entities and synonyms from Wikipedia 24th IEEE International Conference on Advanced Information Networking and Applications, AINA 2010, April 20, 2010 - April 23, 2010 Perth, WA, Australia 2010 [28] In many search domains, both contents and searches are frequently tied to named entities such as a person, a company or similar. An example of such a domain is a news archive. One challenge from an information retrieval point of view is that a single entity can have more than one way of referring to it. In this paper we describe how to use Wikipedia contents to automatically generate a dictionary of named entities and synonyms that are all referring to the same entity. This dictionary can subsequently be used to improve search quality, for example using query expansion. Through an experimental evaluation we show that with our approach, we can find named entities and their synonyms with a high degree of accuracy.
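The dictionary-building step described in this abstract can be sketched by collecting redirects and link anchor texts as surface forms of the same entity (toy data; the paper's extraction and filtering steps are not reproduced):
 # Toy sketch: entity-synonym dictionary from Wikipedia redirects and anchor texts.
 from collections import defaultdict
 redirects = {"IBM Corporation": "IBM", "Big Blue": "IBM", "W. Buffett": "Warren Buffett"}
 anchor_texts = [("International Business Machines", "IBM"), ("Buffett", "Warren Buffett")]
 synonyms = defaultdict(set)
 for alias, title in list(redirects.items()) + anchor_texts:
     synonyms[title].add(alias)
 # typical use: expand a query for an entity with all of its known surface forms
 print(synonyms["IBM"])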
Bischoff, Andreas The pediaphon - Speech interface to the free wikipedia encyclopedia for mobile phones, PDA's and MP3-players DEXA 2007 18th International Workshop on Database and Expert Systems Applications, September 3, 2007 - September 7, 2007 Regensburg, Germany 2007 [29] This paper presents an approach to generate audio-based learning material dynamically from Wikipedia articles for M-Learning and ubiquitous access. It introduces the so-called 'Pediaphon', a speech interface to the free Wikipedia online encyclopedia, as an example application for 'microlearning'. The effective generation and the deployment of the audio data to the user via podcast or progressive download (pseudo streaming) are covered. A convenient cell phone interface to the Wikipedia content, which is usable with every mobile phone, will be introduced.
Biuk-Aghai, Robert P. Visualizing co-authorship networks in online Wikipedia 2006 International Symposium on Communications and Information Technologies, ISCIT, October 18, 2006 - October 20, 2006 Bangkok, Thailand 2006 [30] The Wikipedia online user-contributed encyclopedia has rapidly become a highly popular and widely used online reference source. However, perceiving the complex relationships in the network of articles and other entities in Wikipedia is far from easy. We introduce the notion of using co-authorship of articles to determine relationships between articles, and present the WikiVis information visualization system which visualizes this and other types of relationships in the Wikipedia database in 3D graph form. A 3D star layout and a 3D nested cone tree layout are presented for displaying relationships between entities and between categories, respectively. A novel 3D pinboard layout is presented for displaying search results.
Biuk-Aghai, Robert P.; Tang, Libby Veng-Sam; Fong, Simon & Si, Yain-Whar Wikis as digital ecosystems: An analysis based on authorship 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, DEST '09, June 1, 2009 - June 3, 2009 Istanbul, Turkey 2009 [31] Wikis, best represented by the popular and highly successful Wikipedia system, have established themselves as important components of a collaboration infrastructure. We suggest that the complex network of user-contributors in volunteer-contributed wikis constitutes a digital ecosystem that bears all the characteristics typical of such systems. This paper presents an analysis supporting this notion based on significance of authorship within the wiki. Our findings confirm the hypothesis that large volunteer-contributed wikis are digital ecosystems, and thus that the findings from the digital ecosystems research stream are applicable to this type of system.
Bocek, Thomas; Peric, Dalibor; Hecht, Fabio; Hausheer, David & Stiller, Burkhard Peer vote: A decentralized voting mechanism for P2P collaboration systems 3rd International Conference on Autonomous Infrastructure, Management and Security, AIMS 2009, June 30, 2009 - July 2, 2009 Enschede, Netherlands 2009 [32] Peer-to-peer (P2P) systems achieve scalability, fault tolerance, and load balancing with a low-cost infrastructure, characteristics from which collaboration systems, such as Wikipedia, can benefit. A major challenge in P2P collaboration systems is to maintain article quality after each modification in the presence of malicious peers. A way of achieving this goal is to allow modifications to take effect only if a majority of previous editors approve the changes through voting. The absence of a central authority makes voting a challenge in P2P systems. This paper proposes the fully decentralized voting mechanism PeerVote, which enables users to vote on modifications in articles in a P2P collaboration system. Simulations and experiments show the scalability and robustness of PeerVote, even in the presence of malicious peers. 2009 IFIP International Federation for Information Processing.
Bohm, Christoph; Naumann, Felix; Abedjan, Ziawasch; Fenz, Dandy; Grutze, Toni; Hefenbrock, Daniel; Pohl, Matthias & Sonnabend, David Profiling linked open data with ProLOD 2010 IEEE 26th International Conference on Data Engineering Workshops, ICDEW 2010, March 1, 2010 - March 6, 2010 Long Beach, CA, United States 2010 [33] Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often enough in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of the source for the problem at hand (e.g., some integration task), focus on and explore the relevant subset.
Boselli, Roberto; Cesarini, Mirko & Mezzanzanica, Mario Customer knowledge and service development, the Web 2.0 role in co-production Proceedings of World Academy of Science, Engineering and Technology 2009 The paper is concerned with relationships between SSME and ICTs and focuses on the role of Web 2.0 tools in the service development process. The research presented aims at exploring how collaborative technologies can support and improve service processes, highlighting customer centrality and value co-production. The core idea of the paper is the centrality of user participation and the collaborative technologies as enabling factors; Wikipedia is analyzed as an example. The result of such analysis is the identification and description of a pattern characterising specific services in which users collaborate by means of web tools with value co-producers during the service process. The pattern of collaborative co-production concerning several categories of services, including knowledge-based services, is then discussed.
Bouma, Gosse; Kloosterman, Geert; Mur, Jori; Noord, Gertjan Van; Plas, Lonneke Van Der & Tiedemann, Jorg Question answering with joost at CLEF 2007 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [34] We describe our system for the monolingual Dutch and multilingual English-to-Dutch QA tasks. We describe the preprocessing of Wikipedia, inclusion of query expansion in IR, anaphora resolution in follow-up questions, and a question classification module for the multilingual task. Our best runs achieved 25.5% accuracy for the Dutch monolingual task, and 13.5% accuracy for the multilingual task. 2008 Springer-Verlag Berlin Heidelberg.
Brandes, Ulrik & Lerner, Jurgen Visual analysis of controversy in user-generated encyclopedias Houndmills, Basingstoke, Hants., RG21 6XS, United Kingdom 2008 [35] Wikipedia is a large and rapidly growing Web-based collaborative authoring environment, where anyone on the Internet can create, modify, and delete pages about encyclopedic topics. A remarkable property of some Wikipedia pages is that they are written by up to thousands of authors who may have contradicting opinions. In this paper, we show that a visual analysis of the who-revises-whom network gives deep insight into controversies. We propose a set of analysis and visualization techniques that reveal the dominant authors of a page, the roles they play, and the alters they confront. Thereby we provide tools to understand how Wikipedia authors collaborate in the presence of controversy. 2008 Palgrave Macmillan Ltd. All rights reserved.
Bryant, Susan L.; Forte, Andrea & Bruckman, Amy Becoming Wikipedian: Transformation of participation in a collaborative online encyclopedia 2005 International ACM SIGGROUP Conference on Supporting Group Work, GROUP'05, November 6, 2005 - November 9, 2005 Sanibel Island, FL, United States 2005 [36] Traditional activities change in surprising ways when computer-mediated communication becomes a component of the activity system. In this descriptive study, we leverage two perspectives on social activity to understand the experiences of individuals who became active collaborators in Wikipedia, a prolific, cooperatively-authored online encyclopedia. Legitimate peripheral participation provides a lens for understanding participation in a community as an adaptable process that evolves over time. We use ideas from activity theory as a framework to describe our results. Finally, we describe how activity on the Wikipedia stands in striking contrast to traditional publishing and suggests a new paradigm for collaborative systems.
Butler, Brian; Joyce, Elisabeth & Pike, Jacqueline Don't look now, but we've created a bureaucracy: The nature and roles of policies and rules in Wikipedia 26th Annual CHI Conference on Human Factors in Computing Systems, CHI 2008, April 5, 2008 - April 10, 2008 Florence, Italy 2008 [37] Wikis are sites that support the development of emergent, collective infrastructures that are highly flexible and open, suggesting that the systems that use them will be egalitarian, free, and unstructured. Yet it is apparent that the flexible infrastructure of wikis allows the development and deployment of a wide range of structures. However, we find that the policies in Wikipedia and the systems and mechanisms that operate around them are multi-faceted. In this descriptive study, we draw on prior work on rules and policies in organizations to propose and apply a conceptual framework for understanding the natures and roles of policies in wikis. We conclude that wikis are capable of supporting a broader range of structures and activities than other collaborative platforms. Wikis allow for and, in fact, facilitate the creation of policies that serve a wide variety of functions.
Buzzi, Marina & Leporini, Barbara Is Wikipedia usable for the blind? W4A'08: 2008 International Cross-Disciplinary Conference on Web Accessibility, W4A, Apr 21 - 22 2008 Beijing, China 2008 [38] Today wikis are becoming increasingly widespread, and offer great benefits in a variety of collaborative environments. Therefore, to be universally valuable, wiki systems should be easy to use for anyone, regardless of ability. This paper describes obstacles that a blind user may encounter when interacting via screen reader with Wikipedia, and offers some suggestions for improving usability.
Buzzi, M. Claudia; Buzzi, Marina; Leporini, Barbara & Senette, Caterina Making wikipedia editing easier for the blind NordiCHI 2008: Building Bridges - 5th Nordic Conference on Human-Computer Interaction, October 20, 2008 - October 22, 2008 Lund, Sweden 2008 [39] A key feature of Web 2.0 is the possibility of sharing, creating and editing on-line content. This approach is increasingly used in learning environments to favor interaction and cooperation among students. These functions should be accessible as well as easy to use for all participants. Unfortunately accessibility and usability issues still exist for Web 2.0-based applications. For instance, Wikipedia presents many difficulties for the blind. In this paper we discuss a possible solution for simplifying the Wikipedia editing page when interacting via screen reader. Building an editing interface that conforms to W3C ARIA (Accessible Rich Internet Applications) recommendations would overcome accessibility and usability problems that prevent blind users from actively contributing to Wikipedia.
Byna, Surendra; Meng, Jiayuan; Raghunathan, Anand; Chakradhar, Srimat & Cadambi, Srihari Best-effort semantic document search on GPUs 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU-3, Held in cooperation with ACM ASPLOS XV, March 14, 2010 - March 14, 2010 Pittsburgh, PA, United States 2010 [40] Semantic indexing is a popular technique used to access and organize large amounts of unstructured text data. We describe an optimized implementation of semantic indexing and document search on manycore GPU platforms. We observed that a parallel implementation of semantic indexing on a 128-core Tesla C870 GPU is only 2.4X faster than a sequential implementation on an Intel Xeon 2.4 GHz processor. We ascribe the less than spectacular speedup to a mismatch between the workload characteristics of semantic indexing and the unique architectural features of GPUs. Compared to the regular numerical computations that have been ported to GPUs with great success, our semantic indexing algorithm (the recently proposed Supervised Semantic Indexing algorithm called SSI) has interesting characteristics: the amount of parallelism in each training instance is data-dependent, and each iteration involves the product of a dense matrix with a sparse vector, resulting in random memory access patterns. As a result, we observed that the baseline GPU implementation significantly under-utilizes the hardware resources (processing elements and memory bandwidth) of the GPU platform. However, the SSI algorithm also demonstrates unique characteristics, which we collectively refer to as the "forgiving nature" of the algorithm. These unique characteristics allow for novel optimizations that do not strive to preserve numerical equivalence of each training iteration with the sequential implementation. In particular, we consider best-effort computing techniques such as dependency relaxation and computation dropping to suitably alter the workload characteristics of SSI to leverage the unique architectural features of the GPU. We also show that the realization of dependency relaxation and computation dropping concepts on a GPU is quite different from how one would implement these concepts on a multicore CPU, largely due to the distinct architectural features supported by a GPU. Our new techniques dramatically enhance the amount of parallel workload, leading to much higher performance on the GPU. By optimizing data transfers between CPU and GPU and by reducing GPU kernel invocation overheads, we achieve further performance gains. We evaluated our new GPU-accelerated implementation of semantic document search on a database of over 1.8 million documents from Wikipedia. By applying our novel performance-enhancing strategies, our GPU implementation on a 128-core Tesla C870 achieved a 5.5X acceleration as compared to a baseline parallel implementation on the same GPU. Compared to a baseline parallel TBB implementation on a dual-socket quad-core Intel Xeon multicore CPU (8 cores), the enhanced GPU implementation is 11X faster. Compared to a parallel implementation on the same multi-core CPU that also uses data dependency relaxation and computation dropping techniques, our enhanced GPU implementation is 5X faster.
Cabral, Luis Miguel; Costa, Luis Fernando & Santos, Diana What Happened to Esfinge in 2007? 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [41] Esfinge is a general domain Portuguese question answering system which uses the information available on the Web as an additional resource when searching for answers. Other external resources and tools used are a broad coverage parser, a morphological analyser, a named entity recognizer and a Web-based database of word co-occurrences. In this fourth participation in CLEF, in addition to the new challenges posed by the organization (topics and anaphors in questions and the use of Wikipedia to search and support answers), we experimented with a multiple question and multiple answer approach in QA. 2008 Springer-Verlag Berlin Heidelberg.
Calefato, Caterina; Vernero, Fabiana & Montanari, Roberto Wikipedia as an example of positive technology: How to promote knowledge sharing and collaboration with a persuasive tutorial 2009 2nd Conference on Human System Interactions, HSI '09, May 21, 2009 - May 23, 2009 Catania, Italy 2009 [42] This paper proposes an improved redesign of the Wikipedia tutorial following Fogg's concept of persuasive technology. The international Wikipedia project aims at being the biggest free online encyclopedia. It can be considered a persuasive tool which tries to motivate people to collaborate on the development of a shared knowledge corpus, following a specific policy of behavior.
Chahine, C.Abi.; Chaignaud, N.; Kotowicz, J.P. & Pecuchet, J.P. Context and keyword extraction in plain text using a graph representation 4th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2008, November 30, 2008 - December 3, 2008 Bali, Indonesia 2008 [43] Document indexation is an essential task achieved by archivists or automatic indexing tools. To retrieve documents relevant to a query, the keywords describing each document have to be carefully chosen. Archivists have to find out the right topic of a document before starting to extract the keywords. For an archivist indexing specialized documents, experience plays an important role. But indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. This system takes as input an ontology and a plain text document and provides as output contextualized keywords of the document. The method has been evaluated by exploiting Wikipedia's category links as a termino-ontological resource.
Chandramouli, K.; Kliegr, T.; Nemrava, J.; Svatek, V. & Izquierdo, E. Query refinement and user relevance feedback for contextualized image retrieval 5th International Conference on Visual Information Engineering, VIE 2008, July 29, 2008 - August 1, 2008 Xi'an, China 2008 [44] The motivation of this paper is to enhance the user-perceived precision of results of content-based information retrieval (CBIR) systems with query refinement (QR), visual analysis (VA) and relevance feedback (RF) algorithms. The proposed algorithms were implemented as modules of the K-Space CBIR system. The QR module discovers hypernyms for the given query from a free text corpus (such as Wikipedia) and uses these hypernyms as refinements for the original query. Extracting hypernyms from Wikipedia makes it possible to apply query refinement to more queries than in related approaches that use a static predefined thesaurus such as WordNet. The VA module uses the K-Means algorithm for clustering the images based on low-level MPEG-7 visual features. The RF module uses the preference information expressed by the user to build user profiles by applying SOM-based supervised classification, which is further optimized by a hybrid Particle Swarm Optimization (PSO) algorithm. The experiments evaluating the performance of the QR and VA modules show promising results. 2008 The Institution of Engineering and Technology.
Chandramouli, K.; Kliegr, T.; Svatek, V. & Izquierdo, E. Towards semantic tagging in collaborative environments DSP 2009:16th International Conference on Digital Signal Processing, July 5, 2009 - July 7, 2009 Santorini, Greece 2009 [45] Tags pose an efficient and effective way of organizing resources, but they are not always available. A technique called SCM/THD investigated in this paper extracts entities from free-text annotations and, using the Lin similarity measure over the WordNet thesaurus, classifies them into a controlled vocabulary of tags. Hypernyms extracted from Wikipedia are used to map uncommon entities to WordNet synsets. In collaborative environments, users can assign multiple annotations to the same object, hence increasing the amount of information available. Assuming that the semantics of the annotations overlap, this redundancy can be exploited to generate higher quality tags. A preliminary experiment presented in the paper evaluates the consistency and quality of tags generated from multiple annotations of the same image. The results obtained on an experimental dataset comprising 62 annotations from four annotators show that the accuracy of a simple majority vote surpasses the average accuracy obtained by assessing the annotations individually by 18%. A moderate-strength correlation has been found between the quality of generated tags and the consistency of annotations.
Chatterjee, Madhumita; Sivakumar, G. & Menezes, Bernard Dynamic policy based model for trust based access control in P2P applications 2009 IEEE International Conference on Communications, ICC 2009, June 14, 2009 - June 18, 2009 Dresden, Germany 2009 [46] Dynamic self-organizing groups like Wikipedia and F/OSS have special security requirements not addressed by typical access control mechanisms. An example is the ability to collaboratively modify access control policies based on the evolution of the group and of trust and behavior levels. In this paper we propose a new framework for dynamic multi-level access control policies based on trust and reputation. The framework has interesting features wherein the group can switch between policies over time, influenced by the system's state or environment. Based on the behavior and trust level of peers in the group and the current group composition, it is possible for peers to collaboratively modify policies such as join, update and job allocation. We have modeled the framework using the declarative language Prolog. We also performed some simulations to illustrate the features of our framework.
Chen, Jian; Shtykh, Roman Y. & Jin, Qun A web recommender system based on dynamic sampling of user information access behaviors IEEE 9th International Conference on Computer and Information Technology, CIT 2009, October 11, 2009 - October 14, 2009 Xiamen, China 2009 [47] In this study, we propose a Gradual Adaption Model for a Web recommender system. This model is used to track users' focus of interests and its transition by analyzing their information access behaviors, and to recommend appropriate information. A set of concept classes is extracted from Wikipedia. The pages accessed by users are classified by the concept classes and grouped into three periods (short, medium and long) and two categories (remarkable and exceptional) for each concept class. These are used to describe users' focus of interests and to establish, by Full Bayesian Estimation, the reuse probability of each concept class in each period for each user. According to the reuse probability and period, the information that a user is likely to be interested in is recommended. In this paper, we propose a new approach by which the short and medium periods are determined based on dynamic sampling of user information access behaviors. We further present experimental simulation results that show the validity and effectiveness of the proposed system.
Chen, Qing; Shipper, Timothy & Khan, Latifur Tweets mining using Wikipedia and impurity cluster measurement 2010 IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security, ISI 2010, May 23, 2010 - May 26, 2010 Vancouver, BC, Canada 2010 [48] Twitter is one of the fastest growing online social networking services. Tweets can be categorized into trends and are associated with tags and follower/following social relationships. The categorization is neither accurate nor effective due to the short length of tweet messages and the noisy data corpus. In this paper, we attempt to overcome these challenges with an extended feature vector along with a semi-supervised clustering technique. To achieve this goal, the training set is expanded with Wikipedia topic search results, and the feature set is extended. When building the clustering model and doing the classification, an impurity measurement is introduced into our classifier platform. Our experimental results show that the proposed techniques outperform other classifiers with reasonable precision and recall.
Chen, Scott Deeann; Monga, Vishal & Moulin, Pierre Meta-classifiers for multimodal document classification 2009 IEEE International Workshop on Multimedia Signal Processing, MMSP '09, October 5, 2009 - October 7, 2009 Rio De Janeiro, Brazil 2009 [49] This paper proposes learning algorithms for the problem of multimodal document classification. Specifically, we develop classifiers that automatically assign documents to categories by exploiting features from both text and image content. In particular, we use meta-classifiers that combine state-of-the-art text-based and image-based classifiers into making joint decisions. The two meta-classifiers we choose are based on support vector machines and AdaBoost. Experiments on real-world databases from Wikipedia demonstrate the benefits of a joint exploitation of these modalities.
Chevalier, Fanny; Huot, Stephane & Fekete, Jean-Daniel WikipediaViz: Conveying article quality for casual wikipedia readers IEEE Pacific Visualization Symposium 2010, PacificVis 2010, March 2, 2010 - March 5, 2010 Taipei, Taiwan 2010 [50] As Wikipedia has become one of the most used knowledge bases worldwide, the problem of the trustworthiness of the information it disseminates becomes central. With WikipediaViz, we introduce five visual indicators integrated into the Wikipedia layout that can keep casual Wikipedia readers aware of important meta-information about the articles they read. The design of WikipediaViz was inspired by two participatory design sessions with expert Wikipedia writers and sociologists who explained the clues they use to quickly assess the trustworthiness of articles. Based on these results, we propose five metrics for maturity and quality assessment of Wikipedia articles and the accompanying visualizations to provide readers with important clues about the editing process at a glance. We also report and discuss the results of the user studies we conducted. Two preliminary pilot studies show that all our subjects trust Wikipedia articles almost blindly. With the third study, we show that WikipediaViz significantly reduces the time required to assess the quality of articles while maintaining good accuracy.
Chidlovskii, Boris Multi-label wikipedia classification with textual and link features 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [51] We address the problem of categorizing a large set of linked documents with important content and structure aspects, in particular, from the Wikipedia collection proposed at the INEX 2009 XML Mining challenge. We analyze the network of collection pages and turn it into valuable features for the classification. We combine the content-based and link-based features of pages to train an accurate categorizer for unlabelled pages. In the multi-label setting, we revise a number of existing techniques and test some which show a good scalability. We report evaluation results obtained with a variety of learning methods and techniques on the training set of the Wikipedia corpus. 2010 Springer-Verlag Berlin Heidelberg.
Chin, Si-Chi; Street, W. Nick; Srinivasan, Padmini & Eichmann, David Detecting wikipedia vandalism with active learning and statistical language models 4th Workshop on Information Credibility on the Web, WICOW'10, April 26, 2010 - April 30, 2010 Raleigh, NC, United states 2010 [52] This paper proposes an active learning approach using language model statistics to detect Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of authoring, as well as the high visibility of its content, have exposed Wikipedia articles to vandalism. Vandalism is defined as malicious editing intended to compromise the integrity of the content of articles. Extensive manual efforts are being made to combat vandalism and an automated approach to alleviate the laborious process is needed. This paper builds statistical language models, constructing distributions of words from the revision history of Wikipedia articles. As vandalism often involves the use of unexpected words to draw attention, the fitness (or lack thereof) of a new edit when compared with language models built from previous versions may well indicate that an edit is a vandalism instance. In addition, the paper adopts an active learning model to solve the problem of noisy and incomplete labeling of Wikipedia vandalism. The Wikipedia domain with its revision histories offers a novel context in which to explore the potential of language models in characterizing author intention. As the experimental results presented in the paper demonstrate, these models hold promise for vandalism detection.
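As an editorial illustration of the language-model fitness idea in the abstract above, the following minimal Python sketch (not the authors' implementation; the add-one smoothing, vocabulary size and threshold are hypothetical placeholders) scores a new edit by the average surprisal of its words under a unigram model built from earlier revisions; edits with unusually high surprisal would be candidates for review.
 # Minimal sketch (assumptions noted above): score a new edit against a unigram
 # language model built from an article's revision history. A high average
 # surprisal suggests unexpected vocabulary, one possible vandalism cue.
 import math
 from collections import Counter
 
 def build_unigram_model(revisions):
     """revisions: list of revision texts; returns (word counts, total count)."""
     counts = Counter()
     for text in revisions:
         counts.update(text.lower().split())
     return counts, sum(counts.values())
 
 def surprisal_score(edit_text, counts, total, vocab_size=100000):
     """Average negative log-probability of the edit's words under the model,
     with add-one smoothing (vocab_size is an assumed constant)."""
     words = edit_text.lower().split()
     if not words:
         return 0.0
     return sum(-math.log((counts[w] + 1) / (total + vocab_size)) for w in words) / len(words)
 
 # Usage: compare a plausible continuation with an out-of-place edit.
 history = ["the cat sat on the mat", "the cat sat on a mat quietly"]
 counts, total = build_unigram_model(history)
 print(surprisal_score("buy cheap pills now", counts, total) >
       surprisal_score("the cat sat", counts, total))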
Choubassi, Maha El; Nestares, Oscar; Wu, Yi; Kozintsev, Igor & Haussecker, Horst An augmented reality tourist guide on your mobile devices 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China 2009 [53] We present an augmented reality tourist guide on mobile devices. Many of the latest mobile devices contain cameras and location, orientation and motion sensors. We demonstrate how these devices can be used to bring tourism information to users in a much more immersive manner than traditional text or maps. Our system uses a combination of camera, location and orientation sensors to augment the live camera view on a device with the available information about the objects in the view. The augmenting information is obtained by matching a camera image to images in a database on a server that have geotags in the vicinity of the user location. We use a subset of geotagged English Wikipedia pages as the main source of images and augmenting text information. At the time of publication our database contained 50K pages with more than 150K images linked to them. A combination of motion estimation algorithms and orientation sensors is used to track objects of interest in the live camera view and place augmented information on top of them. 2010 Springer-Verlag Berlin Heidelberg.
Ciglan, Marek; Rivierez, Etienne & Nørvåg, Kjetil Learning to find interesting connections in Wikipedia 12th International Asia Pacific Web Conference, APWeb 2010, April 6, 2010 - April 8, 2010 Busan, Republic of Korea 2010 [54] To help users answer the question of what the relation is between (real world) entities or concepts, we might need to go well beyond the borders of traditional information retrieval systems. In this paper, we explore the possibility of exploiting the Wikipedia link graph as a knowledge base for finding interesting connections between two or more given concepts, described by Wikipedia articles. We use a modified Spreading Activation algorithm to identify connections between the input concepts. The main challenge in our approach lies in assessing the strength of a relation defined by a link between articles. We propose two approaches for link weighting and evaluate their results with a user evaluation. Our results show a strong correlation between the used weighting methods and user preferences; the results indicate that the Wikipedia link graph can be used as a valuable semantic resource.
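To illustrate the spreading activation idea described above, here is a minimal Python sketch over a toy link graph (an editorial illustration, not the authors' system; the uniform link weight and decay factor stand in for the paper's link-weighting schemes): activation spread from two seed articles is intersected to surface candidate connecting concepts.
 # Minimal sketch: spread activation from each seed over a toy adjacency dict,
 # then rank nodes reached from both seeds as candidate "connections".
 def spread(graph, seed, iterations=3, decay=0.5):
     activation = {seed: 1.0}
     for _ in range(iterations):
         new_activation = dict(activation)
         for node, value in activation.items():
             neighbours = graph.get(node, [])
             if not neighbours:
                 continue
             share = value * decay / len(neighbours)  # uniform weight (placeholder)
             for nb in neighbours:
                 new_activation[nb] = new_activation.get(nb, 0.0) + share
         activation = new_activation
     return activation
 
 graph = {
     "Alan Turing": ["Computer science", "Enigma machine"],
     "Enigma machine": ["World War II", "Cryptography"],
     "Cryptography": ["Computer science"],
 }
 a = spread(graph, "Alan Turing")
 b = spread(graph, "Cryptography")
 connections = sorted(set(a) & set(b), key=lambda n: a[n] * b[n], reverse=True)
 print(connections)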
Conde, Tiago; Marcelino, Luis & Fonseca, Benjamim Implementing a system for collaborative search of local services 14th International Workshop of Groupware, CRIWG 2008, September 14, 2008 - September 18, 2008 Omaha, NE, United states 2008 [55] Over the last few years, the internet has changed the way people interact with each other. In the past, users were just passive actors, consuming the information available on the web. Nowadays, their behavior is the opposite. With the so-called Web 2.0, internet users became active agents and are now responsible for the creation of content on web sites like MySpace, Wikipedia, YouTube, Yahoo! Answers and many more. Likewise, the way people buy a product or service has changed considerably. Thousands of online communities have been created on the internet, where users can share opinions and ideas about an electronic device, a medical service or a restaurant. An increasing number of consumers use this kind of online community as an information source before buying a product or service. This article describes a web system with the goal of creating an online community where users can share their knowledge about local services, writing reviews and answering questions asked by other members of the community regarding those services. The system will provide means for synchronous and asynchronous communication between users so that they can share their knowledge more easily. 2008 Springer Berlin Heidelberg.
Zhang, Congle & Xing, Dikan Knowledge-supervised learning by co-clustering based approach 7th International Conference on Machine Learning and Applications, ICMLA 2008, December 11, 2008 - December 13, 2008 San Diego, CA, United states 2008 [56] Traditional text learning algorithms need labeled documents to supervise the learning process, but labeling documents of a specific class is often expensive and time consuming. We observe that it is sometimes convenient to use a few keywords (i.e., class descriptions) to describe a class. However, a short class description usually does not contain enough information to guide classification. Fortunately, large amounts of public data, such as ODP and Wikipedia, are easily acquired and contain enormous knowledge. In this paper, we address the text classification problem with such knowledge rather than any labeled documents and propose a co-clustering based knowledge-supervised learning algorithm (CoCKSL) in an information theoretic framework, which effectively applies the knowledge to classification tasks.
Cotta, Carlos Keeping the ball rolling: Teaching strategies using Wikipedia: An argument in favor of its use in computer science courses 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010 Editing Wikipedia articles has recently been proposed as a learning assignment. I argue that it ideally suits Computer Science courses, due to the intrinsic mathematical nature of the concepts and structures considered in this field. It also provides benefits in terms of autonomous research and team-working, as well as a valuable legacy for future years' students. This view is supported by a two-year experience in sophomore programming subjects at the University of Malaga, Spain.
Craswell, Nick; Demartini, Gianluca; Gaugaz, Julien & Iofciu, Tereza L3S at INEX 2008: Retrieving entities using structured information 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [57] Entity Ranking is a recently emerging search task in Information Retrieval. In Entity Ranking the goal is not finding documents matching the query words, but instead finding entities which match those requested in the query. In this paper we focus on the Wikipedia corpus, interpreting it as a set of entities, and propose algorithms for finding entities based on their structured representation for three different search tasks: entity ranking, list completion, and entity relation search. The main contribution is a methodology for indexing entities using a structured representation. Our approach focuses on creating an index of facts about entities for the different search tasks. Moreover, we use the category structure information to improve the effectiveness of the List Completion task. 2009 Springer Berlin Heidelberg.
Crouch, Carolyn J.; Crouch, Donald B.; Bapat, Salil; Mehta, Sarika & Paranjape, Darshan Finding good elements for focused retrieval 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [58] This paper describes the integration of our methodology for the dynamic retrieval of XML elements [2] with traditional article retrieval to facilitate the Focused and the Relevant-in-Context Tasks of the INEX 2008 Ad Hoc Track. The particular problems that arise for dynamic element retrieval in working with text containing both tagged and untagged elements have been solved [3]. The current challenge involves utilizing its ability to produce a rank-ordered list of elements in the context of focused retrieval. Our system is based on the Vector Space Model [8]; basic functions are performed using the Smart experimental retrieval system [7]. Experimental results are reported for the Focused, Relevant-in-Context, and Best-in-Context Tasks of both the 2007 and 2008 INEX Ad Hoc Tracks. These results indicate that the goal of our 2008 investigations, namely finding good focused elements in the context of the Wikipedia collection, has been achieved. 2009 Springer Berlin Heidelberg.
Crouch, Carolyn J.; Crouch, Donald B.; Bhirud, Dinesh; Poluri, Pavan; Polumetla, Chaitanya & Sudhakar, Varun A methodology for producing improved focused elements 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [59] This paper reports the results of our experiments to consistently produce highly ranked focused elements in response to the Focused Task of the INEX Ad Hoc Track. The results of these experiments, performed using the 2008 INEX collection, confirm that our current methodology (described herein) produces such elements for this collection. Our goal for 2009 is to apply this methodology to the new, extended 2009 INEX collection to determine its viability in this environment. (These experiments are currently underway.) Our system uses our method for dynamic element retrieval [4], working with the semi-structured text of Wikipedia [5], to produce a rank-ordered list of elements in the context of focused retrieval. It is based on the Vector Space Model [15]; basic functions are performed using the Smart experimental retrieval system [14]. Experimental results are reported for the Focused Task of both the 2008 and 2009 INEX Ad Hoc Tracks. 2010 Springer-Verlag Berlin Heidelberg.
Crouch, Carolyn J.; Crouch, Donald B.; Kamat, Nachiket; Malik, Vikram & Mone, Aditya Dynamic element retrieval in the wikipedia collection 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [60] This paper describes the successful adaptation of our methodology for the dynamic retrieval of XML elements to a semi-structured environment. Working with text that contains both tagged and untagged elements presents particular challenges in this context. Our system is based on the Vector Space Model; basic functions are performed using the Smart experimental retrieval system. Dynamic element retrieval requires only a single indexing of the document collection at the level of the basic indexing node (i.e., the paragraph). It returns a rank-ordered list of elements identical to that produced by the same query against an all-element index of the collection. Experimental results are reported for both the 2006 and 2007 Ad-hoc tasks. 2008 Springer-Verlag Berlin Heidelberg.
Cui, Gaoying; Lu, Qin; Li, Wenjie & Chen, Yirong Automatic acquisition of attributes for ontology construction 22nd International Conference on Computer Processing of Oriental Languages, ICCPOL 2009, March 26, 2009 - March 27, 2009 Hong kong 2009 [61] An ontology can be seen as an organized structure of concepts according to their relations. A concept is associated with a set of attributes that are themselves also concepts in the ontology. Consequently, ontology construction is the acquisition of concepts and their associated attributes through relations. Manual ontology construction is time-consuming and difficult to maintain. Corpus-based ontology construction methods must be able to distinguish concepts themselves from concept instances. In this paper, a novel and simple method is proposed for automatically identifying concept attributes through the use of Wikipedia as the corpus. The built-in infoboxes in Wikipedia are used to acquire concept attributes and identify the semantic types of the attributes. Two simple induction rules are applied to improve the performance. Experimental results show precisions of 92.5% for attribute acquisition and 80% for attribute type identification. This is a very promising result for automatic ontology construction. 2009 Springer Berlin Heidelberg.
Curino, Carlo A.; Moon, Hyun J.; Tanca, Letizia & Zaniolo, Carlo Schema evolution in wikipedia - Toward a web Information system benchmark ICEIS 2008 - 10th International Conference on Enterprise Information Systems, June 12, 2008 - June 16, 2008 Barcelona, Spain 2008 Evolving the database that is at the core of an information system represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. Therefore, in this paper, we present an in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built using the open-source software MediaWiki. Our study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed us to dissect and analyze the 4.5 years of Wikipedia history, which was short in time but intense in terms of growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis suggests the need to develop better methods and tools to support graceful schema evolution. Therefore, we briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.
Dalip, Daniel Hasan; Goncalves, Marcos Andre; Cristo, Marco & Calado, Pavel Automatic quality assessment of content created collaboratively by web communities: A case study of wikipedia 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09, June 15, 2009 - June 19, 2009 Austin, TX, United states 2009 [62] The old dream of a universal repository containing all human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction.
Darwish, Kareem CMIC@INEX 2008: Link-the-wiki track 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [63] This paper describes the runs that I submitted to the INEX 2008 Link-the-Wiki track. I participated in the incoming File-to-File and the outgoing Anchor-to-BEP tasks. For the File-to-File task I used a generic IR engine and constructed queries based on the title, keywords, and keyphrases of the Wikipedia article. My runs performed well for this task, achieving the highest precision for low recall levels. Further post-hoc experiments showed that constructing queries using titles only produced even better results than the official submissions. For the Anchor-to-BEP task, I used a keyphrase extraction engine developed in-house and I filtered the keyphrases using existing Wikipedia titles. Unfortunately, my runs performed poorly compared to those of other groups. I suspect that this was the result of using many phrases that were not central to articles as anchors. 2009 Springer Berlin Heidelberg.
Das, Sanmay & Magdon-Ismail, Malik Collective wisdom: Information growth in wikis and blogs 11th ACM Conference on Electronic Commerce, EC'10, June 7, 2010 - June 11, 2010 Cambridge, MA, United states 2010 [64] Wikis and blogs have become enormously successful media for collaborative information creation. Articles and posts accrue information through the asynchronous editing of users who arrive both seeking information and possibly able to contribute information. Most articles stabilize to high quality, trusted sources of information representing the collective wisdom of all the users who edited the article. We propose a model for information growth which relies on two main observations: (i) as an article's quality improves, it attracts visitors at a faster rate (a rich get richer phenomenon); and, simultaneously, (ii) the chances that a new visitor will improve the article drop (there is only so much that can be said about a particular topic). Our model is able to reproduce many features of the edit dynamics observed on Wikipedia and on blogs collected from LiveJournal; in particular, it captures the observed rise in the edit rate, followed by a 1/t decay.
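As an editorial illustration of the growth model summarized above, the short Python simulation below encodes the two observations: arrivals growing with article quality and a falling improvement probability. The functional forms (arrivals proportional to q, improvement probability min(1, (K/q)^2)) are assumed placeholders rather than the paper's actual model; they merely reproduce the qualitative rise-then-decay of the edit rate.
 # Minimal simulation sketch: (i) visit rate grows with quality q (rich get
 # richer), (ii) the chance a visit improves the article falls once most has
 # been said. K and the exponents are illustrative assumptions.
 import random
 
 random.seed(1)
 K = 20.0          # hypothetical "topic size" parameter
 q = 1.0           # article quality / amount of information so far
 edit_counts = []
 for step in range(300):
     arrivals = max(1, int(q))                 # assumption: arrivals ~ quality
     prob = min(1.0, (K / q) ** 2)             # assumption: improvement prob drops with q
     edits = sum(1 for _ in range(arrivals) if random.random() < prob)
     q += edits
     edit_counts.append(edits)
 
 # Edit rate first rises with readership, then thins out as the topic saturates.
 print(edit_counts[:12])
 print(edit_counts[-12:])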
Demartini, Gianluca; Firan, Claudiu S. & Iofciu, Tereza L3S at INEX 2007: Query expansion for entity ranking using a highly accurate ontology 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [65] Entity ranking on Web scale datasets is still an open challenge. Several resources, as for example Wikipedia-based ontologies, can be used to improve the quality of the entity ranking produced by a system. In this paper we focus on the Wikipedia corpus and propose algorithms for finding entities based on query relaxation using category information. The main contribution is a methodology for expanding the user query by exploiting the semantic structure of the dataset. Our approach focuses on constructing queries using not only keywords from the topic, but also information about relevant categories. This is done leveraging on a highly accurate ontology which is matched to the character strings of the topic. The evaluation is performed using the INEX 2007 Wikipedia collection and entity ranking topics. The results show that our approach performs effectively, especially for early precision metrics. 2008 Springer-Verlag Berlin Heidelberg.
Demartini, Gianluca; Firan, Claudiu S.; Iofciu, Tereza; Krestel, Ralf & Nejdl, Wolfgang A model for Ranking entities and its application to Wikipedia Latin American Web Conference, LA-WEB 2008, October 28, 2008 - October 30, 2008 Vila Velha, Espirito Santo, Brazil 2008 [66] Entity Ranking (ER) is a recently emerging search task in Information Retrieval, where the goal is not finding documents matching the query words, but instead finding entities which match types and attributes mentioned in the query. In this paper we propose a formal model to define entities as well as a complete ER system, providing examples of its application to enterprise, Web, and Wikipedia scenarios. Since searching for entities on Web scale repositories is an open challenge, as the effectiveness of ranking is usually not satisfactory, we present a set of algorithms based on our model and evaluate their retrieval effectiveness. The results show that combining simple Link Analysis, Natural Language Processing, and Named Entity Recognition methods improves retrieval performance of entity search by over 53% for P@10 and 35% for MAP.
Demartini, Gianluca; Iofciu, Tereza & Vries, Arjen P. De Overview of the INEX 2009 entity ranking track 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [67] In some situations search engine users would prefer to retrieve entities instead of just documents. Example queries include "Italian Nobel prize winners", "Formula 1 drivers that won the Monaco Grand Prix", or "German-speaking Swiss cantons". The XML Entity Ranking (XER) track at INEX creates a discussion forum aimed at standardizing evaluation procedures for entity retrieval. This paper describes the XER tasks and the evaluation procedure used at the XER track in 2009, where a new version of Wikipedia was used as the underlying collection, and summarizes the approaches adopted by the participants. 2010 Springer-Verlag Berlin Heidelberg.
Demidova, Elena; Oelze, Irina & Fankhauser, Peter Do we mean the same? Disambiguation of extracted keyword queries for database search 1st International Workshop on Keyword Search on Structured Data, KEYS '09, June 28, 2009 - June 28, 2009 Providence, RI, United states 2009 [68] Users often try to accumulate information on a topic of interest from multiple information sources. In this case a user's informational need might be expressed in terms of an available relevant document, e.g. a web page or an e-mail attachment, rather than a query. Database search engines are mostly adapted to queries manually created by users. In case a user's informational need is expressed in terms of a document, we need algorithms that map keyword queries automatically extracted from this document to the database content. In this paper we analyze the impact of selected document and database statistics on the effectiveness of keyword disambiguation for manually created as well as automatically extracted keyword queries. Our evaluation is performed using a set of user queries from the AOL query log and a set of queries automatically extracted from Wikipedia articles, both executed against the Internet Movie Database (IMDB). Our experimental results show that (1) knowledge of the document context is crucial in order to extract meaningful keyword queries; (2) statistics which enable effective disambiguation of user queries are not sufficient to achieve the same quality for automatically extracted requests.
Denoyer, Ludovic & Gallinari, Patrick Overview of the INEX 2008 XML mining track categorization and clustering of XML documents in a graph of documents 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [69] We describe here the XML Mining Track at INEX 2008. This track was launched for exploring two main ideas: first, identifying key problems for mining semi-structured documents and new challenges of this emerging field, and second, studying and assessing the potential of machine learning techniques for dealing with generic Machine Learning (ML) tasks in the structured domain, i.e. classification and clustering of semi-structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents and also the link information between documents. 2009 Springer Berlin Heidelberg.
Denoyer, Ludovic & Gallinari, Patrick Machine learning for semi-structured multimedia documents: Application to pornographic filtering and thematic categorization Machine Learning Techniques for Multimedia - Case Studies on Organization and Retrieval Tiergartenstrasse 17, Heidelberg, D-69121, Germany 2008 We propose a generative statistical model for the classification of semi-structured multimedia documents. Its main originality is its ability to simultaneously take into account the structural and the content information present in a semi-structured document and also to cope with different types of content (text, image, etc.). We then present the results obtained on two sets of experiments: one concerns the filtering of pornographic Web pages, the other the thematic classification of Wikipedia documents. 2008 Springer-Verlag Berlin Heidelberg.
Deshpande, Smita & Betke, Margrit RefLink: An interface that enables people with motion impairments to analyze web content and dynamically link to references 9th International Workshop on Pattern Recognition in Information Systems - PRIS 2009 In Conjunction with ICEIS 2009, May 6, 2009 - May 7, 2009 Milan, Italy 2009 In this paper, we present RefLink, an interface that allows users to analyze the content of a web page by dynamically linking to an online encyclopedia such as Wikipedia. Upon opening a web page, RefLink instantly provides a list of terms extracted from the page and annotates each term with the number of its occurrences in the page. RefLink uses a text-to-speech interface to read out the list of terms. The user can select a term of interest and follow its link to the encyclopedia. RefLink thus helps users perform an informed and efficient contextual analysis. Initial user testing suggests that RefLink is a valuable web browsing tool, in particular for people with motion impairments, because it greatly simplifies the process of obtaining reference material and performing contextual analysis.
Dopichaj, Philipp; Skusa, Andre & Heß, Andreas Stealing anchors to link the wiki 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [70] This paper describes the Link-the-Wiki submission of Lycos Europe. We try to learn suitable anchor texts by looking at the anchor texts the Wikipedia authors used. Disambiguation is done by using textual similarity and also by checking whether a set of link targets "makes sense" together. 2009 Springer Berlin Heidelberg.
Doyle, Richard & Devon, Richard Teaching process for technological literacy: The case of nanotechnology and global open source pedagogy 2010 ASEE Annual Conference and Exposition, June 20, 2010 - June 23, 2010 Louisville, KY, United states 2010 In this paper we propose approaching the concern addressed by the technology literacy movement by using process design rather than product design. Rather than requiring people to know an impossible amount about technology, we suggest that we can teach a process for understanding and making decisions about any technology. This process can be applied to new problems and new contexts that emerge from the continuous innovation and transformation of technology markets. Such a process offers a strategy for planning for and abiding the uncertainty intrinsic to the development of modern science and technology. We teach students from diverse backgrounds in an NSF funded course on the social, human, and ethical (SHE) impacts of nanotechnology. The process we describe is global open source collective intelligence (GOSSIP). This paper traces out some of the principles of GOSSIP through the example of a course taught to a mixture of engineers and students from the Arts and the Humanities. Open source is obviously a powerful method: witness the development of Linux, and GNU before that, and the extraordinary success of Wikipedia. Democratic, and hence diverse, information flows have been suggested as vital to sustaining a healthy company. American Society for Engineering Education, 2010.
Dupen, Barry Using internet sources to solve materials homework assignments 2008 ASEE Annual Conference and Exposition, June 22, 2008 - June 24, 2008 Pittsburgh, PA, United states 2008 Materials professors commonly ask homework questions derived from textbook readings, only to have students find the answers faster using internet resources such as Wikipedia or Google. While we hope students will actually read their textbooks, we can take advantage of student internet use to teach materials concepts. After graduation, these engineers will use the internet as a resource in their jobs, so it makes sense to use the internet in classroom exercises too. This paper discusses several materials homework assignments requiring internet research, and a few which require the textbook. Students learn that some answers are very difficult to find, and that accuracy is not guaranteed. Students also learn how materials data affect design, economics, and public policy. American Society for Engineering Education, 2008.
Edwards, Lilian Content filtering and the new censorship 4th International Conference on Digital Society, ICDS 2010, Includes CYBERLAWS 2010: 1st International Conference on Technical and Legal Aspects of the e-Society, February 10, 2010 - February 16, 2010 St. Maarten, Netherlands 2010 [71] Since the famous Time magazine cover of 1995, nation states have been struggling to control access to adult and illegal material on the Internet. In recent years, strategies for such control have shifted from the use of traditional policing, largely ineffective in a transnational medium, to the use of take-down and especially filtering applied by ISPs enrolled as "privatized censors" by the state. The role of the IWF in the UK has become a pivotal case study of how state and private interests have interacted to produce effective but non-transparent and non-accountable censorship even in a Western democracy. The IWF's role has recently been significantly questioned after a stand-off with Wikipedia in December 2008. This paper will set the IWF's recent acts in the context of a massive increase in global filtering of Internet content and suggest the creation of a Speech Impact Assessment process which might inhibit the growth of unchecked censorship.
Elmqvist, Niklas; Do, Thanh-Nghi; Goodell, Howard; Henry, Nathalie & Fekete, Jean-Daniel ZAME: Interactive large-scale graph visualization 2008 Pacific Visualization Symposium, PacificVis 2008, March 4, 2008 - March 7, 2008 Kyoto, Japan 2008 [72] We present the Zoomable Adjacency Matrix Explorer (ZAME), a visualization tool for exploring graphs at a scale of millions of nodes and edges. ZAME is based on an adjacency matrix graph representation aggregated at multiple scales. It allows analysts to explore a graph at many levels, zooming and panning with interactive performance from an overview to the most detailed views. Several components work together in the ZAME tool to make this possible. Efficient matrix ordering algorithms group related elements. Individual data cases are aggregated into higher-order meta-representations. Aggregates are arranged into a pyramid hierarchy that allows for on-demand paging to GPU shader programs to support smooth multiscale browsing. Using ZAME, we are able to explore the entire French Wikipedia - over 500,000 articles and 6,000,000 links - with interactive performance on standard consumer-level computer hardware.
Fachry, Khairun Nisa; Kamps, Jaap; Koolen, Marijn & Zhang, Junte Using and detecting links in Wikipedia 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [73] In this paper, we document our efforts at INEX 2007, where we participated in the Ad Hoc Track, the Link the Wiki Track, and the Interactive Track that continued from INEX 2006. Our main aims at INEX 2007 were the following. For the Ad Hoc Track, we investigated the effectiveness of incorporating link evidence into the model, and of a CAS filtering method exploiting the structural hints in the INEX topics. For the Link the Wiki Track, we investigated the relative effectiveness of link detection based on retrieving similar documents with the Vector Space Model, and then filtering with the names of Wikipedia articles to establish a link. For the Interactive Track, we took part in the interactive experiment comparing an element retrieval system with a passage retrieval system. The main results are the following. For the Ad Hoc Track, we see that link priors improve most of our runs for the Relevant in Context and Best in Context Tasks, and that CAS pool filtering is effective for the Relevant in Context and Best in Context Tasks. For the Link the Wiki Track, the results show that detecting links with name matching works relatively well, though links were generally under-generated, which hurt the performance. For the Interactive Track, our test persons showed a weak preference for the element retrieval system over the passage retrieval system. 2008 Springer-Verlag Berlin Heidelberg.
Fadaei, Hakimeh & Shamsfard, Mehrnoush Extracting conceptual relations from Persian resources 7th International Conference on Information Technology - New Generations, ITNG 2010, April 12, 2010 - April 14, 2010 Las Vegas, NV, United states 2010 [74] In this paper we present a relation extraction system which uses a combination of pattern based, structure based and statistical approaches. This system uses raw texts and Wikipedia articles to learn conceptual relations. Wikipedia structures are a rich source of information for relation extraction and are well used in this system. A set of patterns is extracted for the Persian language and used to learn both taxonomic and non-taxonomic relations. This system is one of the few relation extraction systems designed for Persian and is the first among them to use Wikipedia structures in the process of relation learning.
Fernandez-Garcia, Norberto; Blazquez-Del-Toro, Jose M.; Fisteus, Jesus Arias & Sanchez-Fernandez, Luis A semantic web portal for semantic annotation and search 10th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES 2006, October 9, 2006 - October 11, 2006 Bournemouth, United kingdom 2006 The semantic annotation of the contents of Web resources is a required step in order to allow the Semantic Web vision to become a reality. In this paper we describe an approach to manual semantic annotation which tries to integrate both the semantic annotation task and the information retrieval task. Our approach exploits the information provided by Wikipedia pages and takes the form of a semantic Web portal, which allows a community of users to easily define and share annotations on Web resources. Springer-Verlag Berlin Heidelberg 2006.
Ferrandez, Sergio; Toral, Antonio; Ferrandez, Oscar; Ferrandez, Antonio & Munoz, Rafael Applying Wikipedia's multilingual knowledge to cross-lingual question answering 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France 2007 The application of the multilingual knowledge encoded in Wikipedia to an open-domain Cross-Lingual Question Answering system based on the Inter Lingual Index (ILI) module of EuroWordNet is proposed and evaluated. This strategy overcomes the problems due to ILI's low coverage of proper nouns (Named Entities). Moreover, as these are open-class words (highly changing), using a community-based, up-to-date resource avoids the tedious maintenance of hand-coded bilingual dictionaries. A study reveals the importance of translating Named Entities in CL-QA and the advantages of relying on Wikipedia over the ILI for doing this. Tests on questions from the Cross-Language Evaluation Forum (CLEF) justify our approach (20% of these are correctly answered thanks to Wikipedia's multilingual knowledge). Springer-Verlag Berlin Heidelberg 2007.
Fišer, Darja & Sagot, Benoit Combining multiple resources to build reliable wordnets 11th International Conference on Text, Speech and Dialogue, TSD 2008, September 8, 2008 - September 12, 2008 Brno, Czech republic 2008 [75] This paper compares automatically generated sets of synonyms in the French and Slovene wordnets with respect to the resources used in the construction process. Polysemous words were disambiguated via a five-language word alignment of the SEE-ERA.NET parallel corpus, a subcorpus of the JRC-Acquis. The extracted multilingual lexicon was disambiguated with the existing wordnets for these languages. On the other hand, a bilingual approach sufficed to acquire equivalents for monosemous words. Bilingual lexicons were extracted from different resources, including Wikipedia, Wiktionary and the EUROVOC thesaurus. A representative sample of the generated synsets was evaluated against the gold standards. 2008 Springer-Verlag Berlin Heidelberg.
Figueroa, Alejandro Surface language models for discovering temporally anchored definitions on the web: Producing chronologies as answers to definition questions 6th International Conference on Web Information Systems and Technologies, WEBIST 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010 This work presents a data-driven definition question answering (QA) system that outputs a set of temporally anchored definitions as answers. This system builds surface language models on top of a corpus automatically acquired from Wikipedia abstracts, and then ranks answer candidates in agreement with these models. Additionally, this study deals at greater length with the impact of several surface features on the ranking of temporally anchored answers.
Figueroa, Alejandro Are wikipedia resources useful for discovering answers to list questions within web snippets? 4th International Conference on Web Information Systems and Technologies, WEBIST 2008, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal 2009 [76] This paper presents LiSnQA, a list question answering system that extracts answers to list queries from the short descriptions of web-sites returned by search engines, called web snippets. LiSnQA mines Wikipedia resources in order to obtain valuable information that assists in the extraction of these answers. The interesting facet of LiSnQA is that, in contrast to current systems, it does not account for lists in Wikipedia, but for its redirections, categories, sandboxes, and first definition sentences. Results show that these resources strengthen the answering process. 2009 Springer Berlin Heidelberg.
Figueroa, Alejandro Mining wikipedia for discovering multilingual definitions on the web 4th International Conference on Semantics, Knowledge, and Grid, SKG 2008, December 3, 2008 - December 5, 2008 Beijing, China 2008 [77] Ml-DfWebQA is a multilingual definition question answering system (QAS) that extracts answers to definition queries from the short descriptions of web-sites returned by search engines, called web snippets. These answers are discriminated on the ground of lexico-syntactic regularities mined from multilingual resources supplied by Wikipedia. Results support that these regularities serve to significantly strengthen the answering process. In addition, Ml-DfWebQA increases the robustness of multilingual definition QASs by making use of aliases found in Wikipedia.
Figueroa, Alejandro Mining wikipedia resources for discovering answers to list questions in web snippets 4th International Conference on Semantics, Knowledge, and Grid, SKG 2008, December 3, 2008 - December 5, 2008 Beijing, China 2008 [78] This paper presents LiSnQA, a list question answering system that extracts answers to list queries from the short descriptions of web-sites returned by search engines, called web snippets. LiSnQA mines Wikipedia resources in order to obtain valuable information that assists in the extraction of these answers. The interesting facet of LiSnQA is that, in contrast to current systems, it does not account for lists in Wikipedia, but for its redirections, categories, sandboxes, and first definition sentences. Results show that these resources strengthen the answering process.
Figueroa, Alejandro & Atkinson, John Using dependency paths for answering definition questions on the web 5th International Conference on Web Information Systems and Technologies, WEBIST 2009, March 23, 2009 - March 26, 2009 Lisbon, Portugal 2009 This work presents a new approach to automatically answer definition questions from the Web. This approach learns n-gram language models from lexicalised dependency paths taken from abstracts provided by Wikipedia and uses context information to identify candidate descriptive sentences containing target answers. Results using a prototype of the model showed the effectiveness of lexicalised dependency paths as salient indicators for the presence of definitions in natural language texts.
Finin, Tim & Syed, Zareen Creating and exploiting a Web of semantic data 2nd International Conference on Agents and Artificial Intelligence, ICAART 2010, January 22, 2010 - January 24, 2010 Valencia, Spain 2010 Twenty years ago Tim Berners-Lee proposed a distributed hypertext system based on standard Internet protocols. The Web that resulted fundamentally changed the ways we share information and services, both on the public Internet and within organizations. That original proposal contained the seeds of another effort that has not yet fully blossomed: a Semantic Web designed to enable computer programs to share and understand structured and semi-structured information easily. We will review the evolution of the idea and technologies to realize a Web of Data and describe how we are exploiting them to enhance information retrieval and information extraction. A key resource in our work is Wikitology, a hybrid knowledge base of structured and unstructured information extracted from Wikipedia.
Fogarolli, Angela Word sense disambiguation based on Wikipedia link structure ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states 2009 [79] In this paper an approach to sense disambiguation based on Wikipedia link structure is presented and evaluated. Wikipedia is used as a reference to obtain lexicographic relationships, and in combination with statistical information extraction it is possible to deduce concepts related to the terms extracted from a corpus. In addition, since the corpus covers a representation of a part of the real world, the corpus itself is used as training data for choosing the sense which best fits the corpus.
Fogarolli, Angela & Ronchetti, Marco Domain independent semantic representation of multimedia presentations International Conference on Intelligent Networking and Collaborative Systems, INCoS 2009, November 4, 2009 - November 6, 2009 Barcelona, Spain 2009 [80] This paper describes a domain-independent approach for semantically annotating and representing multimedia presentations. It uses a combination of techniques to automatically discover the content of the media and, through supervised or unsupervised methods, it can generate an RDF description of it. Domain independence is achieved by using Wikipedia as a source of knowledge instead of domain ontologies. The described approach can be relevant for understanding multimedia content, which can be used in Information Retrieval, categorization and summarization.
Fogarolli, Angela & Ronchetti, Marco Discovering semantics in multimedia content using Wikipedia 11th International Conference on Business Information Systems, BIS 2008, May 5, 2008 - May 7, 2008 Innsbruck, Austria 2008 [81] Semantic-based information retrieval is an area of ongoing work. In this paper we present a solution for giving semantic support to multimedia content information retrieval in an E-Learning environment where very often a large number of multimedia objects and information sources are used in combination. Semantic support is given through intelligent use of Wikipedia in combination with statistical Information Extraction techniques. 2008 Springer Berlin Heidelberg.
Fu, Linyun; Wang, Haofen; Zhu, Haiping; Zhang, Huajie; Wang, Yang & Yu, Yong Making more wikipedians: Facilitating semantics reuse for wikipedia authoring 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of 2007 [82] Wikipedia, a killer application in Web 2.0, has embraced the power of collaborative editing to harness collective intelligence. It can also serve as an ideal Semantic Web data source due to its abundance, influence, high quality and well-structured content. However, the heavy burden of building up and maintaining such an enormous and ever-growing online encyclopedic knowledge base still rests on a very small group of people. Many casual users may still find it difficult to write high-quality Wikipedia articles. In this paper, we use RDF graphs to model the key elements in Wikipedia authoring, and propose an integrated solution to make Wikipedia authoring easier based on RDF graph matching, with the expectation of making more Wikipedians. Our solution facilitates semantics reuse and provides users with: 1) a link suggestion module that suggests and auto-completes internal links between Wikipedia articles for the user; 2) a category suggestion module that helps the user place her articles in correct categories. A prototype system is implemented and experimental results show significant improvements over existing solutions to link and category suggestion tasks. The proposed enhancements can be applied to attract more contributors and relieve the burden of professional editors, thus enhancing the current Wikipedia to make it an even better Semantic Web data source. 2008 Springer-Verlag Berlin Heidelberg.
Fukuhara, Tomohiro; Arai, Yoshiaki; Masuda, Hidetaka; Kimura, Akifumi; Yoshinaka, Takayuki; Utsuro, Takehito & Nakagawa, Hiroshi KANSHIN: A cross-lingual concern analysis system using multilingual blog articles 2008 1st International Workshop on Information-Explosion and Next Generation Search, INGS 2008, April 26, 2008 - April 27, 2008 Shenyang, China 2008 [83] An architecture of cross-lingual concern analysis (CLCA) using multilingual blog articles, and its prototype system, are described. As people living in various countries use the Web, cross-lingual information retrieval (CLIR) plays an important role in next generation search. In this paper, we propose CLCA as one of the CLIR applications for facilitating users in finding concerns of people across languages. We propose a layered architecture of CLCA and its prototype system called KANSHIN. The system collects Japanese, Chinese, Korean, and English blog articles, and analyzes concerns across languages. Users can find concerns from several viewpoints such as temporal, geographical, and a network of blog sites. The system also facilitates users in browsing multilingual keywords using Wikipedia, and in finding spam blogs. An overview of the CLCA architecture and the system is described.
Gang, Wang; Huajie, Zhang; Haofen, Wang & Yong, Yu Enhancing relation extraction by eliciting selectional constraint features from Wikipedia 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France 2007 Selectional constraints are usually checked for detecting semantic relations. Previous work usually defined the constraints manually based on a handcrafted concept taxonomy, which is time-consuming and impractical for large-scale relation extraction. Further, the determination of entity type (e.g. NER) based on the taxonomy cannot achieve sufficiently high accuracy. In this paper, we propose a novel approach to extracting relation instances using features elicited from Wikipedia, a free online encyclopedia. The features are represented as selectional constraints and further employed to enhance the extraction of relations. We conduct case studies on the validation of the extracted instances for two common relations, hasArtist(album, artist) and hasDirector(film, director). Substantially high extraction precision (around 0.95) and validation accuracy (near 0.90) are obtained. Springer-Verlag Berlin Heidelberg 2007.
Garza, Sara E.; Brena, Ramon F. & Ramirez, Eduardo Topic calculation and clustering: an application to wikipedia 7th Mexican International Conference on Artificial Intelligence, MICAI 2008, October 27, 2008 - October 31, 2008 Atizapan de Zaragoza, Mexico 2008 [84] Wikipedia is nowadays one of the most valuable information resources; nevertheless, its current structure, which has no formal organization, does not always allow useful browsing among topics. Moreover, even though most Wikipedia pages include a "See Also" section for navigating to related Wikipedia pages, the only references included there are those the authors are aware of, leading to incompleteness and other irregularities. In this work a method for finding related Wikipedia articles is proposed; this method relies on a framework that clusters documents into semantically-calculated topics and selects the closest documents, which could enrich the "See Also" section.
Gaugaz, Julien; Zakrzewski, Jakub; Demartini, Gianluca & Nejdl, Wolfgang How to trace and revise identities 6th European Semantic Web Conference, ESWC 2009, May 31, 2009 - June 4, 2009 Heraklion, Crete, Greece 2009 [85] The Entity Name System (ENS) is a service aiming at providing globally unique URIs for all kinds of real-world entities such as persons, locations and products, based on descriptions of such entities. Because the entity descriptions available to the ENS for deciding on entity identity - do two entity descriptions refer to the same real-world entity? - change over time, the system has to revise its past decisions: one entity has been given two different URIs, or two entities have been attributed the same URI. The question we have to investigate in this context is then: how do we propagate entity decision revisions to the clients which make use of the URIs provided by the ENS? In this paper we propose a solution which relies on labelling the IDs with additional history information. These labels allow clients to locally detect deprecated URIs they are using and also merge IDs referring to the same real-world entity without needing to consult the ENS. Making update requests to the ENS only for the IDs detected as deprecated considerably reduces the number of update requests, at the cost of a decrease in uniqueness quality. We investigate how much the number of update requests decreases using ID history labelling, as well as how this impacts the uniqueness of the IDs on the client. For the experiments we use both artificially generated entity revision histories as well as a real case study based on the revision history of the Dutch and Simple English Wikipedia. 2009 Springer Berlin Heidelberg.
Gehringer, Edward Assessing students' WIKI contributions 2008 ASEE Annual Conference and Exposition, June 22, 2008 - June 24, 2008 Pittsburgh, PA, United states 2008 Perhaps inspired by the growing attention given to Wikipedia, instructors have increasingly been turning to wikis [1, 2] as an instructional collaborative space. A major advantage of a wiki is that any user can edit it at any time. In a class setting, students may be restricted in what pages they can edit, but usually each page can be edited by multiple students and/or each student can edit multiple pages. This makes assessment a challenge, since it is difficult to keep track of the contributions of each student. Several assessment strategies have been proposed. To our knowledge, this is the first attempt to compare them. We study the assessment strategies used in six North Carolina State University classes in Fall 2007, and offer ideas on how they can be improved. American Society for Engineering Education, 2008.
Geva, Shlomo GPX: Ad-Hoc queries and automated link discovery in the Wikipedia 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [86] The INEX 2007 evaluation was based on the Wikipedia collection. In this paper we describe some modifications to the GPX search engine and the approach taken in the Ad-hoc and the Link-the-Wiki tracks. In earlier versions of GPX, scores were recursively propagated from text-containing nodes, through ancestors, all the way to the document root of the XML tree. In this paper we describe a simplification whereby the score of each node is computed directly, doing away with the score propagation mechanism. Results indicate slightly improved performance. The GPX search engine was used in the Link-the-Wiki track to identify prospective incoming links to new Wikipedia pages. We also describe a simple and efficient approach to the identification of prospective outgoing links in new Wikipedia pages. We present and discuss evaluation results. 2008 Springer-Verlag Berlin Heidelberg.
Geva, Shlomo; Kamps, Jaap; Lethonen, Miro; Schenkel, Ralf; Thom, James A. & Trotman, Andrew Overview of the INEX 2009 Ad hoc track 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [87] This paper gives an overview of the INEX 2009 Ad Hoc Track. The main goals of the Ad Hoc Track were three-fold. The first goal was to investigate the impact of the collection scale and markup, by using a new collection that is again based on the Wikipedia but is over 4 times larger, with longer articles and additional semantic annotations. For this reason the Ad Hoc Track tasks stayed unchanged, and the Thorough Task of INEX 2002-2006 returns. The second goal was to study the impact of more verbose queries on retrieval effectiveness, by using the available markup as structural constraints - now using both the Wikipedia's layout-based markup, as well as the enriched semantic markup - and by the use of phrases. The third goal was to compare different result granularities by allowing systems to retrieve XML elements, ranges of XML elements, or arbitrary passages of text. This investigates the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. The INEX 2009 Ad Hoc Track featured four tasks: For the Thorough Task a ranked list of results (elements or passages) by estimated relevance was needed. For the Focused Task a ranked list of non-overlapping results (elements or passages) was needed. For the Relevant in Context Task non-overlapping results (elements or passages) were returned grouped by the article from which they came. For the Best in Context Task a single starting point (element start tag or passage start) for each article was needed. We discuss the setup of the track, and the results for the four tasks. 2010 Springer-Verlag Berlin Heidelberg.
Ghinea, Gheorghita; Bygstad, Bendik & Schmitz, Christoph Multi-dimensional moderation in online communities: Experiences with three Norwegian sites 3rd International Conference on Online Communities and Social Computing, OCSC 2009. Held as Part of HCI International 2009, July 19, 2009 - July 24, 2009 San Diego, CA, United states 2009 [88] Online communities and user contribution of content have become widespread over the last years. This has triggered new and innovative web concepts, and perhaps also changed the power balance in society. Many large corporations have embraced this way of creating content for their sites, which has raised concerns regarding abusive content. Previous research has identified two main types of moderation: one where the users have most of the control, as in Wikipedia, and one where the owners control everything. The media industry, in particular, is reluctant to lose control of its content by using the member-maintained approach, even though it has proven to cost less and be more efficient. This research proposes to merge these two moderation types through a concept called multidimensional moderation. To test this concept, two prototype solutions have been implemented and tested in large-scale discussion groups. The results from this study show that a combination of owner and user moderation may enhance the moderation process. 2009 Springer Berlin Heidelberg.
Giampiccolo, Danilo; Forner, Pamela; Herrera, Jesus; Penas, Anselmo; Ayache, Christelle; Forascu, Corina; Jijkoun, Valentin; Osenova, Petya; Rocha, Paulo; Sacaleanu, Bogdan & Sutcliffe, Richard Overview of the CLEF 2007 multilingual question answering track 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [89] The fifth QA campaign at CLEF [1], having had its first edition in 2003, offered not only a main task but also an Answer Validation Exercise (AVE) [2], which continued last year's pilot, and a new pilot: the Question Answering on Speech Transcripts (QAST) [3, 15]. The main task was characterized by the focus on cross-linguality, while covering as many European languages as possible. As a novelty, some QA pairs were grouped in clusters. Every cluster was characterized by a topic (not given to participants). The questions from a cluster possibly contain co-references between one of them and the others. Finally, the need for searching answers in web formats was satisfied by introducing Wikipedia as a document corpus. The results and the analyses reported by the participants suggest that the introduction of Wikipedia and the topic-related questions led to a drop in systems' performance. 2008 Springer-Verlag Berlin Heidelberg.
Giuliano, Claudio; Gliozzo, Alfio Massimiliano; Gangemi, Aldo & Tymoshenko, Kateryna Acquiring thesauri from wikis by exploiting domain models and lexical substitution 7th Extended Semantic Web Conference, ESWC 2010, May 30, 2010 - June 3, 2010 Heraklion, Crete, Greece 2010 [90] Acquiring structured data from wikis is a problem of increasing interest in knowledge engineering and the Semantic Web. In fact, collaboratively developed resources are growing in time, have high quality and are constantly updated. Within this problem, an area of interest is extracting thesauri from wikis. A thesaurus is a resource that lists words grouped together according to similarity of meaning, generally organized into sets of synonyms. Thesauri are useful for a large variety of applications, including information retrieval and knowledge engineering. Most information in wikis is expressed by means of natural language texts and internal links among Web pages, the so-called wikilinks. In this paper, an innovative method for inducing thesauri from Wikipedia is presented. It leverages the Wikipedia structure to extract concepts and terms denoting them, obtaining a thesaurus that can be profitably used in applications. This method sensibly boosts precision and recall when applied to re-rank a state-of-the-art baseline approach. Finally, we discuss how to represent the extracted results in RDF/OWL, with respect to existing good practices.
Gonzalez-Cristobal, Jose-Carlos; Goni-Menoyo, Jose Miguel; Villena-Roman, Julio & Lana-Serrano, Sara MIRACLE progress in monolingual information retrieval at Ad-Hoc CLEF 2007 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [91] This paper presents the 2007 MIRACLE team's approach to the Ad-Hoc Information Retrieval track. The main work carried out for this campaign has been around monolingual experiments, in the standard and in the robust tracks. The most important contributions have been the general introduction of automatic named-entity extraction and the use of Wikipedia resources. For the 2007 campaign, runs were submitted for the following languages and tracks: a) Monolingual: Bulgarian, Hungarian, and Czech. b) Robust monolingual: French, English and Portuguese. 2008 Springer-Verlag Berlin Heidelberg.
Grac, Marek Trdlo, an open source tool for building transducing dictionary 12th International Conference on Text, Speech and Dialogue, TSD 2009, September 13, 2009 - September 17, 2009 Pilsen, Czech republic 2009 [92] This paper describes the development of an open-source tool named Trdlo. Trdlo was developed as part of our effort to build a machine translation system between very close languages. These languages usually do not have available pre-processed linguistic resources or dictionaries suitable for computer processing. Bilingual dictionaries have a big impact on the quality of translation. The methods proposed in this paper attempt to extend existing dictionaries with inferable translation pairs. Our approach requires only 'cheap' resources: a list of lemmata for each language and rules for inferring words from one language to another. It is also possible to use other resources like annotated corpora or Wikipedia. Results show that this approach greatly improves the effectiveness of building a Czech-Slovak dictionary. 2009 Springer Berlin Heidelberg.
Granitzer, Michael; Seifert, Christin & Zechner, Mario Context based wikipedia linking 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [93] Automatically linking Wikipedia pages can be done either content based, by exploiting word similarities, or structure based, by exploiting characteristics of the link graph. Our approach focuses on a content based strategy by detecting Wikipedia titles as link candidates and selecting the most relevant ones as links. The relevance calculation is based on the context, i.e. the surrounding text of a link candidate. Our goal was to evaluate the influence of the link context on selecting relevant links and determining a link's best entry point. Results show that a whole Wikipedia page provides the best context for resolving links and that straightforward inverse document frequency based scoring of anchor texts achieves around 4% less Mean Average Precision on the provided data set. 2009 Springer Berlin Heidelberg.
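As a rough illustration of the inverse document frequency scoring mentioned above, the following Python sketch ranks detected title matches by the IDF of their anchor text over a toy corpus; the documents and candidates are hypothetical and the scoring is a generic IDF, not the exact formula used in the paper.
 import math
 
 # Hypothetical toy corpus: each "document" stands in for one Wikipedia page.
 corpus = [
     "graph theory studies graphs and networks",
     "a social network is a graph of people",
     "theory of computation studies models of computation",
 ]
 candidates = ["graph", "theory", "computation"]  # detected title matches
 
 def idf(term, docs):
     df = sum(1 for d in docs if term in d.split())
     return math.log(len(docs) / (1 + df))
 
 # Rank link candidates by the IDF of their anchor text; keep the top ones as links.
 print(sorted(candidates, key=lambda t: idf(t, corpus), reverse=True))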
Guo, Hongzhi; Chen, Qingcai; Cui, Lei & Wang, Xiaolong An interactive semantic knowledge base unifying wikipedia and HowNet 7th International Conference on Information, Communications and Signal Processing, ICICS 2009, December 8, 2009 - December 10, 2009 Macau Fisherman's Wharf, China 2009 [94] We present an interactive, exoteric semantic knowledge base, which integrates HowNet and the online encyclopedia Wikipedia. The semantic knowledge base mainly builds on items, categories, attributes and the relations between them. In the construction process, a mapping relationship is established from HowNet and Wikipedia to the new knowledge base. Different from other online encyclopedias or knowledge dictionaries, the categories in the semantic knowledge base are semantically tagged, and this can be well used in semantic analysis and semantic computing. Currently the knowledge base built in this paper contains more than 200,000 items and 1,000 categories, and these are still increasing every day.
Gupta, Anand; Goyal, Akhil; Bindal, Aman & Gupta, Ankuj Meliorated approach for extracting Bilingual terminology from wikipedia 11th International Conference on Computer and Information Technology, ICCIT 2008, December 25, 2008 - December 27, 2008 Khulna, Bangladesh 2008 [95] With the demand for accurate and domain-specific bilingual dictionaries, research in the field of automatic dictionary extraction has become popular. Due to the lack of domain-specific terminology in parallel corpora, extraction of bilingual terminology from Wikipedia (a corpus for knowledge extraction with a huge number of articles, links between different languages, a dense link structure and a number of redirect pages) has opened up a new line of research in the field of bilingual dictionary creation. Our method not only analyzes interlanguage links along with redirect page titles and linktext titles, but also filters out inaccurate translation candidates using pattern matching. The score of each translation candidate is calculated using page parameters and an appropriate threshold is then set, in contrast to the previous approach, which was based solely on backward links. In our experiment, we demonstrated the advantages of our approach compared to the traditional approach.
Hartrumpf, Sven; Glockner, Ingo & Leveling, Johannes Coreference resolution for questions and answer merging by validation 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [96] For its fourth participation at QA@CLEF, the German question answering (QA) system InSicht was improved for CLEF 2007 in the following main areas: questions containing pronominal or nominal anaphors are treated by a coreference resolver; the shallow QA methods are improved; and a specialized module is added for answer merging. Results showed a performance drop compared to last year, mainly due to problems in handling the newly added Wikipedia corpus. However, dialog treatment by coreference resolution delivered very accurate results, so that follow-up questions can be handled similarly to isolated questions. 2008 Springer-Verlag Berlin Heidelberg.
Haruechaiyasak, Choochart & Damrongrat, Chaianun Article recommendation based on a topic model for Wikipedia Selection for Schools 11th International Conference on Asian Digital Libraries, ICADL 2008, December 2, 2008 - December 5, 2008 Bali, Indonesia 2008 [97] The 2007 Wikipedia Selection for Schools is a collection of 4,625 articles selected from Wikipedia as educational material for children. Users can currently access articles within the collection via two different methods: (1) by browsing either a subject index or a title index sorted alphabetically, and (2) by following hyperlinks embedded within article pages. These two retrieval methods are static and subject to human editors. In this paper, we apply the Latent Dirichlet Allocation (LDA) algorithm to generate a topic model from articles in the collection. Each article can be expressed by a probability distribution over the topic model. We can recommend related articles by calculating similarity measures among the articles' topic distribution profiles. Our initial experimental results showed that the proposed approach could suggest many highly relevant articles, some of which are not covered by the hyperlinks in a given article. 2008 Springer Berlin Heidelberg.
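A minimal sketch of this recommendation pipeline, using scikit-learn's LDA implementation on a toy stand-in for the collection: each article is mapped to a topic distribution and related articles are ranked by the similarity of those distributions (cosine similarity is an assumption here; the paper only specifies similarity between topic distribution profiles).
 from sklearn.feature_extraction.text import CountVectorizer
 from sklearn.decomposition import LatentDirichletAllocation
 from sklearn.metrics.pairwise import cosine_similarity
 
 # Hypothetical stand-ins for articles from the Wikipedia Selection for Schools.
 articles = [
     "volcano lava eruption magma geology",
     "earthquake fault plate tectonics geology",
     "poetry rhyme verse literature stanza",
     "novel fiction literature author chapter",
 ]
 counts = CountVectorizer().fit_transform(articles)
 lda = LatentDirichletAllocation(n_components=2, random_state=0)
 theta = lda.fit_transform(counts)        # per-article topic distributions
 
 # Recommend articles related to article 0 by similarity of topic profiles.
 sims = cosine_similarity(theta[0:1], theta)[0]
 related = sorted(range(1, len(articles)), key=lambda i: sims[i], reverse=True)
 print(related)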
Hatcher-Gallop, Rolanda; Fazal, Zohra & Oluseyi, Maya Quest for excellence in a wiki-based world 2009 IEEE International Professional Communication Conference, IPCC 2009, July 19, 2009 - July 22, 2009 Waikiki, HI, United states 2009 [98] In an increasingly technological world, the Internet is often the primary source of information. Traditional encyclopedias, once the cornerstone of any worthy reference collection, have been replaced by online encyclopedias, many of which utilize open source software (OSS) to create and update content. One of the most popular and successful encyclopedias of this nature is Wikipedia. In fact, Wikipedia is among the most popular Internet sites in the world. However, it is not without criticism. What are some features of Wikipedia? What are some of its strengths and weaknesses? And what have other wiki-based encyclopedias learned from Wikipedia that they have incorporated into their own websites in a quest for excellence? This paper answers these questions and uses Crawford's six information quality dimensions, 1) scope; 2) format; 3) uniqueness and authority; 4) accuracy; 5) currency; and 6) accessibility, to evaluate Wikipedia and three other online encyclopedias: Citizendium, Scholarpedia, and Medpedia. The latter three have managed to maintain the advantages of Wikipedia while minimizing its weaknesses.
He, Jiyin Link detection with wikipedia 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [99] This paper describes our participation in the INEX 2008 Link the Wiki track. We focused on the file-to-file task and submitted three runs, which were designed to compare the impact of different features on link generation. For outgoing links, we introduce the anchor likelihood ratio as an indicator for anchor detection, and explore two types of evidence for target identification, namely, the title field evidence and the topic article content evidence. We find that the anchor likelihood ratio is a useful indicator for anchor detection, and that in addition to the title field evidence, re-ranking with the topic article content evidence is effective for improving target identification. For incoming links, we use an exact match approach and a retrieval method based on language modeling, and find that the exact match approach works best. On top of that, our experiment shows that the semantic relatedness between Wikipedia articles also has a certain ability to indicate links. 2009 Springer Berlin Heidelberg.
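The anchor likelihood ratio can be pictured as the fraction of a phrase's occurrences in which it is used as link anchor text; a minimal sketch with hypothetical counts (in practice both counts would be collected from a Wikipedia dump):
 # Anchor likelihood ratio: how often a phrase occurs as link anchor text
 # relative to how often it occurs at all. Counts here are hypothetical;
 # in practice both would be collected from a Wikipedia dump.
 anchor_count = {"machine learning": 920, "learning": 150}
 occurrence_count = {"machine learning": 1100, "learning": 48000}
 
 def anchor_likelihood(phrase):
     return anchor_count.get(phrase, 0) / max(occurrence_count.get(phrase, 1), 1)
 
 for phrase in anchor_count:
     print(phrase, round(anchor_likelihood(phrase), 3))
 # A high ratio ("machine learning") suggests a good anchor; a low one ("learning") does not.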
He, Jiyin & Rijke, Maarten De An exploration of learning to link with wikipedia: Features, methods and training collection 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [100] We describe our participation in the Link-the-Wiki track at INEX 2009. We apply machine learning methods to the anchor-to-best-entry-point task and explore the impact of the following aspects of our approaches: features, learning methods, as well as the collection used for training the models. We find that a learning-to-rank-based approach and a binary classification approach do not differ much. The new Wikipedia collection, which is larger and has more links than the collection previously used, provides better training material for learning our models. In addition, a heuristic run which combines the two intuitively most useful features outperforms the machine learning based runs, which suggests that a further analysis and selection of features is necessary. 2010 Springer-Verlag Berlin Heidelberg.
He, Jiyin; Zhang, Xu; Weerkamp, Wouter & Larson, Martha Metadata and multilinguality in video classification 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [101] The VideoCLEF 2008 Vid2RSS task involves the assignment of thematic category labels to dual language (Dutch/English) television episode videos. The University of Amsterdam chose to focus on exploiting archival metadata and speech transcripts generated by both Dutch and English speech recognizers. A Support Vector Machine (SVM) classifier was trained on training data collected from Wikipedia. The results provide evidence that combining archival metadata with speech transcripts can improve classification performance, but that adding speech transcripts in an additional language does not yield performance gains. 2009 Springer Berlin Heidelberg.
He, Miao; Cutler, Michal & Wu, Kelvin Categorizing queries by topic directory 9th International Conference on Web-Age Information Management, WAIM 2008, July 20, 2008 - July 22, 2008 Zhangjiajie, China 2008 [102] The categorization of a web user query by topic or category can be used to select useful web sources that contain the required information. In pursuit of this goal, we explore methods for mapping user queries to category hierarchies under which deep web resources are also assumed to be classified. Our sources for these category hierarchies, or directories, are the Yahoo! Directory and Wikipedia. Forwarding an unrefined query (in our case a typical fact-finding query sent to a question answering system) directly to these directory resources usually returns no directories or incorrect ones. Instead, we develop techniques to generate more specific directory-finding queries from an unrefined query and use these to retrieve better directories. Despite these engineered queries, our two resources often return multiple directories that include many incorrect results, i.e., directories whose categories are not related to the query, and thus web resources for these categories are unlikely to contain the required information. We develop methods for selecting the most useful ones. We consider a directory to be useful if web sources for any of its narrow categories are likely to contain the searched-for information. We evaluate our mapping system on a set of 250 TREC questions and obtain precision and recall in the 0.8 to 1.0 range.
Hecht, Brent & Gergle, Darren The tower of Babel meets web 2.0: User-generated content and its applications in a multilingual context 28th Annual CHI Conference on Human Factors in Computing Systems, CHI 2010, April 10, 2010 - April 15, 2010 Atlanta, GA, United states 2010 [103] This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create "culturally-aware applications" and "hyperlingual applications".
Hecht, Brent & Moxley, Emily Terabytes of tobler: Evaluating the first law in a massive, domain-neutral representation of world knowledge 9th International Conference on Spatial Information Theory, COSIT 2009, September 21, 2009 - September 25, 2009 Aber Wrac'h, France 2009 [104] The First Law of Geography states that "everything is related to everything else, but near things are more related than distant things." Despite the fact that it is to a large degree what makes "spatial special", the law has never been empirically evaluated on a large, domain-neutral representation of world knowledge. We address this gap in the literature by statistically examining the multitude of entities and relations between entities present across 22 different language editions of Wikipedia. We find that, at least according to the myriad authors of Wikipedia, the First Law is true to an overwhelming extent, regardless of language-defined cultural domain. 2009 Springer Berlin Heidelberg.
Heiskanen, Tero; Kokkonen, Juhana; Hintikka, Kari A.; Kola, Petri; Hintsa, Timo & Nakki, Pirjo Tutkimusparvi the open research swarm in Finland 12th International MindTrek Conference: Entertainment and Media in the Ubiquitous Era, MindTrek'08, October 7, 2008 - October 9, 2008 Tampere, Finland 2008 [105] In this paper, we introduce a new kind of scientific collaboration type (the open research swarm) and describe a realization (Tutkimusparvi) of this new type of scientific social network. Swarming is an experiment in self-organization and a novel way to collaborate in the field of academic research. Open research swarms utilize the possibilities of the Internet, especially the social media tools that are now available because of the Web 2.0 boom. The main goal is to collectively attain rapid solutions to given challenges and to develop a distributed intellectual milieu for researchers. Transparency of the research and creative collaboration are central ideas behind open research swarms. Like Wikipedia, an open research swarm is open for everyone to participate in. The questions and research topics can come from open research swarm participants, from a purposed principal, or from general discussions in the mass media.
Hoffman, Joel Employee knowledge: Instantly searchable Digital Energy Conference and Exhibition 2009, April 7, 2009 - April 8, 2009 Houston, TX, United states 2009 The online encyclopedia Wikipedia has proven the value of the world community contributing to an instantly searchable world knowledge base. The same technology can be applied to the company community: each individual sharing strategic tips directly related to company interests that are then instantly searchable. Each employee can share, using Microsoft Sharepoint Wiki Pages, those unique hints, tips, tricks, and knowledge that they feel could be of the highest value to other employees: how-to's and shortcuts in company software packages, learnings from pilot projects (successful or not), links to fantastic resources, etc. This growing knowledge base then becomes an instantly searchable, global resource for the entire company. Occidental of Elk Hills, Inc. recently, on October 15, 2008, started a rollout of Wiki page use at its Elk Hills, CA, USA properties. There are over 300 employees at Elk Hills, and its Wiki Home Page received over 1500 hits in its first day, with multiple employees contributing multiple articles. Employees are already talking about time-savers they have learned and applied. A second presentation was demanded by those who missed the first. The rollout has generated a buzz of excitement and interest that we will be encouraging into the indefinite future. The significance of a corporate knowledge base can be major: high-tech professionals not spending hours figuring out how to do what someone else has already figured out and documented, support personnel not having to answer the same questions over and over again but having only to point those asking to steps already documented, employees learning time-saving tips that they may never have learned or thought of, professionals no longer wasting time searching for results of other trials or having to reinvent the wheel. Time is money. Knowledge is power. Applying Wiki technology to corporate knowledge returns time and knowledge to the workforce, leading to bottom-line benefits and powerful corporate growth. 2009, Society of Petroleum Engineers.
Hong, Richang; Tang, Jinhui; Zha, Zheng-Jun; Luo, Zhiping & Chua, Tat-Seng Mediapedia: Mining web knowledge to construct multimedia encyclopedia 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China 2009 [106] In recent years, we have witnessed the blooming of Web 2.0 content such as Wikipedia, Flickr and YouTube. How might we benefit from such rich media resources available on the internet? This paper presents a novel concept called Mediapedia, a dynamic multimedia encyclopedia that takes advantage of, and in fact is built from, the text and image resources on the Web. Mediapedia distinguishes itself from the traditional encyclopedia in four main ways. (1) It tries to present users with multimedia content (e.g., text, image, video) which we believe is more intuitive and informative to users. (2) It is fully automated, because it downloads the media content as well as the corresponding textual descriptions from the Web and assembles them for presentation. (3) It is dynamic, as it will use the latest multimedia content to compose the answer. This is not true for the traditional encyclopedia. (4) The design of Mediapedia is flexible and extensible, such that we can easily incorporate new kinds of media, such as video, and new languages into the framework. The effectiveness of Mediapedia is demonstrated and two potential applications are described in this paper. 2010 Springer-Verlag Berlin Heidelberg.
Hori, Kentaro; Oishi, Tetsuya; Mine, Tsunenori; Hasegawa, Ryuzo; Fujita, Hiroshi & Koshimura, Miyuki Related word extraction from wikipedia for web retrieval assistance 2nd International Conference on Agents and Artificial Intelligence, ICAART 2010, January 22, 2010 - January 24, 2010 Valencia, Spain 2010 This paper proposes a web retrieval system with extended queries generated from the contents of Wikipedia. By using the extended queries, we aim to assist users in retrieving Web pages and acquiring knowledge. To extract extended query items, we make use of hyperlinks in Wikipedia in addition to the related word extraction algorithm. We evaluated the system through experimental use by several examinees and questionnaires given to them. Experimental results show that our system works well for users' retrieval and knowledge acquisition.
Huang, Darren Wei Che; Xu, Yue; Trotman, Andrew & Geva, Shlomo Overview of INEX 2007 link the Wiki track 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [107] Wikipedia is becoming ever more popular. Linking between documents is typically provided in similar environments in order to achieve collaborative knowledge sharing. However, this functionality in Wikipedia is not integrated into the document creation process and the quality of automatically generated links has never been quantified. The Link the Wiki (LTW) track at INEX in 2007 aimed at producing a standard procedure, metrics and a discussion forum for the evaluation of link discovery. The tasks offered by the LTW track as well as its evaluation present considerable research challenges. This paper briefly describes the LTW task and the evaluation procedure used in the LTW track in 2007. Automated link discovery methods used by participants are outlined. An overview of the evaluation results is concisely presented and further experiments are reported. 2008 Springer-Verlag Berlin Heidelberg.
Huang, Jin-Xia; Ryu, Pum-Mo & Choi, Key-Sun An empirical research on extracting relations from Wikipedia text 9th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2008, November 2, 2008 - November 5, 2008 Daejeon, Korea, Republic of 2008 [108] A feature-based relation classification approach is presented, in which probabilistic and semantic relatedness features between patterns and relation types are employed together with other linguistic information. The importance of each feature set is evaluated with a Chi-square estimator, and the experiments show that the relatedness features have a big impact on relation classification performance. A series of experiments is also performed to evaluate different machine learning approaches to relation classification, among which a Bayesian approach outperformed other approaches including the Support Vector Machine (SVM). 2008 Springer Berlin Heidelberg.
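A generic scikit-learn sketch of this kind of experimental setup, with hypothetical toy sentences in place of the paper's Wikipedia data and pattern/relatedness features: chi-square feature selection followed by a comparison of a Naive Bayes classifier against a linear SVM.
 from sklearn.feature_extraction.text import CountVectorizer
 from sklearn.feature_selection import SelectKBest, chi2
 from sklearn.model_selection import cross_val_score
 from sklearn.naive_bayes import MultinomialNB
 from sklearn.pipeline import make_pipeline
 from sklearn.svm import LinearSVC
 
 # Hypothetical sentence/relation-type pairs standing in for the Wikipedia data.
 sentences = ["X was born in Y", "X is the capital of Y", "X married Y",
              "X lies in Y", "X wed Y", "X is a city in Y"] * 5
 labels = ["birthplace", "located_in", "spouse",
           "located_in", "spouse", "located_in"] * 5
 
 for clf in (MultinomialNB(), LinearSVC()):
     # Bag-of-words features, chi-square feature selection, then the classifier.
     pipe = make_pipeline(CountVectorizer(), SelectKBest(chi2, k=5), clf)
     scores = cross_val_score(pipe, sentences, labels, cv=3)
     print(type(clf).__name__, round(scores.mean(), 2))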
Huynh, Dat T.; Cao, Tru H.; Pham, Phuong H.T. & Hoang, Toan N. Using hyperlink texts to improve quality of identifying document topics based on Wikipedia 1st International Conference on Knowledge and Systems Engineering, KSE 2009, October 13, 2009 - October 17, 2009 Hanoi, Viet nam 2009 [109] This paper presents a method to identify the topics of documents based on the Wikipedia category network. It improves the method previously proposed by Schonhofen by taking into account the weights of words in hyperlink texts in Wikipedia articles. Experiments on the Computing and Team Sport domains have been carried out and show that our proposed method outperforms Schonhofen's.
Iftene, Adrian; Pistol, Ionut & Trandabat, Diana Grammar-based automatic extraction of definitions 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2008, September 26, 2008 - September 29, 2008 Timisoara, Romania 2008 [110] The paper describes the development and usage of a grammar developed to extract definitions from documents. One of the most important practical usages of the developed grammar is the automatic extraction of definitions from web documents. Three evaluation scenarios were run, the results of these experiments being the main focus of the paper. One scenario uses an e-learning context and previously annotated e-learning documents; the second one involves a large collection of unannotated documents (from Wikipedia) and tries to find answers for definition-type questions. The third scenario performs a similar question-answering task, but this time on the entire web using Google web search and the Google Translation Service. The results are convincing; further development, as well as further integration of the definition extraction system into various related applications, is already under way.
IV, Adam C. Powell & Morris, Arthur E. Wikipedia in materials education 136th TMS Annual Meeting, 2007, February 25, 2007 - March 1, 2007 Orlando, FL, United states 2007 Wikipedia has become a vast storehouse of human knowledge, and a first point of reference for millions of people from all walks of life, including many materials science and engineering (MSE) students. Its characteristics of open authorship and instant publication lead to both its main strength of broad, timely coverage and also its weakness of non-uniform quality. This talk will discuss the status and potential of this medium as a delivery mechanism for materials education content, some experiences with its use in the classroom, and its fit with other media from textbooks to digital libraries.
Jack, Hugh Using a wiki for professional communication and collaboration 2009 ASEE Annual Conference and Exposition, June 14, 2009 - June 17, 2009 Austin, TX, United states 2009 Since the inception of Wikipedia there has been a great interest in the open model of document development. However this model is not that different from what already exists in many professional groups. In a professional group every member is welcome to contribute, but one individual is tasked with the secretarial duties of collecting, collating and recording communications, or capturing discourse during face-to-face meetings. These are often captured as minutes, letters, reports, and recommendations. These activities can be supported in a more free-flowing manner on a Wiki where anybody is welcome to add/modify/delete content, changes can be tracked, and undone when necessary. This paper will describe the use of a Wiki to act as a central point for a professional group developing new curriculum standards. The topics will include a prototype structure for the site, governing principles, encouraging user involvement, and resolving differences of opinion. American Society for Engineering Education, 2009.
Jamsen, Janne; Nappila, Turkka & Arvola, Paavo Entity ranking based on category expansion 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [111] This paper introduces category and link expansion strategies for the XML Entity Ranking track at INEX 2007. Category expansion is a coefficient propagation method for the Wikipedia category hierarchy based on given categories or categories derived from sample entities. Link expansion utilizes links between Wikipedia articles. The strategies are evaluated within the entity ranking and list completion tasks. 2008 Springer-Verlag Berlin Heidelberg.
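The category expansion step can be sketched as a breadth-first propagation of coefficients down a (hypothetical) category hierarchy with an arbitrary decay per level; this is only an illustration of coefficient propagation, not the exact weighting scheme of the paper.
 # Hypothetical category hierarchy; decay and depth are arbitrary illustration values.
 category_children = {
     "Science": ["Physics", "Biology"],
     "Physics": ["Quantum mechanics"],
     "Biology": ["Genetics"],
 }
 
 def expand(seed_categories, decay=0.5, depth=2):
     weights = {c: 1.0 for c in seed_categories}
     frontier = dict(weights)
     for _ in range(depth):
         nxt = {}
         for cat, w in frontier.items():
             for child in category_children.get(cat, []):
                 nxt[child] = max(nxt.get(child, 0.0), w * decay)
         for cat, w in nxt.items():
             weights[cat] = max(weights.get(cat, 0.0), w)
         frontier = nxt
     return weights
 
 print(expand(["Science"]))
 # -> {'Science': 1.0, 'Physics': 0.5, 'Biology': 0.5, 'Quantum mechanics': 0.25, 'Genetics': 0.25}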
Janik, Maciej & Kochut, Krys J. Wikipedia in action: Ontological knowledge in text categorization 2nd Annual IEEE International Conference on Semantic Computing, ICSC 2008, August 4, 2008 - August 7, 2008 Santa Clara, CA, United states 2008 [112] We present a new, ontology-based approach to automatic text categorization. An important and novel aspect of this approach is that our categorization method does not require a training set, in contrast to traditional statistical and probabilistic methods. In the presented method, the ontology, including the domain concepts organized into hierarchies of categories and interconnected by relationships, as well as instances and connections among them, effectively becomes the classifier. Our method focuses on (i) converting a text document into a thematic graph of entities occurring in the document, (ii) ontological classification of the entities in the graph, and (iii) determining the overall categorization of the thematic graph, and as a result, the document itself. In the presented experiments, we used an RDF ontology constructed from the full English version of Wikipedia. Our experiments, conducted on corpora of Reuters news articles, showed that our training-less categorization method achieved a very good overall accuracy.
Javanmardi, Sara; Ganjisaffar, Yasser; Lopes, Cristina & Baldi, Pierre User contribution and trust in Wikipedia 2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009, November 11, 2009 - November 14, 2009 Washington, DC, United states 2009 [113] Wikipedia, one of the top ten most visited websites, is commonly viewed as the largest online reference for encyclopedic knowledge. Because of its open editing model - allowing anyone to enter and edit content - Wikipedia's overall quality has often been questioned as a source of reliable information. Lack of study of the open editing model of Wikipedia and its effectiveness has resulted in a new generation of wikis that restrict contributions to registered users only, using their real names. In this paper, we present an empirical study of user contributions to Wikipedia. We statistically analyze contributions by both anonymous and registered users. The results show that submissions of anonymous and registered users in Wikipedia follow a power law behavior. About 80% of the revisions are submitted by less than 7% of the users, most of whom are registered users. To further refine the analyses, we use the Wiki Trust Model (WTM), a user reputation model developed in our previous work, to assign a reputation value to each user. As expected, the results show that registered users contribute higher quality content and are therefore assigned higher reputation values. However, a significant number of anonymous users also contribute high-quality content. We provide further evidence that regardless of a user's attribution, registered or anonymous, high reputation users are the dominant contributors that actively edit Wikipedia articles in order to remove vandalism or poor quality content.
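The concentration of contributions reported above can be computed from a revision log with a few lines of Python; the revision counts below are invented purely to illustrate the calculation.
 from collections import Counter
 
 # Hypothetical revision log: one user id per revision.
 revisions = ["u1"] * 50 + ["u2"] * 30 + ["u3"] * 10 + list("abcdefghij")
 
 counts = sorted(Counter(revisions).values(), reverse=True)
 total_users, total_revs = len(counts), sum(counts)
 top = max(1, int(0.07 * total_users))      # the top 7% of users
 share = sum(counts[:top]) / total_revs
 print(f"top {top} of {total_users} users made {share:.0%} of the revisions")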
Jenkinson, Dylan & Trotman, Andrew Wikipedia ad hoc passage retrieval and Wikipedia document linking 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [114] Ad hoc passage retrieval within the Wikipedia is examined in the context of INEX 2007. An analysis of the INEX 2006 assessments suggests that a fixed-size window of about 300 terms is consistently seen and that this might be a good retrieval strategy. In runs submitted to INEX, potentially relevant documents were identified using BM25 (trained on INEX 2006 data). For each potentially relevant document the location of every search term was identified and the center (mean) located. A fixed-size window was then centered on this location. A method of removing outliers was examined in which all terms occurring outside one standard deviation of the center were considered outliers and the center recomputed without them. Both techniques were examined with and without stemming. For Wikipedia linking, we identified terms within the document that were over-represented and from the top few generated queries of different lengths. A BM25-ranking search engine was used to identify potentially relevant documents. Links from the source document to the potentially relevant documents (and back) were constructed (at a granularity of whole document). The best performing run used the 4 most over-represented search terms to retrieve 200 documents, and the next 4 to retrieve 50 more. 2008 Springer-Verlag Berlin Heidelberg.
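A small Python sketch of the fixed-size window strategy described above, assuming the document is already tokenized: the window is centred on the mean position of the query terms, optionally after discarding positions more than one standard deviation from the centre.
 import statistics
 
 def passage_window(doc_tokens, query_terms, width=300, trim_outliers=True):
     """Return (start, end) of a fixed-size window centred on the mean position
     of the query terms, optionally ignoring positions that lie more than one
     standard deviation from the centre."""
     positions = [i for i, tok in enumerate(doc_tokens) if tok in query_terms]
     if not positions:
         return 0, min(width, len(doc_tokens))
     centre = statistics.mean(positions)
     if trim_outliers and len(positions) > 2:
         sd = statistics.pstdev(positions)
         kept = [p for p in positions if abs(p - centre) <= sd] or positions
         centre = statistics.mean(kept)
     start = max(0, int(centre) - width // 2)
     return start, min(len(doc_tokens), start + width)
 
 doc = ("filler " * 200 + "wikipedia passage retrieval evaluation " + "filler " * 200).split()
 print(passage_window(doc, {"passage", "retrieval"}, width=50))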
Jiang, Jiepu; Lu, Wei; Rong, Xianqian & Gao, Yangyan Adapting language modeling methods for expert search to rank wikipedia entities 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [115] In this paper, we propose two methods to adapt language modeling methods for expert search to the INEX entity ranking task. In our experiments, we notice that language modeling methods for expert search, if directly applied to the INEX entity ranking task, cannot effectively distinguish entity types. Thus, our proposed methods aim at resolving this problem. First, we propose a method to take into account the INEX category query field. Second, we use an interpolation of two language models to rank entities, which can work solely on the text query. Our experiments indicate that both methods can effectively adapt language modeling methods for expert search to the INEX entity ranking task. 2009 Springer Berlin Heidelberg.
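The interpolation of two language models can be sketched as follows; the probabilities, smoothing constant and mixing weight are illustrative assumptions, not the parameters used in the paper.
 # Interpolated unigram scoring of an entity: mix a model estimated from the
 # entity's own article with one from its wider context (values are assumptions).
 def score(query_terms, p_article, p_context, lam=0.6, eps=1e-9):
     s = 1.0
     for term in query_terms:
         s *= lam * p_article.get(term, eps) + (1 - lam) * p_context.get(term, eps)
     return s
 
 p_article = {"impressionist": 0.02, "painter": 0.03}                # P(term | entity article)
 p_context = {"impressionist": 0.005, "painter": 0.01, "art": 0.02}  # P(term | context)
 print(score(["impressionist", "painter"], p_article, p_context))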
Jijkoun, Valentin; Hofmann, Katja; Ahn, David; Khalid, Mahboob Alam; Rantwijk, Joris Van; Rijke, Maarten De & Sang, Erik Tjong Kim The university of amsterdam's question answering system at QA@CLEF 2007 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [116] We describe a new version of our question answering system, which was applied to the questions of the 2007 CLEF Question Answering Dutch monolingual task. This year, we made three major modifications to the system: (1) we added the contents of Wikipedia to the document collection and the answer tables; (2) we completely rewrote the module interface code in Java; and (3) we included a new table stream which returned answer candidates based on information which was learned from question-answer pairs. Unfortunately, the changes did not lead to improved performance. Unsolved technical problems at the time of the deadline led to missing justifications for a large number of answers in our submission. Our single run obtained an accuracy of only 8% with an additional 12% of unsupported answers (compared to 21% in last year's task). 2008 Springer-Verlag Berlin Heidelberg.
Jinpan, Liu; Liang, He; Xin, Lin; Mingmin, Xu & Wei, Lu A new method to compute the word relevance in news corpus 2nd International Workshop on Intelligent Systems and Applications, ISA2010, May 22, 2010 - May 23, 2010 Wuhan, China 2010 [117] In this paper we propose a new method to compute the relevance of terms in a news corpus. According to the characteristics of news corpora, we first propose that the news corpus should be divided into different channels. Second, making use of the features of news documents, we divide the co-occurrence of terms into two cases: co-occurrence in the news title and co-occurrence in the news text, and we use different methods to compute each kind of co-occurrence. In the end, we introduce the web corpus Wikipedia to overcome some shortcomings of the news corpus.
Juffinger, Andreas; Kern, Roman & Granitzer, Michael Crosslanguage Retrieval Based on Wikipedia Statistics 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [118] In this paper we present the methodology, implementations and evaluation results of the crosslanguage retrieval system we have developed for the Robust WSD Task at CLEF 2008. Our system is based on query preprocessing for translation and homogenisation of queries. The presented preprocessing of queries includes two stages: firstly, a query translation step based on term statistics of co-occurring articles in Wikipedia; secondly, different disjunct query composition techniques to search in the CLEF corpus. We apply the same preprocessing steps for the monolingual as well as the crosslingual task, thereby acting fairly and in a similar way across these tasks. The evaluation revealed that the similar processing comes at nearly no cost for monolingual retrieval but enables us to do crosslanguage retrieval and also allows a feasible comparison of our system performance on these two tasks. 2009 Springer Berlin Heidelberg.
Kaiser, Fabian; Schwarz, Holger & Jakob, Mihaly Using wikipedia-based conceptual contexts to calculate document similarity 3rd International Conference on Digital Society, ICDS 2009, February 1, 2009 - February 7, 2009 Cancun, Mexico 2009 [119] Rating the similarity of two or more text documents is an essential task in information retrieval. For example, document similarity can be used to rank search engine results, cluster documents according to topics, etc. A major challenge in calculating document similarity originates from the fact that two documents can have the same topic or even mean the same, while they use different wording to describe the content. A sophisticated algorithm therefore will not directly operate on the texts but will have to find a more abstract representation that captures the texts' meaning. In this paper, we propose a novel approach for calculating the similarity of text documents. It builds on conceptual contexts that are derived from the content and structure of the Wikipedia hypertext corpus.
Kamps, Jaap; Geva, Shlomo; Trotman, Andrew; Woodley, Alan & Koolen, Marijn Overview of the INEX 2008 Ad hoc track 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [120] This paper gives an overview of the INEX 2008 Ad Hoc Track. The main goals of the Ad Hoc Track were two-fold. The first goal was to investigate the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. This is a continuation of INEX 2007 and, for this reason, the retrieval results are liberalized to arbitrary passages and measures were chosen to fairly compare systems retrieving elements, ranges of elements, and arbitrary passages. The second goal was to compare focused retrieval to article retrieval more directly than in earlier years. For this reason, standard document retrieval rankings have been derived from all runs, and evaluated with standard measures. In addition, a set of queries targeting Wikipedia has been derived from a proxy log, and the runs are also evaluated against the clicked Wikipedia pages. The INEX 2008 Ad Hoc Track featured three tasks: For the Focused Task a ranked list of non-overlapping results (elements or passages) was needed. For the Relevant in Context Task non-overlapping results (elements or passages) were returned grouped by the article from which they came. For the Best in Context Task a single starting point (element start tag or passage start) for each article was needed. We discuss the results for the three tasks, and examine the relative effectiveness of element and passage retrieval. This is examined in the context of content only (CO, or Keyword) search as well as content and structure (CAS, or structured) search. Finally, we look at the ability of focused retrieval techniques to rank articles, using standard document retrieval techniques, both against the judged topics as well as against queries and clicks from a proxy log. 2009 Springer Berlin Heidelberg.
Kamps, Jaap & Koolen, Marijn The impact of document level ranking on focused retrieval 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [121] Document retrieval techniques have proven to be competitive methods in the evaluation of focused retrieval. Although focused approaches such as XML element retrieval and passage retrieval allow for locating the relevant text within a document, using the larger context of the whole document often leads to superior document level ranking. In this paper we investigate the impact of using the document retrieval ranking in two collections used in the INEX 2008 Ad hoc and Book Tracks; the relatively short documents of the Wikipedia collection and the much longer books in the Book Track collection. We experiment with several methods of combining document and element retrieval approaches. Our findings are that 1) we can get the best of both worlds and improve upon both individual retrieval strategies by retaining the document ranking of the document retrieval approach and replacing the documents by the retrieved elements of the element retrieval approach, and 2) using document level ranking has a positive impact on focused retrieval in Wikipedia, but has more impact on the much longer books in the Book Track collection. 2009 Springer Berlin Heidelberg.
Kamps, Jaap & Koolen, Marijn Is Wikipedia link structure different? 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain 2009 [122] In this paper, we investigate the difference between Wikipedia and Web link structure with respect to their value as indicators of the relevance of a page for a given topic of request. Our experimental evidence is from two IR test-collections: the .GOV collection used at the TREC Web tracks and the Wikipedia XML Corpus used at INEX. We first perform a comparative analysis of Wikipedia and .GOV link structure and then investigate the value of link evidence for improving search on Wikipedia and on the .GOV domain. Our main findings are: First, Wikipedia link structure is similar to the Web, but more densely linked. Second, Wikipedia's outlinks behave similarly to inlinks and both are good indicators of relevance, whereas on the Web the inlinks are more important. Third, when incorporating link evidence in the retrieval model, for Wikipedia the global link evidence fails and we have to take the local context into account.
Kanhabua, Nattiya & Nørvåg, Kjetil Exploiting time-based synonyms in searching document archives 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [123] Query expansion of named entities can be employed in order to increase the retrieval effectiveness. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extracting synonyms of named entities over time from the whole history of Wikipedia. In addition, we will use their temporal patterns as a feature in ranking and classifying them into two types, i.e., time-independent or time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relationships change over time. Further, we describe how to make use of both types of synonyms to increase the retrieval effectiveness, i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search with respect to temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how retrieval performance of queries consisting of named entities can be improved using our approach.
Kaptein, Rianne & Kamps, Jaap Finding entities in wikipedia using links and categories 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [124] In this paper we describe our participation in the INEX} Entity Ranking track. We explored the relations between Wikipedia pages, categories and links. Our approach is to exploit both category and link information. Category information is used by calculating distances between document categories and target categories. Link information is used for relevance propagation and in the form of a document link prior. Both sources of information have value, but using category information leads to the biggest improvements. 2009 Springer Berlin Heidelberg.
Kaptein, Rianne & Kamps, Jaap Using links to classify wikipedia pages 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [125] This paper contains a description of experiments for the 2008 INEX XML-mining track. Our goal for the XML-mining track is to explore whether we can use link information to improve classification accuracy. Our approach is to propagate category probabilities over linked pages. We find that using link information leads to marginal improvements over a baseline that uses a Naive Bayes model. For the initially misclassified pages, link information is either not available or contains too much noise. 2009 Springer Berlin Heidelberg.
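A rough sketch of the propagation step described above, in which each page's classifier output is smoothed with the average distribution of the pages it links to; the pages, link graph, baseline probabilities and mixing weight are invented for illustration.

 # Baseline per-page category probabilities, e.g. from a Naive Bayes classifier.
 base = {
     "PageA": {"Sports": 0.6, "Politics": 0.4},
     "PageB": {"Sports": 0.2, "Politics": 0.8},
     "PageC": {"Sports": 0.5, "Politics": 0.5},
 }
 links = {"PageA": ["PageB"], "PageB": ["PageA", "PageC"], "PageC": ["PageB"]}

 def propagate(base, links, alpha=0.7):
     """Mix each page's own distribution with the mean distribution of its link neighbours."""
     updated = {}
     for page, probs in base.items():
         neigh = links.get(page, [])
         updated[page] = {}
         for cat, p in probs.items():
             neigh_p = sum(base[n][cat] for n in neigh) / len(neigh) if neigh else p
             updated[page][cat] = alpha * p + (1 - alpha) * neigh_p
     return updated

 print(propagate(base, links))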
Kawaba, Mariko; Nakasaki, Hiroyuki; Yokomoto, Daisuke; Utsuro, Takehito & Fukuhara, Tomohiro Linking Wikipedia entries to blog feeds by machine learning 3rd International Universal Communication Symposium, IUCS 2009, December 3, 2009 - December 4, 2009 Tokyo, Japan 2009 [126] This paper studies the issue of conceptually indexing the blogosphere through the whole hierarchy of Wikipedia entries. This paper proposes how to link Wikipedia entries to blog feeds in the Japanese blogosphere by machine learning, where about 300,000 Wikipedia entries are used for representing a hierarchy of topics. In our experimental evaluation, we achieved over 80% precision in the task.
Kc, Milly; Chau, Rowena; Hagenbuchner, Markus; Tsoi, Ah Chung & Lee, Vincent A machine learning approach to link prediction for interlinked documents 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [127] This paper provides an explanation of how a recently developed machine learning approach, namely the Probability Measure Graph Self-Organizing Map (PM-GraphSOM), can be used for the generation of links between referenced or otherwise interlinked documents. This new generation of SOM models is capable of projecting generic graph structured data onto a fixed sized display space. Such a mechanism is normally used for dimension reduction, visualization, or clustering purposes. This paper shows that the PM-GraphSOM training algorithm "inadvertently" encodes relations that exist between the atomic elements in a graph. If the nodes in the graph represent documents and the links in the graph represent the reference (or hyperlink) structure of the documents, then it is possible to obtain a set of links for a test document whose link structure is unknown. A significant finding of this paper is that the described approach is scalable in that links can be extracted in linear time. It will also be shown that the proposed approach is capable of predicting the pages which would be linked to a new document and is capable of predicting the links to other documents from a given test document. The approach is applied to web pages from Wikipedia, a relatively large XML text database consisting of many referenced documents. 2010 Springer-Verlag Berlin Heidelberg.
Kimelfeld, Benny; Kovacs, Eitan; Sagiv, Yehoshua & Yahav, Dan Using language models and the HITS algorithm for XML retrieval 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007 Our submission to the INEX 2006 Ad-hoc retrieval track is described. We study how to utilize the Wikipedia structure (XML documents with hyperlinks) by combining XML and Web retrieval. In particular, we experiment with different combinations of language models and the HITS algorithm. An important feature of our techniques is a filtering phase that identifies the relevant part of the corpus, prior to the processing of the actual XML elements. We analyze the effect of the above techniques based on the results of our runs in INEX 2006. Springer-Verlag Berlin Heidelberg 2007.
Kiritani, Yusuke; Ma, Qiang & Yoshikawa, Masatoshi Classifying web pages by using knowledge bases for entity retrieval 20th International Conference on Database and Expert Systems Applications, DEXA 2009, August 31, 2009 - September 4, 2009 Linz, Austria 2009 [128] In this paper, we propose a novel method to classify Web pages by using knowledge bases for entity search, which is a kind of typical Web search for information related to a person, location or organization. First, we map a Web page to entities according to the similarities between the page and the entities. Various methods for computing such similarity are applied. For example, we can compute the similarity between a given page and a Wikipedia article describing a certain entity. The frequency of an entity appearing in the page is another factor used in computing the similarity. Second, we construct a directed acyclic graph, named PEC} graph, based on the relations among Web pages, entities, and categories, by referring to YAGO, a knowledge base built on Wikipedia and WordNet.} Finally, by analyzing the PEC} graph, we classify Web pages into categories. The results of some preliminary experiments validate the methods proposed in this paper. 2009 Springer Berlin Heidelberg.
Kirtsis, Nikos; Stamou, Sofia; Tzekou, Paraskevi & Zotos, Nikos Information uniqueness in Wikipedia articles 6th International Conference on Web Information Systems and Technologies, WEBIST 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010 Wikipedia is one of the most successful worldwide collaborative efforts to put together user generated content in a meaningfully organized and intuitive manner. Currently, Wikipedia hosts millions of articles on a variety of topics, supplied by thousands of contributors. A critical factor in Wikipedia's success is its open nature, which enables everyone to edit, revise and/or question (via talk pages) the article contents. Considering the phenomenal growth of Wikipedia and the lack of a peer review process for its contents, it becomes evident that both editors and administrators have difficulty in validating its quality on a systematic and coordinated basis. This difficulty has motivated several research works on how to assess the quality of Wikipedia articles. In this paper, we propose the exploitation of a novel indicator of Wikipedia article quality, namely information uniqueness. In this respect, we describe a method that captures the information duplication across the article contents in an attempt to infer the amount of distinct information every article communicates. Our approach relies on the intuition that an article offering unique information about its subject is of better quality compared to an article that discusses issues already addressed in several other Wikipedia articles.
Kisilevich, Slava; Mansmann, Florian; Bak, Peter; Keim, Daniel & Tchaikin, Alexander Where would you go on your next vacation? A framework for visual exploration of attractive places 2nd International Conference on Advanced Geographic Information Systems, Applications, and Services, GEOProcessing 2010, February 10, 2010 - February 16, 2010 St. Maarten, Netherlands 2010 [129] Tourists face a great challenge when they gather information about places they want to visit. Geographically tagged information in the form of Wikipedia pages, local tourist information pages, dedicated web sites and the massive amount of information provided by Google Earth is publicly available and commonly used. But processing this information is a time-consuming activity. Our goal is to make the search for attractive places simpler for the common user and to provide researchers with methods for exploration and analysis of attractive areas. We assume that an attractive place is characterized by large amounts of photos taken by many people. This paper presents a framework in which we demonstrate a systematic approach for visualization and exploration of attractive places as a zoomable information layer. The presented technique utilizes density-based clustering of image coordinates and smart color scaling to produce interactive visualizations using a Google Earth mashup. We show that our approach can be used as a basis for detailed analysis of attractive areas. In order to demonstrate our method, we use real-world geo-tagged photo data obtained from Flickr and Panoramio to construct interactive visualizations of virtually every region of interest in the world.
Kittur, Aniket; Suh, Bongwon; Pendleton, Bryan A. & Chi, Ed H. He says, she says: Conflict and coordination in Wikipedia 25th SIGCHI Conference on Human Factors in Computing Systems 2007, CHI 2007, April 28, 2007 - May 3, 2007 San Jose, CA, United states 2007 [130] Wikipedia, a wiki-based encyclopedia, has become one of the most successful experiments in collaborative knowledge building on the Internet. As Wikipedia continues to grow, the potential for conflict and the need for coordination increase as well. This article examines the growth of such non-direct work and describes the development of tools to characterize conflict and coordination costs in Wikipedia. The results may inform the design of new collaborative knowledge systems.
Kiyota, Yoji; Nakagawa, Hiroshi; Sakai, Satoshi; Mori, Tatsuya & Masuda, Hidetaka Exploitation of the Wikipedia category system for enhancing the value of LCSH 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09, June 15, 2009 - June 19, 2009 Austin, TX, United states 2009 [131] This paper addresses an approach that integrates two different types of information resources: the Web and libraries. Our method begins from any keywords in Wikipedia, and induces related subject headings of LCSH} through the Wikipedia category system.
Koolen, Marijn & Kamps, Jaap What's in a link? from document importance to topical relevance 2nd International Conference on the Theory of Information Retrieval, ICTIR 2009, September 10, 2009 - September 12, 2009 Cambridge, United kingdom 2009 [132] Web information retrieval is best known for its use of the Web's link structure as a source of evidence. Global link evidence is by nature query-independent, and is therefore no direct indicator of the topical relevance of a document for a given search request. As a result, link information is usually considered to be useful to identify the 'importance' of documents. Local link evidence, in contrast, is query-dependent and could in principle be related to the topical relevance. We analyse the link evidence in Wikipedia using a large set of ad hoc retrieval topics and relevance judgements to investigate the relation between link evidence and topical relevance. 2009 Springer Berlin Heidelberg.
Koolen, Marijn; Kaptein, Rianne & Kamps, Jaap Focused search in books and wikipedia: Categories, links and relevance feedback 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [133] In this paper we describe our participation in INEX} 2009 in the Ad Hoc Track, the Book Track, and the Entity Ranking Track. In the Ad Hoc track we investigate focused link evidence, using only links from retrieved sections. The new collection is not only annotated with Wikipedia categories, but also with YAGO/WordNet} categories. We explore how we can use both types of category information, in the Ad Hoc Track as well as in the Entity Ranking Track. Results in the Ad Hoc Track show Wikipedia categories are more effective than WordNet} categories, and Wikipedia categories in combination with relevance feedback lead to the best results. Preliminary results of the Book Track show full-text retrieval is effective for high early precision. Relevance feedback further increases early precision. Our findings for the Entity Ranking Track are in direct opposition of our Ad Hoc findings, namely, that the WordNet} categories are more effective than the Wikipedia categories. This marks an interesting difference between ad hoc search and entity ranking. 2010 Springer-Verlag} Berlin Heidelberg.
Kriplean, Travis; Beschastnikh, Ivan; McDonald, David W. & Golder, Scott A. Community, consensus, coercion, control: CS*W or how policy mediates mass participation 2007 International ACM Conference on Supporting Group Work, GROUP'07, November 4, 2007 - November 7, 2007 Sanibel Island, FL, United states 2007 [134] When large groups cooperate, issues of conflict and control surface because of differences in perspective. Managing such diverse views is a persistent problem in cooperative group work. The Wikipedian community has responded with an evolving body of policies that provide shared principles, processes, and strategies for collaboration. We employ a grounded approach to study a sample of active talk pages and examine how policies are employed as contributors work towards consensus. Although policies help build a stronger community, we find that ambiguities in policies give rise to power plays. This lens demonstrates that support for mass collaboration must take into account policy and power.
Kuribara, Shusuke; Abbas, Safia & Sawamura, Hajime Applying the logic of multiple-valued argumentation to social web: SNS and wikipedia 11th Pacific Rim International Conference on Multi-Agents, PRIMA 2008, December 15, 2008 - December 16, 2008 Hanoi, Viet nam 2008 [135] The Logic of Multiple-Valued} Argumentation (LMA) is an argumentation framework that allows for argument-based reasoning about uncertain issues under uncertain knowledge. In this paper, we describe its applications to Social Web: SNS} and Wikipedia. They are said to be the most influential social Web applications to the present and future information society. For SNS, we present an agent that judges the registration approval for Mymixi in mixi in terms of LMA.} For Wikipedia, we focus on the deletion problem of Wikipedia and present agents that argue about the issue on whether contributed articles should be deleted or not, analyzing arguments proposed for deletion in terms of LMA.} These attempts reveal that LMA} can deal with not only potential applications but also practical ones such as extensive and contemporary applications. 2008 Springer Berlin Heidelberg.
Kürsten, Jens; Richter, Daniel & Eibl, Maximilian VideoCLEF 2008: ASR classification with wikipedia categories 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [136] This article describes our participation at the VideoCLEF track. We designed and implemented a prototype for the classification of the Video ASR data. Our approach was to regard the task as a text classification problem. We used terms from Wikipedia categories as training data for our text classifiers. For the text classification the Naive Bayes and kNN classifiers from the WEKA toolkit were used. We submitted experiments for classification tasks 1 and 2. For the translation of the feeds to English (translation task) Google's AJAX language API was used. Although our experiments achieved only a low precision of 10 to 15 percent, we assume those results will be useful in a combined setting with the retrieval approach that was widely used. Interestingly, we could not improve the quality of the classification by using the provided metadata. 2009 Springer Berlin Heidelberg.
Kutty, Sangeetha; Tran, Tien; Nayak, Richi & Li, Yuefeng Clustering XML documents using closed frequent subtrees: A structural similarity approach 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [137] This paper presents the experimental study conducted over the INEX} 2007 Document Mining Challenge corpus employing a frequent subtree-based incremental clustering approach. Using the structural information of the XML} documents, the closed frequent subtrees are generated. A matrix is then developed representing the closed frequent subtree distribution in documents. This matrix is used to progressively cluster the XML} documents. In spite of the large number of documents in INEX} 2007 Wikipedia dataset, the proposed frequent subtree-based incremental clustering approach was successful in clustering the documents. 2008 Springer-Verlag} Berlin Heidelberg.
Lahti, Lauri Personalized learning paths based on wikipedia article statistics 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010 We propose a new semi-automated method for generating personalized learning paths from the Wikipedia online encyclopedia by following inter-article hyperlink chains based on various rankings that are retrieved from the statistics of the articles. Alternative perspectives for learning topics are achieved when the next hyperlink to access is selected based on hierarchy of hyperlinks, repetition of hyperlink terms, article size, viewing rate, editing rate, or user-defined weighted mixture of them all. We have implemented the method in a prototype enabling the learner to build independently concept maps following her needs and consideration. A list of related concepts is shown in a desired type of ranking to label new nodes (titles of target articles for current hyperlinks) accompanied with parsed explanation phrases from the sentences surrounding each hyperlink to label directed arcs connecting nodes. In experiments the alternative ranking schemes well supported various learning needs suggesting new pedagogical networking practices.
Lahti, Lauri & Tarhio, Jorma Semi-automated map generation for concept gaming Computer Graphics and Visualization 2008 and Gaming 2008: Design for Engaging Experience and Social Interaction 2008, MCCSIS'08 - IADIS Multi Conference on Computer Science and Information Systems, July 22, 2008 - July 27, 2008 Amsterdam, Netherlands 2008 Conventional learning games have often limited flexibility to address individual needs of a learner. The concept gaming approach provides a frame for handling conceptual structures that are defined by a concept map. A single concept map can be used to create many alternative games and these can be chosen so that personal learning goals can be taken well into account. However, the workload of creating new concept maps and sharing them effectively seems to easily hinder adoption of concept gaming. We now propose a new semi-automated map generation method for concept gaming. Due to fast increase in the open access knowledge available in the Web, the articles of the Wikipedia encyclopedia were chosen to serve as a source for concept map generation. Based on a given entry name the proposed method produces hierarchical concept maps that can be freely explored and modified. Variants of this approach could be successfully implemented in the wide range of educational tasks. In addition, ideas for further development of concept gaming are proposed.
Lam, Shyong K. & Riedl, John Is Wikipedia growing a longer tail? 2009 ACM SIGCHI International Conference on Supporting Group Work, GROUP'09, May 10, 2009 - May 13, 2009 Sanibel Island, FL, United states 2009 [138] Wikipedia has millions of articles, many of which receive little attention. One group of Wikipedians believes these obscure entries should be removed because they are uninteresting and neglected; these are the deletionists. Other Wikipedians disagree, arguing that this long tail of articles is precisely Wikipedia's advantage over other encyclopedias; these are the inclusionists. This paper looks at two overarching questions on the debate between deletionists and inclusionists: (1) What are the implications to the long tail of the evolving standards for article birth and death? (2) How is viewership affected by the decreasing notability of articles in the long tail? The answers to five detailed research questions that are inspired by these overarching questions should help better frame this debate and provide insight into how Wikipedia is evolving. ""
Lanamaki, Arto & Paivarinta, Tero Metacommunication patterns in online communities 3rd International Conference on Online Communities and Social Computing, OCSC 2009. Held as Part of HCI International 2009, July 19, 2009 - July 24, 2009 San Diego, CA, United states 2009 [139] This paper discusses contemporary literature on computer-mediated metacommunication and observes the phenomenon in two online communities. The results contribute by identifying six general-level patterns of how metacommunication refers to primary communication in online communities. A task-oriented, user-administrated community (Wikipedia in Finnish) involved a remarkable number of specialized metacommunication genres. In a centrally moderated, discussion-oriented community (PatientsLikeMe), metacommunication was intertwined more with primary ad hoc communication. We suggest that a focus on specialized metacommunication genres may appear useful in online communities. However, room for ad hoc (meta)communication is needed as well, as it provides a basis for user-initiated community development. 2009 Springer Berlin Heidelberg.
Kuchta, Jaroslaw Passing from requirements specification to class model using application domain ontology 2010 2nd International Conference on Information Technology, ICIT 2010, June 28, 2010 - June 30, 2010 Gdansk, Poland 2010 The quality of a classic software engineering process depends on the completeness of project documents and on the inter-phase consistency. In this paper, a method for passing from the requirements specification to the class model is proposed. First, a developer browses the text of the requirements, extracts the word sequences, and places them as terms into the glossary. Next, the internal ontology logic for the glossary needs to be elaborated. External ontology sources, such as Wikipedia or domain ontology services, may be used to support this stage. At the end, the newly built ontology is transformed to the class model. The whole process may be supported with semi-automated, interactive tools. The result should be a class model with better completeness and consistency than one obtained using traditional methods.
Larsen, Jakob Eg; Halling, Søren; Sigurðsson, Magnús & Hansen, Lars Kai MuZeeker: Adapting a music search engine for mobile phones Mobile Multimedia Processing - Fundamentals, Methods, and Applications Tiergartenstrasse 17, Heidelberg, D-69121, Germany 2010 [140] We describe MuZeeker, a search engine with domain knowledge based on Wikipedia. MuZeeker enables the user to refine a search in multiple steps by means of category selection. In the present version we focus on multimedia search related to music and we present two prototype search applications (web-based and mobile) and discuss the issues involved in adapting the search engine for mobile phones. A category-based filtering approach enables the user to refine a search through relevance feedback by category selection instead of typing additional text, which is hypothesized to be an advantage in the mobile MuZeeker application. We report on two usability experiments using the think-aloud protocol, in which N=20 participants performed tasks using MuZeeker and a customized Google search engine. In both experiments web-based and mobile user interfaces were used. The experiments show that participants are capable of solving tasks slightly better using MuZeeker, while inexperienced MuZeeker users perform slightly slower than experienced Google users. This was found in both the web-based and the mobile applications. It was found that task performance in the mobile search applications (MuZeeker and Google) was 2-2.5 times lower than in the corresponding web-based search applications (MuZeeker and Google).
Larson, Martha; Newman, Eamonn & Jones, Gareth J. F. Overview of videoCLEF 2008: Automatic generation of topic-based feeds for dual language audio-visual content 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [141] The VideoCLEF} track, introduced in 2008, aims to develop and evaluate tasks related to analysis of and access to multilingual multimedia content. In its first year, VideoCLEF} piloted the Vid2RSS} task, whose main subtask was the classification of dual language video (Dutch-language} television content featuring English-speaking experts and studio guests). The task offered two additional discretionary subtasks: feed translation and automatic keyframe extraction. Task participants were supplied with Dutch archival metadata, Dutch speech transcripts, English speech transcripts and ten thematic category labels, which they were required to assign to the test set videos. The videos were grouped by class label into topic-based RSS-feeds, displaying title, description and keyframe for each video. Five groups participated in the 2008 VideoCLEF} track. Participants were required to collect their own training data; both Wikipedia and general web content were used. Groups deployed various classifiers (SVM, Naive Bayes and K-NN) or treated the problem as an information retrieval task. Both the Dutch speech transcripts and the archival metadata performed well as sources of indexing features, but no group succeeded in exploiting combinations of feature sources to significantly enhance performance. A small scale fluency/adequacy evaluation of the translation task output revealed the translation to be of sufficient quality to make it valuable to a Non-Dutch} speaking English speaker. For keyframe extraction, the strategy chosen was to select the keyframe from the shot with the most representative speech transcript content. The automatically selected shots were shown, with a small user study, to be competitive with manually selected shots. Future years of VideoCLEF} will aim to expand the corpus and the class label list, as well as to extend the track to additional tasks. 2009 Springer Berlin Heidelberg.
Le, Qize & Panchal, Jitesh H. Modeling the effect of product architecture on mass collaborative processes - An agent-based approach 2009 ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC2009, August 30, 2009 - September 2, 2009 San Diego, CA, United states 2010 Traditional product development efforts are based on well-structured and hierarchical product development teams. The products are systematically decomposed into subsystems that are designed by dedicated teams with well-defined information flows. Recently, a new product development approach called Mass Collaborative Product Development (MCPD) has emerged. The fundamental difference between a traditional product development process and a MCPD} process is that the former is based on top-down decomposition while the latter is based on evolution and self-organization. The paradigm of MCPD} has resulted in highly successful products such as Wikipedia, Linux and Apache. Despite the success of various projects using MCPD, it is not well understood how the product architecture affects the evolution of products developed using such processes. To address this gap, an agent-based model to study MCPD} processes is presented in this paper. Through this model, the effect of product architectures on the product evolution is studied. The model is executed for different architectures ranging from slot architecture to bus architecture and the rates of product evolution are determined. The simulation-based approach allows us to study how the degree of modularity of products affects the evolution time of products and different modules in the MCPD} processes. The methodology is demonstrated using an illustrative example of mobile phones. This approach provides a simple and intuitive way to study the effects of product architecture on the MCPD} processes. It is helpful in determining the best strategies for product decomposition and identifying the product architectures that are suitable for the MCPD processes.
Le, Minh-Tam; Dang, Hoang-Vu; Lim, Ee-Peng & Datta, Anwitaman WikiNetViz: Visualizing friends and adversaries in implicit social networks IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, June 17, 2008 - June 20, 2008 Taipei, Taiwan 2008 [142] When multiple users with diverse backgrounds and beliefs edit Wikipedia together, disputes often arise due to disagreements among the users. In this paper, we introduce a novel visualization tool known as WikiNetViz} to visualize and analyze disputes among users in a dispute-induced social network. WikiNetViz} is designed to quantify the degree of dispute between a pair of users using the article history. Each user (and article) is also assigned a controversy score by our proposed ControversyRank} model so as to measure the degree of controversy of a user (and an article) by the amount of disputes between the user (article) and other users in articles of varying degrees of controversy. On the constructed social network, WikiNetViz} can perform clustering so as to visualize the dynamics of disputes at the user group level. It also provides an article viewer for examining an article revision so as to determine the article content modified by different users. ""
Lee, Kangpyo; Kim, Hyunwoo; Shin, Hyopil & Kim, Hyoung-Joo FolksoViz: A semantic relation-based folksonomy visualization using the Wikipedia corpus 10th ACIS Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2009, In conjunction with IWEA 2009 and WEACR 2009, May 27, 2009 - May 29, 2009 Daegu, Korea, Republic of 2009 [143] Tagging is one of the most popular services in Web 2.0 and folksonomy is a representation of collaborative tagging. The tag cloud has been the one and only visualization of the folksonomy. The tag cloud, however, provides no information about the relations between tags. In this paper, targeting del.icio.us tag data, we propose a technique, FolksoViz, for automatically deriving semantic relations between tags and for visualizing the tags and their relations. In order to find the equivalence, subsumption, and similarity relations, we apply various rules and models based on the Wikipedia corpus. The derived relations are visualized effectively. The experiment shows that FolksoViz manages to find the correct semantic relations with high accuracy.
Lee, Kangpyo; Kim, Hyunwoo; Shin, Hyopil & Kim, Hyoung-Joo Tag sense disambiguation for clarifying the vocabulary of social tags 2009 IEEE International Conference on Social Computing, SocialCom 2009, August 29, 2009 - August 31, 2009 Vancouver, BC, Canada 2009 [144] Tagging is one of the most popular services in Web 2.0. As a special form of tagging, social tagging is done collaboratively by many users, which forms a so-called folksonomy. As tagging has become widespread on the Web, the tag vocabulary is now very informal, uncontrolled, and personalized. For this reason, many tags are unfamiliar and ambiguous to users so that they fail to understand the meaning of each tag. In this paper, we propose a tag sense disambiguating method, called Tag Sense Disambiguation (TSD), which works in the social tagging environment. TSD can be applied to the vocabulary of social tags, thereby enabling users to understand the meaning of each tag through Wikipedia. To find the correct mappings from del.icio.us tags to Wikipedia articles, we define the Local Neighbor tags, the Global Neighbor tags, and finally the Neighbor tags that would be the useful keywords for disambiguating the sense of each tag based on the tag co-occurrences. The automatically built mappings are reasonable in most cases. The experiment shows that TSD can find the correct mappings with high accuracy.
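A minimal sketch of the neighbour-tag idea: for an ambiguous tag, choose the candidate Wikipedia article whose text shares the most terms with the tag's co-occurring tags. The candidate articles and neighbour tags below are toy placeholders rather than the actual TSD mapping procedure.

 def disambiguate(tag, neighbor_tags, candidate_articles):
     """Return the candidate article whose text overlaps most with the tag's neighbour tags."""
     neighbours = {t.lower() for t in neighbor_tags}
     def overlap(text):
         return len(set(text.lower().split()) & neighbours)
     return max(candidate_articles, key=lambda title: overlap(candidate_articles[title]))

 # Toy example: the tag "apple" with neighbours suggesting the company sense.
 candidates = {
     "Apple Inc.": "American technology company making the iphone and mac computers",
     "Apple": "fruit of the apple tree cultivated worldwide",
 }
 print(disambiguate("apple", ["iphone", "mac", "technology"], candidates))  # -> Apple Inc.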
Lees-Miller, John; Anderson, Fraser; Hoehn, Bret & Greiner, Russell Does Wikipedia information help Netflix predictions? 7th International Conference on Machine Learning and Applications, ICMLA 2008, December 11, 2008 - December 13, 2008 San Diego, CA, United states 2008 [145] We explore several ways to estimate movie similarity from the free encyclopedia Wikipedia with the goal of improving our predictions for the Netflix Prize. Our system first uses the content and hyperlink structure of Wikipedia articles to identify similarities between movies. We then predict a user's unknown ratings by using these similarities in conjunction with the user's known ratings to initialize matrix factorization and K-Nearest Neighbours algorithms. We blend these results with existing ratings-based predictors. Finally, we discuss our empirical results, which suggest that external Wikipedia data does not significantly improve the overall prediction accuracy.
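The similarity-based prediction step can be sketched as a simple item-item K-Nearest Neighbours predictor; the similarity values below stand in for the Wikipedia-derived movie similarities and the ratings are made up.

 def knn_predict(user_ratings, target, similarity, k=2):
     """Predict the rating for `target` as the similarity-weighted average of the
     k movies most similar to it that the user has already rated."""
     rated = [(m, r) for m, r in user_ratings.items() if m != target]
     rated.sort(key=lambda mr: similarity.get((target, mr[0]), 0.0), reverse=True)
     top = rated[:k]
     num = sum(similarity.get((target, m), 0.0) * r for m, r in top)
     den = sum(similarity.get((target, m), 0.0) for m, _ in top)
     return num / den if den else sum(r for _, r in top) / max(len(top), 1)

 # Invented Wikipedia-style similarities between movie pairs and one user's ratings.
 sim = {("Alien", "Aliens"): 0.9, ("Alien", "Blade Runner"): 0.6, ("Alien", "Amelie"): 0.1}
 ratings = {"Aliens": 5, "Blade Runner": 4, "Amelie": 2}
 print(knn_predict(ratings, "Alien", sim))  # -> 4.6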
Lehtonen, Miro & Doucet, Antoine Phrase detection in the Wikipedia 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [146] The Wikipedia XML} collection turned out to be rich of marked-up phrases as we carried out our INEX} 2007 experiments. Assuming that a phrase occurs at the inline level of the markup, we were able to identify over 18 million phrase occurrences, most of which were either the anchor text of a hyperlink or a passage of text with added emphasis. As our IR} system - EXTIRP} - indexed the documents, the detected inline-level elements were duplicated in the markup with two direct consequences: 1) The frequency of the phrase terms increased, and 2) the word sequences changed. Because the markup was manipulated before computing word sequences for a phrase index, the actual multi-word phrases became easier to detect. The effect of duplicating the inline-level elements was tested by producing two run submissions in ways that were similar except for the duplication. According to the official INEX} 2007 metric, the positive effect of duplicated phrases was clear. 2008 Springer-Verlag} Berlin Heidelberg.
Lehtonen, Miro & Doucet, Antoine EXTIRP: Baseline retrieval from Wikipedia 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007 The Wikipedia XML documents are considered an interesting challenge to any XML retrieval system that is capable of indexing and retrieving XML without prior knowledge of the structure. Although the structure of the Wikipedia XML documents is highly irregular and thus unpredictable, EXTIRP manages to handle all the well-formed XML documents without problems. Whether the high flexibility of EXTIRP also implies high performance concerning the quality of IR has so far been a question without definite answers. The initial results do not confirm any positive answers, but instead, they tempt us to define some requirements for the XML documents that EXTIRP is expected to index. The most interesting question stemming from our results is about the line between high-quality XML markup, which aids accurate IR, and noisy "XML spam" that misleads flexible XML search engines. Springer-Verlag Berlin Heidelberg 2007.
Leong, Peter; Siak, Chia Bin & Miao, Chunyan Cyber engineering co-intelligence digital ecosystem: The GOFASS methodology 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, DEST '09, June 1, 2009 - June 3, 2009 Istanbul, Turkey 2009 [147] Co-intelligence, also known as collective or collaborative intelligence, is the harnessing of human knowledge and intelligence that allows groups of people to act together in ways that seem to be intelligent. Co-intelligence Internet applications such as Wikipedia are the first steps toward developing digital ecosystems that support collective intelligence. Peer-to-peer (P2P) systems are well fitted to Co-Intelligence} digital ecosystems because they allow each service client machine to act also as a service provider without any central hub in the network of cooperative relationships. However, dealing with server farms, clusters and meshes of wireless edge devices will be the norm in the next generation of computing; but most present P2P} system had been designed with a fixed, wired infrastructure in mind. This paper proposes a methodology for cyber engineering an intelligent agent mediated co-intelligence digital ecosystems. Our methodology caters for co-intelligence digital ecosystems with wireless edge devices working with service-oriented information servers. ""
Li, Bing; Chen, Qing-Cai; Yeung, Daniel S.; Ng, Wing W.Y. & Wang, Xiao-Long Exploring wikipedia and query log's ability for text feature representation 6th International Conference on Machine Learning and Cybernetics, ICMLC 2007, August 19, 2007 - August 22, 2007 Hong Kong, China 2007 [148] The rapid increase of internet technology requires better management of web page contents. Much text mining research has been conducted on tasks such as text categorization, information retrieval and text clustering. When machine learning methods or statistical models are applied to such a large scale of data, the first step is to represent a text document in a form that computers can handle. Traditionally, single words are employed as features in the Vector Space Model, and they make up the feature space for all text documents. The single-word representation assumes word independence and does not consider relations between words, which may cause information loss. This paper proposes Wiki-Query segmented features for text classification, in the hope of making better use of the text information. The experimental results show that a much better F1 value is achieved than with the classical single-word based text representation. This means that Wikipedia and query-segmented features can better represent a text document.
Li, Yun; Huang, Kaiyan; Ren, Fuji & Zhong, Yixin Searching and computing for vocabularies with semantic correlations from Chinese Wikipedia China-Ireland International Conference on Information and Communications Technologies, CIICT 2008, September 26, 2008 - September 28, 2008 Beijing, China 2008 [149] This paper presents experiments on searching for semantically correlated vocabulary in Chinese Wikipedia pages and on computing semantic correlations. Based on 54,745 structured documents generated from Wikipedia pages, we explore about 400,000 pairs of Wikipedia terms, considering hyperlinks, overlapping text and document positions. Semantic relatedness is calculated based on the relatedness of the Wikipedia documents. Through comparative experiments we analyze the reliability of our measures and some other properties.
Lian, Li; Ma, Jun; Lei, JingSheng; Song, Ling & Liu, LeBo Automated construction Chinese domain ontology from Wikipedia 4th International Conference on Natural Computation, ICNC 2008, October 18, 2008 - October 20, 2008 Jinan, China 2008 [150] Wikipedia (Wiki) is a collaborative on-line encyclopedia, where web users are able to share their knowledge about a certain topic. How to make use of the rich knowledge in the Wiki is a big challenge. In this paper we propose a method to construct domain ontology from the Chinese Wiki automatically. The main idea is based on entry segmentation and Feature Text (FT) extraction: we first segment the names of entries and establish the concept hierarchy. Secondly, we extract the FTs from the descriptions of entries to eliminate redundant information. Finally we calculate the similarity between pairs of FTs to revise the concept hierarchy and obtain non-taxonomic relations between concepts. A preliminary experiment indicates that our method is useful for Chinese domain ontology construction.
Liang, Chia-Kai; Hsieh, Yu-Ting; Chuang, Tien-Jung; Wang, Yin; Weng, Ming-Fang & Chuang, Yung-Yu Learning landmarks by exploiting social media 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China 2009 [151] This paper introduces methods for automatic annotation of landmark photographs via learning textual tags and visual features of landmarks from landmark photographs that are appropriately location-tagged from social media. By analyzing spatial distributions of text tags from Flickr's geotagged photos, we identify thousands of tags that likely refer to landmarks. Further verification by utilizing Wikipedia articles filters out non-landmark tags. Association analysis is used to find the containment relationship between landmark tags and other geographic names, thus forming a geographic hierarchy. Photographs relevant to each landmark tag were retrieved from Flickr and distinctive visual features were extracted from them. The results form ontology for landmarks, including their names, equivalent names, geographic hierarchy, and visual features. We also propose an efficient indexing method for content-based landmark search. The resultant ontology could be used in tag suggestion and content-relevant re-ranking. 2010 Springer-Verlag} Berlin Heidelberg.
Lim, Ee-Peng; Kwee, Agus Trisnajaya; Ibrahim, Nelman Lubis; Sun, Aixin; Datta, Anwitaman; Chang, Kuiyu & Maureen Visualizing and exploring evolving information networks in Wikipedia 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [152] Information networks in Wikipedia evolve as users collaboratively edit articles that embed the networks. These information networks represent both the structure and content of community's knowledge and the networks evolve as the knowledge gets updated. By observing the networks evolve and finding their evolving patterns, one can gain higher order knowledge about the networks and conduct longitudinal network analysis to detect events and summarize trends. In this paper, we present SSNetViz+, a visual analytic tool to support visualization and exploration of Wikipedia's information networks. SSNetViz+} supports time-based network browsing, content browsing and search. Using a terrorism information network as an example, we show that different timestamped versions of the network can be interactively explored. As information networks in Wikipedia are created and maintained by collaborative editing efforts, the edit activity data are also shown to help detecting interesting events that may have happened to the network. SSNetViz+} also supports temporal queries that allow other relevant nodes to be added so as to expand the network being analyzed. ""
Lim, Ee-Peng; Wang, Z.; Sadeli, D.; Li, Y.; Chang, Chew-Hung; Chatterjea, Kalyani; Goh, Dion Hoe-Lian; Theng, Yin-Leng; Zhang, Jun & Sun, Aixin Integration of Wikipedia and a geography digital library 9th International Conference on Asian Digital Libraries, ICADL 2006, November 27, 2006 - November 30, 2006 Kyoto, Japan 2006 In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications that are of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geo-political regions namely cities and countries. Some empirical performance results will be presented. The paper will also describe the adaptations of G-Portal} and Wikipedia to meet the integration requirements. Springer-Verlag} Berlin Heidelberg 2006.
Linna, Li The design of semantic web services discovery model based on multi proxy 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2009, November 20, 2009 - November 22, 2009 Shanghai, China 2009 [153] Web services have changed the Web from a database of static documents to a service provider. To improve the automation of Web services interoperation, a number of technologies have been recommended, such as semantic Web services and proxies. In this paper we propose a model for semantic Web service discovery based on semantic Web services and FIPA multi-proxies. This paper provides a broker which offers semantic interoperability between the semantic Web service provider and proxies by translating WSDL to DF descriptions for semantic Web services and DF descriptions to WSDL for FIPA multi-proxies. We describe how the proposed architecture analyzes the request and matches the search query. The ontology management in the broker creates the user ontology and merges it with a general ontology (i.e. WordNet, Yago, Wikipedia ...). We also describe the recommendation component that recommends the WSDL to the Web service provider to increase its retrieval probability in related queries.
Lintean, Mihai; Moldovan, Cristian; Rus, Vasile & McNamara, Danielle The role of local and global weighting in assessing the semantic similarity of texts using latent semantic analysis 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23, May 19, 2010 - May 21, 2010 Daytona Beach, FL, United states 2010 In this paper, we investigate the impact of several local and global weighting schemes on Latent Semantic Analysis' (LSA) ability to capture semantic similarity between two texts. We worked with texts varying in size from sentences to paragraphs. We present a comparison of 3 local and 3 global weighting schemes across 3 different standardized data sets related to semantic similarity tasks. For local weighting, we used binary weighting, term-frequency, and log-type weighting. For global weighting, we relied on binary weighting, inverted document frequencies (IDF) collected from the English Wikipedia, and entropy, which is the standard weighting scheme used by most LSA-based applications. We studied all possible combinations of these weighting schemes on the following three tasks and corresponding data sets: paraphrase identification at sentence level using the Microsoft Research Paraphrase Corpus, paraphrase identification at sentence level using data from the intelligent tutoring system iSTART, and mental model detection based on student-articulated paragraphs in MetaTutor, another intelligent tutoring system. Our experiments revealed that for sentence-level texts term-frequency local weighting in combination with either IDF or binary global weighting works best. For paragraph-level texts, log-type local weighting in combination with binary global weighting works best. We also found that global weights have a greater impact for sentence-level similarity, as the local weight is undermined by the small size of such texts. Copyright 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
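For concreteness, a small sketch of composing a local weight with a global weight when building the term vectors that feed LSA; the log and IDF formulas follow their standard definitions, and the tiny corpus is invented (the entropy global weight mentioned above is omitted for brevity).

 import math
 from collections import Counter

 docs = ["the cat sat on the mat", "the dog sat", "cats and dogs"]
 tokenized = [d.split() for d in docs]
 vocab = sorted({w for d in tokenized for w in d})

 def idf(term):
     df = sum(1 for d in tokenized if term in d)
     return math.log(len(tokenized) / df)

 def weighted_vector(doc_tokens, local="log", global_="idf"):
     """Entry = local weight (binary, tf, or log) times global weight (binary or IDF)."""
     tf = Counter(doc_tokens)
     vec = {}
     for term in vocab:
         lw = {"binary": float(term in tf), "tf": float(tf[term]), "log": math.log(1 + tf[term])}[local]
         gw = {"binary": 1.0, "idf": idf(term)}[global_]
         vec[term] = lw * gw
     return vec

 # These weighted vectors would then form the matrix handed to the SVD step of LSA.
 print(weighted_vector(tokenized[0], local="log", global_="idf"))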
Liu, Changxin; Chen, Huijuan; Tan, Yunlan & Wu, Lanying The design of e-Learning system based on semantic wiki and multi-agent 2nd International Workshop on Education Technology and Computer Science, ETCS 2010, March 6, 2010 - March 7, 2010 Wuhan, Hubei, China 2010 [154] User interaction and social networking are based on Web 2.0; the well-known applications are blogs, wikis, and image/video sharing sites. They have dramatically increased sharing and participation among web users. Knowledge is collected and information is shared using social software. Wikipedia is a successful example of web technology that has helped knowledge-sharing between people. Users can freely create and modify its content, but Wikipedia cannot understand its content; this problem is addressed by semantic wikis. We design an e-Learning system based on a semantic wiki and multi-agent technology. It can help implement distributed learning resource discovery and individualized services. The prototype provides efficient navigation and search.
Liu, Qiaoling; Xu, Kaifeng; Zhang, Lei; Wang, Haofen; Yu, Yong & Pan, Yue Catriple: Extracting triples from wikipedia categories 3rd Asian Semantic Web Conference, ASWC 2008, December 8, 2008 - December 11, 2008 Bangkok, Thailand 2008 [155] As an important step towards bootstrapping the Semantic Web, many efforts have been made to extract triples from Wikipedia because of its wide coverage, good organization and rich knowledge. One important kind of triple concerns Wikipedia articles and their non-isa properties, e.g. (Beijing, country, China). Previous work has tried to extract such triples from Wikipedia infoboxes, article text and categories. The infobox-based and text-based extraction methods depend on the infoboxes and suffer from a low article coverage. In contrast, the category-based extraction methods exploit the widespread categories. However, they rely on predefined properties, which is too effort-consuming and explores only very limited knowledge in the categories. This paper automatically extracts properties and triples from the less explored Wikipedia categories so as to achieve a wider article coverage with less manual effort. We manage to realize this goal by utilizing the syntax and semantics brought by super-sub category pairs in Wikipedia. Our prototype implementation outputs about 10M triples with a 12-level confidence ranging from 47.0% to 96.4%, which cover 78.2% of Wikipedia articles. Among them, 1.27M triples have confidence of 96.4%. Applications can on demand use the triples with suitable confidence. 2008 Springer Berlin Heidelberg.
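The super-sub category idea can be sketched as a simple pattern match: a super-category of the form "X by P" over a sub-category "X in/of/from V" suggests property P with value V for articles in the sub-category. The patterns below are simplified illustrations, not Catriple's actual rules or confidence model.

 import re

 def property_from_category_pair(super_cat, sub_cat):
     """If the super-category is '<X> by <property>' and the sub-category is
     '<X> in/of/from <value>', return (property, value)."""
     m_super = re.match(r"(?P<head>.+) by (?P<prop>.+)$", super_cat)
     m_sub = re.match(r"(?P<head>.+) (?:in|of|from) (?P<value>.+)$", sub_cat)
     if m_super and m_sub and m_super["head"] == m_sub["head"]:
         return m_super["prop"], m_sub["value"]
     return None

 pair = property_from_category_pair("Cities by country", "Cities in China")
 print(pair)                               # ('country', 'China')
 if pair:
     print(("Beijing", pair[0], pair[1]))  # ('Beijing', 'country', 'China')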
Lu, Zhiqiang; Shao, Werimin & Yu, Zhenhua Measuring semantic similarity between words using wikipedia 2009 International Conference on Web Information Systems and Mining, WISM 2009, November 7, 2009 - November 8, 2009 Shanghai, China 2009 [156] Semantic similarity measures play an important role in the extraction of semantic relations. They are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). This paper presents a new Web-based method for measuring the semantic similarity between words. Different from other methods, which are based on taxonomies or Internet search engines, our method uses snippets from Wikipedia to calculate the semantic similarity between words by using cosine similarity and TF-IDF. A stemming algorithm and stop-word removal are used in preprocessing the snippets from Wikipedia. We set different thresholds to evaluate our results in order to decrease the interference from noise and redundancy. Our method was empirically evaluated using the Rubenstein-Goodenough benchmark dataset. It gives a higher correlation value (0.615) than some existing methods. Evaluation results show that our method improves accuracy and is more robust for measuring semantic similarity between words.
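A compact sketch of the snippet-based measure: build TF-IDF vectors from Wikipedia snippets for each word and compare them with cosine similarity. The snippets are hard-coded stand-ins here; the actual method retrieves them from Wikipedia and applies stemming and stop-word removal first.

 import math
 from collections import Counter

 def tfidf_vectors(snippets):
     """snippets: word -> list of snippet strings; returns word -> TF-IDF vector."""
     tfs = {w: Counter(" ".join(s).lower().split()) for w, s in snippets.items()}
     n = len(tfs)
     df = Counter(t for tf in tfs.values() for t in tf)
     return {w: {t: c * math.log(1 + n / df[t]) for t, c in tf.items()} for w, tf in tfs.items()}

 def cosine(u, v):
     dot = sum(w * v.get(t, 0.0) for t, w in u.items())
     nu = math.sqrt(sum(w * w for w in u.values()))
     nv = math.sqrt(sum(w * w for w in v.values()))
     return dot / (nu * nv) if nu and nv else 0.0

 snippets = {
     "car": ["a car is a wheeled motor vehicle used for transport"],
     "automobile": ["an automobile is a motor vehicle with wheels for transport"],
 }
 vecs = tfidf_vectors(snippets)
 print(cosine(vecs["car"], vecs["automobile"]))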
Lukosch, Stephan & Leisen, Andrea Comparing and merging versioned wiki pages 4th International Conference on Web Information Systems and Technologies, WEBIST 2008, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal 2009 [157] Collaborative web-based applications support users when creating and sharing information. Wikis are prominent examples for that kind of applications. Wikis, like e.g. Wikipedia [1], attract loads of users that modify its content. Normally, wikis do not employ any mechanisms to avoid parallel modification of the same page. As result, conflicting changes can occur. Most wikis record all versions of a page to allow users to review recent changes. However, just recording all versions does not guarantee that conflicting modifications are reflected in the most recent version of a page. In this paper, we identify the requirements for efficiently dealing with conflicting modifications and present a web-based tool which allows to compare and merge different versions of a wiki page. 2009 Springer Berlin Heidelberg.
Lukosch, Stephan & Leisen, Andrea Dealing with conflicting modifications in a Wiki WEBIST 2008 - 4th International Conference on Web Information Systems and Technologies, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal 2008 Collaborative web-based applications support users when creating and sharing information. Wikis are prominent examples for that kind of applications. Wikis, like e.g. Wikipedia (Wikipedia, 2007), attract loads of users that modify its content. Normally, wikis do not employ any mechanisms to avoid parallel modification of the same page. As result, conflicting changes can occur. Most wikis record all versions of a page to allow users to review recent changes. However, just recording all versions does not guarantee that conflicting modifications are reflected in the most recent version of a page. In this paper, we identify the requirements for efficiently dealing with conflicting modifications and present a web-based tool which allows to compare and merge different versions of a wiki page.
Mansour, Osama Group Intelligence: A distributed cognition perspective International Conference on Intelligent Networking and Collaborative Systems, INCoS 2009, November 4, 2009 - November 6, 2009 Barcelona, Spain 2009 [158] The question of whether intelligence can be attributed to groups or not has been raised in many scientific disciplines. In the field of computer-supported collaborative learning, this question has been examined to understand how computer-mediated environments can augment human cognition and learning on a group level. The era of social computing which represents the emergence of Web 2.0 collaborative technologies and social media has stimulated a wide discussion about collective intelligence and the global brain. This paper reviews the theory of distributed cognition in the light of these concepts in an attempt to analyze and understand the emergence process of intelligence that takes place in the context of computer-mediated collaborative and social media environments. It concludes by showing that the cognitive organization, which occurs within social interactions serves as a catalyst for intelligence to emerge on a group level. Also a process model has been developed to show the process of collaborative knowledge construction in Wikipedia that characterizes such cognitive organization. ""
Mataoui, M'hamed; Boughanem, Mohand & Mezghiche, Mohamed Experiments on PageRank algorithm in the XML information retrieval context 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United Kingdom 2009 [159] In this paper we present two adaptations of the PageRank algorithm to collections of XML documents and the experimental results obtained for the Wikipedia collection used at INEX 2007. These adaptations, to which we refer as DOCRANK and TOPICAL-DOCRANK, allow re-ranking of the results returned by the base run execution to improve retrieval quality. Our experiments are performed on the results returned by the three best ranked systems in the "Focused" task of INEX 2007. Evaluations have shown improvements in the quality of retrieval results (the improvement for some topics is very significant, e.g. topics 491 and 521). The best improvement achieved in the results returned by the Dalian University system (global rate obtained for the 107 topics of INEX 2007) was about 3.78%.
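For readers unfamiliar with the base algorithm, a generic power-iteration PageRank over a document link graph, followed by a simple mixing of retrieval and link scores, looks roughly like the sketch below. The damping factor, mixing weight and graph format are illustrative assumptions and do not reproduce the paper's DOCRANK or TOPICAL-DOCRANK formulations.

```python
# Generic power-iteration PageRank over a document link graph (dangling mass
# ignored for brevity), plus a simple re-ranking step that mixes the retrieval
# score with the link-based score. All parameters are illustrative.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping doc id -> list of outgoing doc ids."""
    nodes = set(links) | {d for outs in links.values() for d in outs}
    n = len(nodes)
    rank = {d: 1.0 / n for d in nodes}
    for _ in range(iterations):
        new_rank = {d: (1.0 - damping) / n for d in nodes}
        for src, outs in links.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new_rank[dst] += share
        rank = new_rank
    return rank

def rerank(results, rank, alpha=0.8):
    """results: list of (doc id, retrieval score); highest combined score first."""
    return sorted(results,
                  key=lambda r: alpha * r[1] + (1.0 - alpha) * rank.get(r[0], 0.0),
                  reverse=True)
```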
Maureen; Sun, Aixin; Lim, Ee-Peng; Datta, Anwitaman & Chang, Kuiyu On visualizing heterogeneous semantic networks from multiple data sources 11th International Conference on Asian Digital Libraries, ICADL 2008, December 2, 2008 - December 5, 2008 Bali, Indonesia 2008 [160] In this paper, we focus on the visualization of heterogeneous semantic networks obtained from multiple data sources. A semantic network comprising a set of entities and relationships is often used for representing knowledge derived from textual data or database records. Although the semantic networks created for the same domain at different data sources may cover a similar set of entities, these networks could also be very different because of naming conventions, coverage, view points, and other reasons. Since digital libraries often contain data from multiple sources, we propose a visualization tool to integrate and analyze the differences among multiple social networks. Through a case study on two terrorism-related semantic networks derived from Wikipedia and Terrorism Knowledge Base (TKB) respectively, the effectiveness of our proposed visualization tool is demonstrated. 2008 Springer Berlin Heidelberg.
Minier, Zsolt; Bodo, Zalan & Csato, Lehel Wikipedia-based Kernels for text categorization 9th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2007, September 26, 2007 - September 29, 2007 Timisoara, Romania 2007 [161] In recent years several models have been proposed for text categorization. Among these, one of the widely applied models is the vector space model (VSM), where independence between indexing terms, usually words, is assumed. Since training corpora sizes are relatively small - compared to what would be required for a realistic number of words - the generalization power of the learning algorithms is low. It is assumed that a bigger text corpus can boost the representation and hence the learning process. Based on the work of Gabrilovich and Markovitch [6], we incorporate Wikipedia articles into the system to give a word-distributional representation for documents. The extension with this new corpus causes a dimensionality increase, therefore clustering of features is needed. We use Latent Semantic Analysis (LSA), Kernel Principal Component Analysis (KPCA) and Kernel Canonical Correlation Analysis (KCCA) and present results for these experiments on the Reuters corpus.
Mishra, Surjeet & Ghosh, Hiranmay Effective visualization and navigation in a multimedia document collection using ontology 3rd International Conference on Pattern Recognition and Machine Intelligence, PReMI 2009, December 16, 2009 - December 20, 2009 New Delhi, India 2009 [162] We present a novel user interface for visualizing and navigating in a multimedia document collection. Domain ontology has been used to depict the background knowledge organization and map the multimedia information nodes on that knowledge map, thereby making the implicit knowledge organization in a collection explicit. The ontology is automatically created by analyzing the links in Wikipedia, and is delimited to tightly cover the information nodes in the collection. We present an abstraction of the knowledge map for creating a clear and concise view, which can be progressively 'zoomed in' or 'zoomed out' to navigate the knowledge space. We organize the graph based on mutual similarity scores between the nodes for aiding the cognitive process during navigation. 2009 Springer-Verlag Berlin Heidelberg.
Missen, Malik Muhammad Saad; Boughanem, Mohand & Cabanac, Guillaume Using passage-based language model for opinion detection in blogs 25th Annual ACM Symposium on Applied Computing, SAC 2010, March 22, 2010 - March 26, 2010 Sierre, Switzerland 2010 [163] In this work, we evaluate the importance of passages in blogs, especially when we are dealing with the task of opinion detection. We argue that passages are the basic building blocks of blogs. Therefore, we use a Passage-Based Language Modeling approach as our approach for opinion finding in blogs. Our decision to use Language Modeling (LM) in this work is based on the performance LM has given in various opinion detection approaches. In addition, we propose a novel method for bi-dimensional query expansion with relevant and opinionated terms using Wikipedia and a relevance-feedback mechanism, respectively. We also compare the impact of two different query term weighting (and ranking) approaches on the final results. Besides this, we compare the performance of three passage-based document ranking functions (Linear, Avg, Max). For evaluation purposes, we use the data collection of TREC Blog06 with 50 topics of TREC 2006 over the best TREC-provided baseline, which has an opinion-finding MAP of 0.3022. Our approach gives a MAP improvement of almost 9.29% over the best TREC-provided baseline (baseline4).
Mølgaard, Lasse L.; Larsen, Jan & Goutte, Cyril Temporal analysis of text data using latent variable models Machine Learning for Signal Processing XIX - 2009 IEEE Signal Processing Society Workshop, MLSP 2009, September 2, 2009 - September 4, 2009 Grenoble, France 2009 [164] Detecting and tracking temporal data is an important task in multiple applications. In this paper we study temporal text mining methods for Music Information Retrieval. We compare two ways of detecting the temporal latent semantics of a corpus extracted from Wikipedia, using a stepwise Probabilistic Latent Semantic Analysis (PLSA) approach and a global multiway PLSA method. The analysis indicates that the global analysis method is able to identify relevant trends which are difficult to get using a step-by-step approach. Furthermore, we show that inspection of PLSA models with different numbers of factors may reveal the stability of temporal clusters, making it possible to choose the relevant number of factors.
Mohammadi, Mehdi & GhasemAghaee, Nasser Building bilingual parallel corpora based on wikipedia 2nd International Conference on Computer Engineering and Applications, ICCEA 2010, March 19, 2010 - March 21, 2010 Indonesia 2010 [165] Aligned parallel corpora are an important resource for a wide range of multilingual research, specifically corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a method of extracting sentence-level alignments by using an extended link-based bilingual lexicon method. Experimental results show that our method increases precision while reducing the total number of generated candidate pairs.
Morgan, Jonathan T.; Derthick, Katie; Ferro, Toni; Searle, Elly; Zachry, Mark & Kriplean, Travis Formalization and community investment in wikipedia's regulating texts: The role of essays 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United States 2009 [166] This poster presents ongoing research on how discursive and editing behaviors are regulated on Wikipedia by means of documented rules and practices. Our analysis focuses on three types of collaboratively created policy documents (policies, guidelines and essays) that have been formalized to different degrees and represent different degrees of community investment. We employ a content analysis methodology to explore how these regulating texts differ according to a) the aspects of editor behavior, content standards and community principles that they address, and b) how they are used by Wikipedians engaged in 'talk' page discussions to inform, persuade and coordinate with one another.
Mozina, Martin; Giuliano, Claudio & Bratko, Ivan Argument based machine learning from examples and text 2009 1st Asian Conference on Intelligent Information and Database Systems, ACIIDS 2009, April 1, 2009 - April 3, 2009 Dong Hoi, Viet Nam 2009 [167] We introduce a novel approach to cross-media learning based on argument based machine learning (ABML). ABML is a recent method that combines argumentation and machine learning from examples, and its main idea is to use arguments for some of the learning examples. Arguments are usually provided by a domain expert. In this paper, we present an alternative approach, where arguments used in ABML are automatically extracted from text with a technique for relation extraction. We demonstrate and evaluate the approach through a case study of learning to classify animals by using arguments automatically extracted from Wikipedia.
Mulhem, Philippe & Chevallet, Jean-Pierre Use of language model, phrases and wikipedia forward links for INEX 2009 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [168] We present in this paper the work of the Information Retrieval Modeling Group (MRIM) of the Computer Science Laboratory of Grenoble (LIG) at the INEX 2009 Ad Hoc Track. Our aim this year was twofold: first, to study the impact of extracted noun phrases taken in addition to words as terms, and second, to use forward links present in Wikipedia to expand queries. For the retrieval, we use a language model with Dirichlet smoothing on documents and/or doxels, and using a Fetch and Browse approach we select and rank the results. Our best runs according to the doxel evaluation get the first rank on the Thorough task, and according to the document evaluation we get the first rank for the Focused, Relevance in Context and Best in Context tasks. 2010 Springer-Verlag Berlin Heidelberg.
Muller, Christof & Gurevych, Iryna Using Wikipedia and Wiktionary in domain-specific information retrieval 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [169] The main objective of our experiments in the domain-specific track at CLEF 2008 is utilizing semantic knowledge from collaborative knowledge bases such as Wikipedia and Wiktionary to improve the effectiveness of information retrieval. While Wikipedia has already been used in IR, the application of Wiktionary in this task is new. We evaluate two retrieval models, i.e. SR-Text and SR-Word, based on semantic relatedness by comparing their performance to a statistical model as implemented by Lucene. We refer to Wikipedia article titles and Wiktionary word entries as concepts and map query and document terms to concept vectors which are then used to compute the document relevance. In the bilingual task, we translate the English topics into the document language, i.e. German, by using machine translation. For SR-Text, we alternatively perform the translation process by using cross-language links in Wikipedia, whereby the terms are directly mapped to concept vectors in the target language. The evaluation shows that the latter approach especially improves the retrieval performance in cases where the machine translation system incorrectly translates query terms. 2009 Springer Berlin Heidelberg.
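The concept-vector representation mentioned here can be sketched in a few lines, assuming a precomputed term-to-concept index (for instance over Wikipedia article titles or Wiktionary entries); the data structures and the cosine scoring below are illustrative and not the SR-Text/SR-Word implementation.

```python
# Toy sketch of concept-vector scoring: terms are mapped to weighted vectors over
# concepts (e.g. Wikipedia article titles), query and document are aggregated in
# concept space, and relevance is their cosine. The index format is an assumption.
import math
from collections import defaultdict

def to_concept_vector(terms, term_concept_index):
    """term_concept_index: term -> {concept: weight}."""
    vector = defaultdict(float)
    for term in terms:
        for concept, weight in term_concept_index.get(term, {}).items():
            vector[concept] += weight
    return vector

def cosine(u, v):
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def relevance(query_terms, doc_terms, index):
    return cosine(to_concept_vector(query_terms, index),
                  to_concept_vector(doc_terms, index))
```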
Muller, Claudia; Meuthrath, Benedikt & Jeschke, Sabina Defining a universal actor content-element model for exploring social and information networks considering the temporal dynamic 2009 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2009, July 20, 2009 - July 22, 2009 Athens, Greece 2009 [170] The emergence of the Social Web offers new opportunities for scientists to explore open virtual communities. Various approaches have appeared in terms of statistical evaluation, descriptive studies and network analyses, which pursue an enhanced understanding of existing mechanisms developing from the interplay of technical and social infrastructures. Unfortunately, at the moment, all these approaches are separate and no integrated approach exists. This gap is filled by our proposal of a concept which is composed of a universal description model, temporal network definitions, and a measurement system. The approach addresses the necessary interpretation of Social Web communities as dynamic systems. In addition to the explicated models, a software tool is briefly introduced employing the specified models. Furthermore, a scenario is used where an extract from the Wikipedia database shows the practical application of the software.
Murugeshan, Meenakshi Sundaram; Lakshmi, K. & Mukherjee, Saswati Exploiting negative categories and wikipedia structures for document classification ARTCom 2009 - International Conference on Advances in Recent Technologies in Communication and Computing, October 27, 2009 - October 28, 2009 Kottayam, Kerala, India 2009 [171] This paper explores the effect of a profile-based method for classification of Wikipedia XML documents. Our approach builds two profiles, exploiting the whole content, Initial Descriptions and links in the Wikipedia documents. For building profiles we use negative category information, which has been shown to perform well for classifying unstructured texts. The performance of Cosine and Fractional Similarity metrics is also compared. The use of two classifiers and their weighted average improves the classification performance.
Nadamoto, Akiyo; Aramaki, Eiji; Abekawa, Takeshi & Murakami, Yohei Content hole search in community-type content using Wikipedia 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia 2009 [172] SNSs and blogs, both of which are maintained by a community of people, have become popular in Web 2.0. We call such content "community-type content." This community is associated with the content, and those who use or contribute to community-type content are considered members of the community. Occasionally the members of a community do not understand the theme of the content from multiple viewpoints, hence the amount of information is often insufficient. It is therefore convenient to present users with the information they have missed. When Web 2.0 became popular, the content on the Internet and the types of users changed. We believe that there is a need for next-generation search engines in Web 2.0. We require a search engine that can search for information users are unaware of; we call such information "content holes." In this paper we propose a method for searching for content holes in community-type content. We attempt to extract and represent content holes from discussions on SNSs and blogs. Conventional Web search techniques are generally based on similarities. On the other hand, our content-hole search is a different kind of search. In this paper we classify and represent a number of images for different searching methods, define content holes, and, as the first step toward realizing our aim, propose a content-hole search system using Wikipedia.
Nakabayashi, Takeru; Yumoto, Takayuki; Nii, Manabu; Takahashi, Yutaka & Sumiya, Kazutoshi Measuring peculiarity of text using relation between words on the web 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [173] We define the peculiarity of text as a metric of information credibility. Higher peculiarity means lower credibility. We extract the theme word and the characteristic words from a text and check whether there is a subject-description relation between them. The peculiarity is defined using the ratio of subject-description relations between a theme word and the characteristic words. We evaluate the extent to which peculiarity can be used to judge credibility by classifying text from Wikipedia and Uncyclopedia in terms of peculiarity.
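One plausible reading of the ratio-based definition is sketched below; the subject-description relation test (checked against the Web in the paper) is assumed to be supplied as a predicate, and taking one minus the ratio is an illustrative way of making higher values mean lower credibility.

```python
# Sketch: peculiarity as 1 minus the fraction of characteristic words that stand
# in a subject-description relation to the theme word. The relation test itself
# (has_relation) is an assumed external predicate, e.g. backed by Web statistics.
def peculiarity(theme_word, characteristic_words, has_relation):
    if not characteristic_words:
        return 0.0
    related = sum(1 for w in characteristic_words if has_relation(theme_word, w))
    return 1.0 - related / len(characteristic_words)
```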
Nakasaki, Hiroyuki; Kawaba, Mariko; Utsuro, Takehito & Fukuhara, Tomohiro Mining cross-lingual/cross-cultural differences in concerns and opinions in blogs 22nd International Conference on Computer Processing of Oriental Languages, ICCPOL 2009, March 26, 2009 - March 27, 2009 Hong Kong 2009 [174] The goal of this paper is to cross-lingually analyze multilingual blogs collected with a topic keyword. The framework of collecting multilingual blogs with a topic keyword is designed as the blog feed retrieval procedure. Multilingual queries for retrieving blog feeds are created from Wikipedia entries. Finally, we cross-lingually and cross-culturally compare less well known facts and opinions that are closely related to a given topic. Preliminary evaluation results support the effectiveness of the proposed framework. 2009 Springer Berlin Heidelberg.
Nakayama, Kotaro; Ito, Masahiro; Hara, Takahiro & Nishio, Shojiro Wikipedia relatedness measurement methods and influential features 2009 International Conference on Advanced Information Networking and Applications Workshops, WAINA 2009, May 26, 2009 - May 29, 2009 Bradford, United Kingdom 2009 [175] As a corpus for knowledge extraction, Wikipedia has become one of the promising resources among researchers in various domains such as NLP, WWW, IR and AI, since it has great coverage of concepts across a wide range of domains, remarkable accuracy, and a structure that is easy to handle for analysis. Relatedness measurement among concepts is one of the traditional research topics in Wikipedia analysis. The value of relatedness measurement research is widely recognized because of the wide range of applications, such as query expansion in IR and context recognition in WSD (Word Sense Disambiguation). A number of approaches have been proposed, and they proved that there are many features that can be used to measure relatedness among concepts in Wikipedia. In previous research, many features such as categories, co-occurrence of terms (links), inter-page links and infoboxes have been used for this aim. What seems lacking, however, is an integrated feature selection model for these dispersed features, since it is still unclear which features are influential and how we can integrate them in order to achieve higher accuracy. This paper is a position paper that proposes an SVR (Support Vector Regression) based integrated feature selection model to investigate the influence of each feature and seek a combined model of features that achieves high accuracy and coverage.
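The proposed SVR-based combination of features can be sketched as below, assuming feature vectors per concept pair (for example category overlap, link co-occurrence and infobox similarity) and gold relatedness judgments are available; the scikit-learn pipeline and hyperparameters are illustrative, not the paper's setup.

```python
# Sketch: learn to combine several Wikipedia-derived relatedness features with
# Support Vector Regression. Feature extraction is assumed to happen elsewhere.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def train_relatedness_model(feature_rows, gold_scores):
    """feature_rows: list of per-pair feature lists (e.g. [category_overlap,
    link_cooccurrence, infobox_similarity]); gold_scores: human judgments."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
    model.fit(feature_rows, gold_scores)
    return model

# predicted = train_relatedness_model(train_features, train_scores).predict(test_features)
```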
Nakayama, Kotaro; Ito, Masahiro; Hara, Takahiro & Nishio, Shojiro Wikipedia mining for huge scale Japanese association thesaurus construction 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINA 2008, March 25, 2008 - March 28, 2008 Gino-wan, Okinawa, Japan 2008 [176] Wikipedia, a huge scale Web-based dictionary, is an impressive corpus for knowledge extraction. We already proved that Wikipedia can be used for constructing an English association thesaurus and our link structure mining method is significantly effective for this aim. However, we want to find out how we can apply this method to other languages and what the requirements, differences and characteristics are. Nowadays, Wikipedia supports more than 250 languages such as English, German, French, Polish and Japanese. Among Asian languages, the Japanese Wikipedia is the largest corpus in Wikipedia. In this research, therefore, we analyzed all Japanese articles in Wikipedia and constructed a huge scale Japanese association thesaurus. After constructing the thesaurus, we realized that it shows several impressive characteristics depending on language and culture.
Nazir, Fawad & Takeda, Hideaki Extraction and analysis of tripartite relationships from Wikipedia 2008 IEEE International Symposium on Technology and Society: ISTAS '08 - Citizens, Groups, Communities and Information and Communication Technologies, June 26, 2008 - June 28, 2008 Fredericton, NB, Canada 2008 [177] Social aspects are critical in the decision making process for social actors (human beings). Social aspects can be categorized into social interaction, social communities, social groups or any kind of behavior that emerges from interlinking, overlapping or similarities between interests of a society. These social aspects are dynamic and emergent. Therefore, interlinking them in a social structure, based on bipartite affiliation network, may result in isolated graphs. The major reason is that as these correspondences are dynamic and emergent, they should be coupled with more than a single affiliation in order to sustain the interconnections during interest evolutions. In this paper we propose to interlink actors using multiple tripartite graphs rather than a bipartite graph which was the focus of most of the previous social network building techniques. The utmost benefit of using tripartite graphs is that we can have multiple and hierarchical links between social actors. Therefore in this paper we discuss the extraction, plotting and analysis methods of tripartite relations between authors, articles and categories from Wikipedia. Furthermore, we also discuss the advantages of tripartite relationships over bipartite relationships. As a conclusion of this study we argue based on our results that to build useful, robust and dynamic social networks, actors should be interlinked in one or more tripartite networks.
Neiat, Azadeh Ghari; Mohsenzadeh, Mehran; Forsati, Rana & Rahmani, Amir Masoud An agent-based semantic web service discovery framework 2009 International Conference on Computer Modeling and Simulation, ICCMS 2009, February 20, 2009 - February 22, 2009 Macau, China 2009 [178] Web services have changed the Web from a database of static documents to a service provider. To improve the automation of Web services interoperation, a lot of technologies are recommended, such as semantic Web services and agents. In this paper we propose a framework for semantic Web service discovery based on semantic Web services and FIPA multi-agents. This paper provides a broker which provides semantic interoperability between semantic Web service providers and agents by translating WSDL to DF descriptions for semantic Web services and DF descriptions to WSDL for FIPA multi-agents. We describe how the proposed architecture analyzes the request and matches the search query. The ontology management in the broker creates the user ontology and merges it with a general ontology (e.g. WordNet, Yago, Wikipedia). We also describe the recommendation component that recommends the WSDL to Web service providers to increase their retrieval probability in related queries.
Neiat, Azadeh Ghari; Shavalady, Sajjad Haj; Mohsenzadeh, Mehran & Rahmani, Amir Masoud A new approach for semantic web service discovery and propagation based on agents 5th International Conference on Networking and Services, ICNS 2009, April 20, 2009 - April 25, 2009 Valencia, Spain 2009 [179] Web-based systems integration has become a pressing challenge. To improve the automation of Web services interoperation, a lot of technologies are recommended, such as semantic Web services and agents. In this paper an approach for semantic Web service discovery and propagation based on semantic Web services and FIPA multi-agents is proposed. A broker is proposed that exposes semantic interoperability between semantic Web service providers and agents by translating WSDL to DF descriptions for semantic Web services and vice versa. We describe how the proposed architecture analyzes the request and, after it has been analyzed, matches or publishes it. The ontology management in the broker creates the user ontology and merges it with a general ontology (e.g. WordNet, Yago, Wikipedia). We also describe the recommender which analyzes the created WSDL based on the functional and non-functional requirements and then recommends it to the Web service provider to increase its retrieval probability in related queries.
Netzer, Yael; Gabay, David; Adler, Meni; Goldberg, Yoav & Elhadad, Michael Ontology evaluation through text classification APWeb/WAIM 2009 International Workshops: WCMT 2009, RTBI 2009, DBIR-ENQOIR 2009, PAIS 2009, April 2, 2009 - April 4, 2009 Suzhou, China 2009 [180] We present a new method to evaluate a search ontology, which relies on mapping ontology instances to textual documents. On the basis of this mapping, we evaluate the adequacy of ontology relations by measuring their classification potential over the textual documents. This data-driven method provides concrete feedback to ontology maintainers and a quantitative estimation of the functional adequacy of the ontology relations towards search experience improvement. We specifically evaluate whether an ontology relation can help a semantic search engine support exploratory search. We test this ontology evaluation method on an ontology in the Movies domain, which has been acquired semi-automatically from the integration of multiple semi-structured and textual data sources (e.g., IMDb and Wikipedia). We automatically construct a domain corpus from a set of movie instances by crawling the Web for movie reviews (both professional and user reviews). The 1-1 relation between textual documents (reviews) and movie instances in the ontology enables us to translate ontology relations into text classes. We verify that the text classifiers induced by key ontology relations (genre, keywords, actors) achieve high performance and exploit the properties of the learned text classifiers to provide concrete feedback on the ontology. The proposed ontology evaluation method is general and relies on the possibility to automatically align textual documents to ontology instances. 2009 Springer Berlin Heidelberg.
Newman, David; Noh, Youn; Talley, Edmund; Karimi, Sarvnaz & Baldwin, Timothy Evaluating topic models for digital libraries 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [181] Topic models could have a huge impact on improving the ways users find and discover content in digital libraries and search interfaces, through their ability to automatically learn and apply subject tags to each and every item in a collection, and their ability to dynamically create virtual collections on the fly. However, much remains to be done to tap this potential, and empirically evaluate the true value of a given topic model to humans. In this work, we sketch out some sub-tasks that we suggest pave the way towards this goal, and present methods for assessing the coherence and interpretability of topics learned by topic models. Our large-scale user study includes over 70 human subjects evaluating and scoring almost 500 topics learned from collections from a wide range of genres and domains. We show how a scoring model - based on pointwise mutual information of word pairs using Wikipedia, Google and MEDLINE as external data sources - performs well at predicting human scores. This automated scoring of topics is an important first step to integrating topic modeling into digital libraries.
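The PMI-based scoring idea can be sketched as an average pointwise mutual information over the top words of a topic, with co-occurrence statistics drawn from an external corpus such as Wikipedia; the count providers and the smoothing constant below are assumptions, not the paper's exact scoring model.

```python
# Sketch of PMI-style topic scoring: average the pointwise mutual information of
# all pairs among a topic's top words, using document frequencies from an
# external corpus (e.g. Wikipedia). Count providers are assumed callables.
import math
from itertools import combinations

def topic_pmi(top_words, pair_count, word_count, total_docs, eps=1e-12):
    """pair_count(w1, w2) and word_count(w) return document frequencies in the
    external corpus; total_docs is the number of documents in that corpus."""
    scores = []
    for w1, w2 in combinations(top_words, 2):
        p_joint = pair_count(w1, w2) / total_docs
        p1 = word_count(w1) / total_docs
        p2 = word_count(w2) / total_docs
        scores.append(math.log((p_joint + eps) / (p1 * p2 + eps)))
    return sum(scores) / len(scores) if scores else 0.0
```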
Nguyen, Chau Q. & Phan, Tuoi T. Key phrase extraction: A hybrid assignment and extraction approach 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia 2009 [182] Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques, and a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extraction from Vietnamese text that combines assignment and extraction approaches. We also explore NLP techniques that we propose for the analysis of Vietnamese texts, focusing on the advanced candidate phrase recognition phase as well as part-of-speech (POS) tagging. Then we propose a method that exploits specific characteristics of the Vietnamese language and exploits the Vietnamese Wikipedia as an ontology for key phrase ambiguity resolution. Finally, we show the results of several experiments that examine the impact of the strategies chosen for Vietnamese key phrase extraction.
Nguyen, Dong; Overwijk, Arnold; Hauff, Claudia; Trieschnigg, Dolf R. B.; Hiemstra, Djoerd & Jong, Franciska De WikiTranslate: Query translation for cross-lingual information retrieval using only wikipedia 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [183] This paper presents WikiTranslate, a system which performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia to obtain translations. Queries are mapped to Wikipedia concepts and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching with topics formulated in Dutch, French and Spanish in an English data collection. The system achieved a performance of 67% compared to the monolingual baseline. 2009 Springer Berlin Heidelberg.
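The concept-mapping translation step can be sketched as below, assuming two lookup tables built from a Wikipedia dump (source-language term to article titles, and cross-language links to target-language titles); the table formats and the fallback behaviour are illustrative assumptions rather than the published WikiTranslate system.

```python
# Sketch: translate a query by mapping its terms to Wikipedia concepts and
# following cross-language links to target-language article titles.
# title_index and crosslang_links are assumed to be built from a Wikipedia dump.
def translate_query(query_terms, title_index, crosslang_links, target_lang="en"):
    """title_index: source term -> list of source-language article titles;
    crosslang_links: (source title, target lang) -> target-language title."""
    translated = []
    for term in query_terms:
        for concept in title_index.get(term, []):
            target_title = crosslang_links.get((concept, target_lang))
            if target_title:
                translated.append(target_title)
    # Fall back to the original terms when no concept mapping is found.
    return " ".join(translated) if translated else " ".join(query_terms)
```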
Nguyen, Hien T. & Cao, Tru H. Exploring wikipedia and text features for named entity disambiguation 2010 Asian Conference on Intelligent Information and Database Systems, ACIIDS 2010, March 24, 2010 - March 26, 2010 Hue City, Viet Nam 2010 [184] Precisely identifying entities is essential for semantic annotation. This paper addresses the problem of named entity disambiguation that aims at mapping entity mentions in a text onto the right entities in Wikipedia. The aim of this paper is to explore and evaluate various combinations of features extracted from Wikipedia and texts for the disambiguation task, based on a statistical ranking model of candidate entities. Through experiments, we show which combinations of features are the best choices for disambiguation. 2010 Springer-Verlag Berlin Heidelberg.
Nguyen, Hien T. & Cao, Tru H. Named entity disambiguation on an ontology enriched by Wikipedia RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies, July 13, 2008 - July 17, 2008 Ho Chi Minh City, Viet Nam 2008 [185] Currently, for named entity disambiguation, the shortage of training data is a problem. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model to not only disambiguate but also identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of the features on disambiguation accuracy by varying their combinations for representing named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems.
Nguyen, Thanh C.; Le, Hai M. & Phan, Tuoi T. Building knowledge base for Vietnamese information retrieval 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia 2009 [186] At present, the Vietnamese knowledge base (VnKB) is one of the most important focuses of Vietnamese researchers because of its applications in wide areas such as Information Retrieval (IR), Machine Translation (MT), etc. There have been several separate projects developing the VnKB in various domains. Training the VnKB is the most difficult part because of the quantity and quality of training data and the lack of an available Vietnamese corpus with acceptable quality. This paper introduces an approach which first extracts semantic information from the Vietnamese Wikipedia (vnWK) and then trains the proposed VnKB by applying the support vector machine (SVM) technique. The experimentation of the proposed approach shows that it is a potential solution because of its good results and proves that it can provide more valuable benefits when applied to our Vietnamese Semantic Information Retrieval system.
Ochoa, Xavier & Duval, Erik Measuring learning object reuse 3rd European Conference on Technology Enhanced Learning, EC-TEL 2008, September 16, 2008 - September 19, 2008 Maastricht, Netherlands 2008 [187] This paper presents a quantitative analysis of the reuse of learning objects in real world settings. The data for this analysis was obtained from three sources: Connexions' modules, University courses and Presentation components. They represent the reuse of learning objects at different granularity levels. Data from other types of reusable components, such as software libraries, Wikipedia images and Web APIs, were used for comparison purposes. Finally, the paper discusses the implications of the findings in the field of Learning Object research. 2008 Springer-Verlag Berlin Heidelberg.
Oh, Jong-Hoon; Kawahara, Daisuke; Uchimoto, Kiyotaka; Kazama, Jun'ichi & Torisawa, Kentaro Enriching multilingual language resources by discovering missing cross-language links in Wikipedia 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia 2008 [188] We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We collect candidates for missing cross-language links - pairs of English and Japanese Wikipedia articles which could be connected by cross-language links. Then we select the correct cross-language links among the candidates by using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, our method can discover cross-language links with high accuracy (92% precision with 78% recall). Second, the features used in the classifier are language-independent. Third, without relying on any external knowledge, we generate the features based on resources automatically obtained from Wikipedia. In this work, we discover approximately 10^5 missing cross-language links in Wikipedia, which are almost two-thirds as many as the existing cross-language links in Wikipedia.
Ohmori, Kenji & Kunii, Tosiyasu L. The mathematical structure of cyberworlds 2007 International Conference on Cyberworlds, CW'07, October 24, 2007 - October 27, 2007 Hannover, Germany 2007 [189] The mathematical structure of cyberworlds is clarified based on the duality of the homology lifting property and the homotopy extension property. The duality gives bottom-up and top-down methods to model, design and analyze the structure of cyberworlds. The set of homepages representing a cyberworld is transformed into a finite state machine. In the development of the cyberworld, a sequence of finite state machines is obtained. This sequence has a homotopic property. This property is clarified by mapping a finite state machine to a simplicial complex. Wikipedia, bottom-up network construction and top-down network analysis are described as examples.
Okoli, Chitu A brief review of studies of Wikipedia in peer-reviewed journals 3rd International Conference on Digital Society, ICDS 2009, February 1, 2009 - February 7, 2009 Cancun, Mexico 2009 [190] Since its establishment in 2001, Wikipedia, "the free encyclopedia that anyone can edit," has become a cultural icon of the unlimited possibilities of the World Wide Web. Thus, it has become a serious subject of scholarly study to objectively and rigorously understand it as a phenomenon. This paper reviews studies of Wikipedia that have been published in peer-reviewed journals. Among the wealth of studies reviewed, the major sub-streams of research covered include: how and why Wikipedia works; assessments of the reliability of its content; using it as a data source for various studies; and applications of Wikipedia in different domains of endeavour.
Okoli, Chitu Information product creation through open source encyclopedias ICC2009 - International Conference of Computing in Engineering, Science and Information, April 2, 2009 - April 4, 2009 Fullerton, CA, United States 2009 [191] The same open source philosophy that has been traditionally applied to software development can be applied to the collaborative creation of non-software information products, such as encyclopedias, books, and dictionaries. Most notably, the eight-year-old Wikipedia is a comprehensive general encyclopedia, comprising over 12 million articles in over 200 languages. It becomes increasingly important to rigorously investigate the workings of the open source process to understand its benefits and motivations. This paper presents a research program funded by the Social Sciences and Humanities Research Council of Canada with the following objectives: (1) Survey open source encyclopedia participants to understand their motivations for participating and their demographic characteristics, and compare them with participants in traditional open source software projects; (2) investigate the process of open source encyclopedia development in a live community to understand how their motivations interact in the open source framework to create quality information products.
Okoli, Chitu & Schabram, Kira Protocol for a systematic literature review of research on the Wikipedia 1st ACM International Conference on Management of Emergent Digital EcoSystems, MEDES '09, October 27, 2009 - October 30, 2009 Lyon, France 2009 [192] Context: Wikipedia has become one of the ten most visited sites on the Web, and the world's leading source of Web reference information. Its rapid success has attracted over 1,000 scholarly studies that treat Wikipedia as a major topic or data source. Objectives: This article presents a protocol for conducting a systematic mapping (a broad-based literature review) of research on Wikipedia. It identifies what research has been conducted; what research questions have been asked, which have been answered; and what theories and methodologies have been employed to study Wikipedia. Methods: This protocol follows the rigorous methodology of evidence-based software engineering to conduct a systematic mapping study. Results and conclusions: This protocol reports a study in progress.
Okuoka, Tomoki; Takahashi, Tomokazu; Deguchi, Daisuke; Ide, Ichiro & Murase, Hiroshi Labeling news topic threads with wikipedia entries 11th IEEE International Symposium on Multimedia, ISM 2009, December 14, 2009 - December 16, 2009 San Diego, CA, United States 2009 [193] Wikipedia is a famous online encyclopedia. However, most Wikipedia entries are mainly explained by text, so it would be very informative to enhance the contents with multimedia information such as videos. Thus we are working on a method to extend the information of Wikipedia entries by means of broadcast videos which explain the entries. In this work, we focus especially on news videos and Wikipedia entries about news events. In order to extend the information of Wikipedia entries, it is necessary to link news videos and Wikipedia entries, so the main issue is a method that labels news videos with Wikipedia entries automatically. In this way, more detailed explanations with news videos can be exhibited, and the context of the news events becomes easier to understand. Through experiments, news videos were accurately labeled with Wikipedia entries with a precision of 86% and a recall of 79%.
Olleros, F. Xavier Learning to trust the crowd: Some lessons from Wikipedia 2008 International MCETECH Conference on e-Technologies, MCETECH 2008, January 23, 2008 - January 25, 2008 Montreal, QC, Canada 2008 [194] Inspired by the open source software (OSS) movement, Wikipedia has gone further than any OSS project in decentralizing its quality control task. This is seen by many as a fatal flaw. In this short paper, I will try to show that it is rather a shrewd and fertile design choice. First, I will describe the precise way in which Wikipedia is more decentralized than OSS projects. Secondly, I will explain why Wikipedia's quality control can be and must be decentralized. Thirdly, I will show why it is wise for Wikipedia to welcome anonymous amateurs. Finally, I will argue that concerns about Wikipedia's quality and sustainable success have to be tempered by the fact that, as disruptive innovations tend to do, Wikipedia is in the process of redefining the pertinent dimensions of quality and value for general encyclopedias.
Ortega, Felipe; Gonzalez-Barahona, Jesus M. & Robles, Gregorio The top-ten wikipedias : A quantitative analysis using wikixray 2nd International Conference on Software and Data Technologies, ICSOFT 2007, July 22, 2007 - July 25, 2007 Barcelona, Spain 2007 In a few years, Wikipedia has become one of the information systems with the largest public (both producers and consumers) on the Internet. Its system and information architecture is relatively simple, but has proven to be capable of supporting the largest and most diverse community of collaborative authorship worldwide. In this paper, we analyze this community, and the contents it is producing, in detail. Using a quantitative methodology based on the analysis of the public Wikipedia databases, we describe the main characteristics of the 10 largest language editions, and the authors that work in them. The methodology (which is almost completely automated) is generic enough to be used on the rest of the editions, providing a convenient framework to develop a complete quantitative analysis of the Wikipedia. Among other parameters, we study the evolution of the number of contributions and articles, their size, and the differences in contributions by different authors, inferring some relationships between contribution patterns and content. These relationships reflect (and in part, explain) the evolution of the different language editions so far, as well as their future trends.
Otjacques, Benoit; Cornil, Mael & Feltz, Fernand Visualizing cooperative activities with ellimaps: The case of wikipedia 6th International Conference on Cooperative Design, Visualization, and Engineering, CDVE 2009, September 20, 2009 - September 23, 2009 Luxembourg, Luxembourg 2009 [195] Cooperation has become a key word in the emerging Web 2.0 paradigm. The nature and motivations of the various behaviours related to this type of cooperative activity remain, however, incompletely understood. Information visualization tools can play a crucial role from this perspective in analysing the collected data. This paper presents a prototype allowing the visualization of some data about the Wikipedia history with a technique called ellimaps. In this context the recent CGD algorithm is used in order to increase the scalability of the ellimaps approach. 2009 Springer Berlin Heidelberg.
Overell, Simon; Sigurbjornsson, Borkur & Zwol, Roelof Van Classifying tags using open content resources 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain 2009 [196] Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object. Tags describe the content of the media as well as locations, dates, people and other associated meta-data. Being able to automatically classify tags into semantic categories allows us to better understand the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third-party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline, our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as London Eye, Big Island, Ronaldinho, geo-caching and wii.
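A bare-bones version of the pattern idea is sketched below: a tag is classified by matching the category and template names of its Wikipedia article against a small table of class-indicative keywords. The pattern table, class labels and the two lookup callables are illustrative assumptions; the paper itself trains on WordNet categories rather than using a hand-written table.

```python
# Sketch: classify a tag by matching the categories and templates of its
# Wikipedia article against keyword patterns. The pattern table and the
# get_categories / get_templates callables are illustrative assumptions.
PATTERNS = {
    "person": ("births", "deaths", "infobox person"),
    "location": ("cities", "countries", "infobox settlement"),
    "organisation": ("companies", "clubs", "infobox company"),
}

def classify_tag(tag, get_categories, get_templates, default="other"):
    """get_categories / get_templates return the article's category and
    template names for the tag, as strings."""
    evidence = [c.lower() for c in get_categories(tag)]
    evidence += [t.lower() for t in get_templates(tag)]
    for label, keywords in PATTERNS.items():
        if any(keyword in item for item in evidence for keyword in keywords):
            return label
    return default
```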
Ozyurt, I. Burak A large margin approach to anaphora resolution for neuroscience knowledge discovery 22nd International Florida Artificial Intelligence Research Society Conference, FLAIRS-22, March 19, 2009 - March 21, 2009 Sanibel Island, FL, United States 2009 A discriminative large margin classifier based approach to anaphora resolution for neuroscience abstracts is presented. The system employs both syntactic and semantic features. A support vector machine based word sense disambiguation method combining evidence from three methods that use WordNet and Wikipedia is also introduced and used for semantic features. The support vector machine anaphora resolution classifier with probabilistic outputs achieved almost a four-fold improvement in accuracy over the baseline method. Copyright 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Pablo-Sanchez, Cesar De; Martinez-Fernandez, Jose L.; Gonzalez-Ledesma, Ana; Samy, Doaa; Martinez, Paloma; Moreno-Sandoval, Antonio & Al-Jumaily, Harith Combining wikipedia and newswire texts for question answering in spanish 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [197] This paper describes the adaptations of the MIRACLE group QA system in order to participate in the Spanish monolingual question answering task at QA@CLEF 2007. A system initially developed for the EFE collection was reused for Wikipedia. Answers from both collections were combined using temporal information extracted from questions and collections. Reusing the EFE subsystem has proven not to be feasible, and questions with answers only in Wikipedia have obtained low accuracy. Besides, a co-reference module based on heuristics was introduced for processing topic-related questions. This module achieves good coverage in different situations but it is hindered by the moderate accuracy of the base system and the chaining of incorrect answers. 2008 Springer-Verlag Berlin Heidelberg.
Panchal, Jitesh H. & Fathianathan, Mervyn Product realization in the age of mass collaboration 2008 ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC 2008, August 3, 2008 - August 6, 2008 New York City, NY, United states 2009 There has been a recent emergence of communities working together in large numbers to develop new products, services, and systems. Collaboration at such scales, referred to as mass collaboration, has resulted in various robust products including Linux and Wikipedia. Companies are also beginning to utilize the power of mass collaboration to foster innovation at various levels. Business models based on mass collaboration are also emerging. Such an environment of mass collaboration brings about significant opportunities and challenges for designing next generation products. The objectives in this paper are to discuss these recent developments in the context of engineering design and to identify new research challenges. The recent trends in mass collaboration are discussed and the impacts of these trends on product realization processes are presented. Traditional collaborative product realization is distinguished from mass collaborative product realization. Finally, the open research issues for successful implementation of mass collaborative product realization are discussed.
Panciera, Katherine; Priedhorsky, Reid; Erickson, Thomas & Terveen, Loren Lurking? Cyclopaths? A quantitative lifecycle analysis of user behavior in a geowiki 28th Annual CHI Conference on Human Factors in Computing Systems, CHI 2010, April 10, 2010 - April 15, 2010 Atlanta, GA, United States 2010 [198] Online communities produce rich behavioral datasets, e.g., Usenet news conversations, Wikipedia edits, and Facebook friend networks. Analysis of such datasets yields important insights (like the "long tail" of user participation) and suggests novel design interventions (like targeting users with personalized opportunities and work requests). However, certain key user data typically are unavailable, specifically viewing, pre-registration and non-logged-in activity. The absence of such data makes some questions hard to answer; access to it can strengthen, extend or cast doubt on previous results. We report on an analysis of user behavior in Cyclopath, a geographic wiki and route-finder for bicyclists. With access to viewing and non-logged-in activity data, we were able to: (a) replicate and extend prior work on user lifecycles in Wikipedia, (b) bring to light some pre-registration activity, thus testing for the presence of "educational lurking," and (c) demonstrate the locality of geographic activity and how editing and viewing are geographically correlated.
Pang, Wenbo & Fan, Xiaozhong Inducing gazetteer for Chinese named entity recognition based on local high-frequent strings 2009 2nd International Conference on Future Information Technology and Management Engineering, FITME 2009, December 13, 2009 - December 14, 2009 Sanya, China 2009 [199] Gazetteers, or entity dictionaries, are important for named entity recognition (NER). Although the dictionaries extracted automatically by previous methods from a corpus, the Web or Wikipedia are very large, they still miss some entities, especially domain-specific entities. We present a novel method of automatic entity dictionary induction, which is able to construct a dictionary more specific to the text being processed at a much lower computational cost than previous methods. It extracts the local high-frequent strings in a document as candidate entities, and filters out invalid candidates using accessor variety (AV) as the entity criterion. The experiments show that the obtained dictionary can effectively improve the performance of a high-precision NER baseline.
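The candidate-extraction-plus-filtering step can be sketched as follows for a single document; the length, frequency and accessor-variety thresholds are illustrative assumptions, not the paper's tuned values.

```python
# Sketch: induce candidate entities from one document by taking frequent
# substrings and keeping those with sufficient accessor variety, i.e. enough
# distinct characters adjacent on both sides. Thresholds are illustrative.
from collections import Counter, defaultdict

def candidate_entities(text, min_len=2, max_len=6, min_freq=3, min_av=2):
    freq = Counter()
    left, right = defaultdict(set), defaultdict(set)
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            s = text[i:i + n]
            freq[s] += 1
            if i > 0:
                left[s].add(text[i - 1])
            if i + n < len(text):
                right[s].add(text[i + n])
    return [s for s, f in freq.items()
            if f >= min_freq and min(len(left[s]), len(right[s])) >= min_av]
```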
Paolucci, Alessio Research summary: Intelligent Natural language processing techniques and tools 25th International Conference on Logic Programming, ICLP 2009, July 14, 2009 - July 17, 2009 Pasadena, CA, United States 2009 [200] My research path started with my master thesis (supervisor Prof. Stefania Costantini) about a neurobiologically inspired proposal in the field of natural language processing. In more detail, we proposed the "Semantic Enhanced DCGs" (for short, SE-DCGs), an extension to the well-known DCGs, to allow for parallel syntactic and semantic analysis and generate semantically-based descriptions of the sentence at hand. The analysis carried out through SE-DCGs was called "syntactic-semantic fully informed analysis" and it was designed to be as close as possible (at least in principle) to the results in the context of neuroscience that I had reviewed and studied. As a proof of concept, I implemented the prototype of a semantic search engine, the Mnemosine system. Mnemosine is able to interact with a user in natural language and to provide contextual answers at different levels of detail. Mnemosine has been applied to a practical case study, i.e. to Wikipedia Web pages. A brief overview of this work was presented during CICL 08 [1]. 2009 Springer Berlin Heidelberg.
Pedersen, Claus Vesterager Who are the oracles - Is Web 2.0 the fulfilment of our dreams?: Host lecture at the EUSIDIC Annual Conference 11-13 March 2007 at Roskilde University Information Services and Use 2007 Powerful web-services will enable integration with Amazon, Library Thing, Google etcetera and it will make it feasible to construct new applications in very few days rather than the usual months or even years. The fundamental objective for modern university libraries is to create interfaces with the global knowledge system, tailor-made to the individual profile and needs of each university, department, researcher, and student. University libraries must support and use collaborative working and learning spaces and must be able to filter information and make it context relevant and reliant. Wikipedia is a good example of collaborative work between non-professionals, non-specialists, nonscientific volunteers with a fine result. Filtering information and making it context relevant and reliant are of very high importance, not only to the students and their education processes but also in connection with science and the scientific processes at the university.
Pei, Minghua; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro Constructing a global ontology by concept mapping using Wikipedia thesaurus 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINA 2008, March 25, 2008 - March 28, 2008 Gino-wan, Okinawa, Japan 2008 [201] Recently, the importance of semantics on the WWW is widely recognized and a lot of semantic information (RDF, OWL, etc.) is being built/published on the WWW. However, the lack of ontology mappings becomes a serious problem for the Semantic Web, since it needs well-defined relations to retrieve information correctly by inferring the meaning of information. One-to-one mapping is not an efficient method due to the nature of the distributed environment. Therefore, it would be a reasonable method to map the concepts by using a large-scale intermediate ontology. On the other hand, Wikipedia is a large-scale concept network covering almost all concepts in the real world. In this paper, we propose an intermediate ontology construction method using Wikipedia Thesaurus, an association thesaurus extracted from Wikipedia. Since Wikipedia Thesaurus provides associated concepts without explicit relation types, we propose an approach to concept mapping using two sub-methods: "name mapping" and "logic-based mapping".
Pereira, Francisco; Alves, Ana; Oliveirinha, João & Biderman, Assaf Perspectives on semantics of the place from online resources ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United States 2009 [202] We present a methodology for the extraction of semantic indexes related to a given geo-referenced place. These lists of words correspond to the concepts that should be semantically related to that place, according to a number of perspectives. Each perspective is provided by a different online resource, namely upcoming.org, Flickr, Wikipedia or open web search (using the Yahoo! search engine). We describe the process by which those lists are obtained, present experimental results and discuss the strengths and weaknesses of the methodology and of each perspective.
Pilato, Giovanni; Augello, Agnese; Scriminaci, Mario; Vassallo, Giorgio & Gaglio, Salvatore Sub-symbolic mapping of Cyc microtheories in data-driven "conceptual spaces" 11th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2007, and 17th Italian Workshop on Neural Networks, WIRN 2007, September 12, 2007 - September 14, 2007 Vietri sul Mare, Italy 2007 The presented work aims to combine statistical and cognitive-oriented approaches with symbolic ones so that a conceptual similarity relationship layer can be added to a Cyc KB microtheory. Given a specific microtheory, an LSA-inspired conceptual space is inferred from a corpus of texts created using both ad hoc extracted pages from the Wikipedia repository and the built-in comments about the concepts of the specific Cyc microtheory. Each concept is projected into the conceptual space and the desired layer of sub-symbolic relationships between concepts is created. This procedure can help a user in finding the concepts that are sub-symbolically "conceptually related" to a new concept that he wants to insert in the microtheory. Experimental results involving two Cyc microtheories are also reported. Springer-Verlag Berlin Heidelberg 2007.
Pinkwart, Niels Applying Web 2.0 design principles in the design of cooperative applications 5th International Conference on Cooperative Design, Visualization, and Engineering, CDVE 2008, September 22, 2008 - September 25, 2008 Calvia, Mallorca, Spain 2008 [203] "Web 2.0" is a term frequently mentioned in the media - apparently, applications such as Wikipedia, Social Network Services, Online Shops with integrated recommender systems, or Sharing Services like flickr, all of which rely on users' activities, contributions and interactions as a central factor, are fascinating for the general public. This leads to a success of these systems that seemingly exceeds the impact of most "traditional" groupware applications that have emerged from CSCW research. This paper discusses differences and similarities between novel Web 2.0 tools and more traditional CSCW applications in terms of technologies, system design and success factors. Based on this analysis, the design of the cooperative learning application LARGO is presented to illustrate how Web 2.0 success factors can be considered for the design of cooperative environments. 2008 Springer-Verlag Berlin Heidelberg.
Pirrone, Roberto; Pipitone, Arianna & Russo, Giuseppe Semantic sense extraction from Wikipedia pages 3rd International Conference on Human System Interaction, HSI'2010, May 13, 2010 - May 15, 2010 Rzeszow, Poland 2010 [204] This paper discusses a modality to access and organize unstructured contents related to a particular topic coming from the access to Wikipedia pages. The proposed approach is focused on the acquisition of new knowledge from Wikipedia pages and is based on the definition of useful patterns able to extract and identify novel concepts and relations to be added to the knowledge base. We propose a method that uses information from the wiki page's structure. According to the different parts of the page, we define different strategies to obtain new concepts or relations between them. We analyze not only the structure but also the text directly to obtain relations and concepts and to extract the type of relations to be incorporated in a domain ontology. The purpose is to use the obtained information in an intelligent tutoring system to improve its capabilities in dialogue management with users.
Popescu, Adrian; Borgne, Herve Le & Moellic, Pierre-Alain Conceptual image retrieval over a large scale database 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [205] Image retrieval in large-scale databases is currently based on a textual chains matching procedure. However, this approach requires an accurate annotation of images, which is not the case on the Web. To tackle this issue, we propose a reformulation method that reduces the influence of noisy image annotations. We extract a ranked list of related concepts for terms in the query from WordNet and Wikipedia, and use them to expand the initial query. Then some visual concepts are used to re-rank the results for queries containing, explicitly or implicitly, visual cues. First evaluations on a diversified corpus of 150000 images were convincing since the proposed system was ranked 4th and 2nd at the WikipediaMM task of the ImageCLEF 2008 campaign [1]. 2009 Springer Berlin Heidelberg.
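The WordNet half of the query-expansion step described above can be approximated with NLTK; the flat expansion weight and the term cap below are illustrative assumptions, and the Wikipedia-derived concept ranking used in the paper is not reproduced.

```python
# Minimal sketch of WordNet-based query expansion, assuming NLTK and its
# WordNet corpus are installed (pip install nltk; nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def expand_query(query, weight=0.5, max_extra_terms=10):
    """Return (term, weight) pairs: original terms plus related WordNet concepts."""
    original = query.split()
    expanded = [(t, 1.0) for t in original]
    related = []
    for term in original:
        for synset in wn.synsets(term):
            # Synonyms from the same synset.
            related += [l.name().replace('_', ' ') for l in synset.lemmas()]
            # Hypernyms and hyponyms as looser related concepts.
            for rel in synset.hypernyms() + synset.hyponyms():
                related += [l.name().replace('_', ' ') for l in rel.lemmas()]
    seen = set(original)
    for term in related:
        if term not in seen and len(expanded) < len(original) + max_extra_terms:
            expanded.append((term, weight))
            seen.add(term)
    return expanded

print(expand_query("tiger temple"))
```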
Popescu, Adrian; Grefenstette, Gregory & Moellic, Pierre-Alain Gazetiki: Automatic creation of a geographical gazetteer 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08, June 16, 2008 - June 20, 2008 Pittsburgh, PA, United states 2008 [206] Geolocalized databases are becoming necessary in a wide variety of application domains. Thus far, the creation of such databases has been a costly, manual process. This drawback has stimulated interest in automating their construction, for example, by mining geographical information from the Web. Here we present and evaluate a new automated technique for creating and enriching a geographical gazetteer, called Gazetiki. Our technique merges disparate information from Wikipedia, Panoramio, and web search engines in order to identify geographical names, categorize these names, find their geographical coordinates and rank them. The information produced in Gazetiki enhances and complements the Geonames database, using a similar domain model. We show that our method provides a richer structure and an improved coverage compared to another known attempt at automatically building a geographic database and, where possible, we compare our Gazetiki to Geonames.
Prasarnphanich, Pattarawan & Wagner, Christian Creating critical mass in collaboration systems: Insights from Wikipedia 2008 2nd IEEE International Conference on Digital Ecosystems and Technologies, IEEE-DEST 2008, February 26, 2008 - February 29, 2008 Phitsanulok, Thailand 2008 [207] Digital ecosystems that rely on peer production, where users are consumers as well as producers of information and knowledge, are becoming increasingly popular and viable. Supported by Web 2.0 technologies such as wikis, these systems have the potential to replace existing knowledge management systems which generally rely on a small group of experts. The fundamental question for all such systems is under which conditions the collective acts of knowledge contribution start and become self-sustaining. Our article addresses this question, using Wikipedia as an exemplary system. Through a collective action framework, we apply critical mass theory to explain the emergence and sustainability of the peer production approach.
Prato, Andrea & Ronchetti, Marco Using Wikipedia as a reference for extracting semantic information from a text 3rd International Conference on Advances in Semantic Processing - SEMAPRO 2009, October 11, 2009 - October 16, 2009 Sliema, Malta 2009 [208] In this paper we present an algorithm that, using Wikipedia as a reference, extracts semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines all the text contained in the whole Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks, and also introduces an analysis based on multi-words. Our algorithm outperforms current methods in that the output contains far fewer false positives. We were also able to understand which (structural) part of the texts provides most of the semantic information extracted by the algorithm.
Preminger, Michael; Nordlie, Ragnar & Pharo, Nils OUC's participation in the 2009 INEX book track 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [209] In this article we describe the Oslo University College's participation in the INEX 2009 Book track. This year's tasks featured complex topics containing aspects, which lend themselves to use in both the book retrieval and the focused retrieval tasks. The OUC has submitted retrieval results for both tasks, focusing on using the Wikipedia texts for query expansion, as well as utilizing chapter division information in (a number of) the books. 2010 Springer-Verlag Berlin Heidelberg.
Priedhorsky, Reid; Chen, Jilin; Lam, Shyong K.; Panciera, Katherine; Terveen, Loren & Riedl, John Creating, destroying, and restoring value in Wikipedia 2007 International ACM Conference on Supporting Group Work, GROUP'07, November 4, 2007 - November 7, 2007 Sanibel Island, FL, United states 2007 [210] Wikipedia's brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed. Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing. Similarly, using the same impact measure, we show that the probability of a typical article view being damaged is small but increasing, and we present empirically grounded classes of damage. Finally, we make policy recommendations for Wikipedia and other wikis in light of these findings.
Pu, Qiang; He, Daqing & Li, Qi Query expansion for effective geographic information retrieval 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [211] We developed two methods for the monolingual Geo-CLEF 2008 task. The GCEC method aims to test the effectiveness of our online geographic coordinates extraction and clustering algorithm, and the WIKIGEO method examines the usefulness of the geographic coordinate information in Wikipedia for identifying geo-locations. We proposed a measure of topic distance to evaluate these two methods. The experimental results show that: 1) our online geographic coordinates extraction and clustering algorithm is useful for the type of locations that do not have clear corresponding coordinates; 2) the expansion based on the geo-locations generated by GCEC is effective in improving geographic retrieval; 3) Wikipedia can help in finding the coordinates for many geo-locations, but its usage for query expansion still needs further study; 4) query expansion based on the title only obtained better results than that based on the title and narrative parts, even though the latter contains more related geographic information. Further study is needed for this part. 2009 Springer Berlin Heidelberg.
Puttaswamy, Krishna P.N.; Marshall, Catherine C.; Ramasubramanian, Venugopalan; Stuedi, Patrick; Terry, Douglas B. & Wobber, Ted Docx2Go: Collaborative editing of fidelity reduced documents on mobile devices 8th Annual International Conference on Mobile Systems, Applications and Services, MobiSys 2010, June 15, 2010 - June 18, 2010 San Francisco, CA, United states 2010 [212] Docx2Go is a new framework to support editing of shared documents on mobile devices. Three high-level requirements influenced its design - namely, the need to adapt content, especially textual content, on the fly according to the quality of the network connection and the form factor of each device; support for concurrent, uncoordinated editing on different devices, whose effects will later be merged on all devices in a convergent and consistent manner without sacrificing the semantics of the edits; and a flexible replication architecture that accommodates both device-to-device and cloud-mediated synchronization. Docx2Go supports on-the-go editing for XML documents, such as documents in Microsoft Word and other commonly used formats. It combines the best practices from content adaptation systems, weakly consistent replication systems, and collaborative editing systems, while extending the state of the art in each of these fields. The implementation of Docx2Go has been evaluated based on a workload drawn from Wikipedia.
Qiu, Qiang; Zhang, Yang; Zhu, Junping & Qu, Wei Building a text classifier by a keyword and Wikipedia knowledge 5th International Conference on Advanced Data Mining and Applications, ADMA 2009, August 17, 2009 - August 19, 2009 Beijing, China 2009 [213] Traditional approaches to building text classifiers usually require a lot of labeled documents, which are expensive to obtain. In this paper, we propose a new text classification approach based on a keyword and Wikipedia knowledge, so as to avoid labeling documents manually. Firstly, we retrieve a set of related documents about the keyword from Wikipedia. Then, with the help of the related Wikipedia pages, more positive documents are extracted from the unlabeled documents. Finally, we train a text classifier with these positive documents and unlabeled documents. The experimental results on the 20Newsgroups dataset show that the proposed approach performs very competitively compared with NB-SVM, a PU learner, and NB, a supervised learner. 2009 Springer.
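A minimal sketch of the keyword-seeded positive/unlabeled idea described above, using scikit-learn; the similarity threshold, vectorizer settings and classifier choice are assumptions for illustration, not the authors' configuration.

```python
# Sketch: Wikipedia pages retrieved for the keyword act as positive seeds,
# pseudo-positives are pulled from the unlabeled pool by similarity, and a
# Naive Bayes classifier is trained on positives vs. the rest.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.naive_bayes import MultinomialNB

def build_classifier(wiki_docs, unlabeled_docs, sim_threshold=0.2):
    vec = TfidfVectorizer(stop_words='english')
    X = vec.fit_transform(wiki_docs + unlabeled_docs)
    X_wiki, X_unl = X[:len(wiki_docs)], X[len(wiki_docs):]

    # Step 1: mark unlabeled documents similar to the Wikipedia seeds as
    # likely positives.
    sims = cosine_similarity(X_unl, X_wiki).max(axis=1)
    pos_mask = sims >= sim_threshold

    # Step 2: train on (Wikipedia seeds + pseudo-positives) vs. the remainder.
    X_train = vstack([X_wiki, X_unl[pos_mask], X_unl[~pos_mask]])
    y_train = np.array([1] * (len(wiki_docs) + int(pos_mask.sum()))
                       + [0] * int((~pos_mask).sum()))
    clf = MultinomialNB().fit(X_train, y_train)
    return vec, clf
```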
Ramanathan, Madhu; Rajagopal, Srikant; Karthik, Venkatesh; Murugeshan, Meenakshi Sundaram & Mukherjee, Saswati A recursive approach to entity ranking and list completion using entity determining terms, qualifiers and prominent n-grams 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [214] This paper presents our approach for the INEX 2009 Entity Ranking track which consists of two subtasks viz. Entity Ranking and List Completion. Retrieving the correct entities according to the user query is a three-step process viz. extracting the required information from the query and the provided categories, extracting the relevant documents which may be either prospective entities or intermediate pointers to prospective entities by making use of the structure available in the Wikipedia Corpus and finally ranking the resultant set of documents. We have extracted the Entity Determining Terms (EDTs), Qualifiers and prominent n-grams from the query, strategically exploited the relation between the extracted terms and the structure and connectedness of the corpus to retrieve links which are highly probable of being entities and then used a recursive mechanism for retrieving relevant documents through the Lucene Search. Our ranking mechanism combines various approaches that make use of category information, links, titles and WordNet information, initial description and the text of the document. 2010 Springer-Verlag Berlin Heidelberg.
Ramezani, Maryam & Witschel, Hans Friedrich An intelligent system for semi-automatic evolution of ontologies 2010 IEEE International Conference on Intelligent Systems, IS 2010, July 7, 2010 - July 9, 2010 London, United kingdom 2010 [215] Ontologies are an important part of the Semantic Web as well as of many intelligent systems. However, the traditional expert-driven development of ontologies is time-consuming and often results in incomplete and inappropriate ontologies. In addition, since ontology evolution is not controlled by end users, it may take too long for a conceptual change in the domain to be reflected in the ontology. In this paper, we present a recommendation algorithm in a Web 2.0 platform that supports end users to collaboratively evolve ontologies by suggesting semantic relations between new and existing concepts. We use the Wikipedia category hierarchy to evaluate our algorithm and our experimental results show that the proposed algorithm produces high quality recommendations.
Ramirez, Alex; Ji, Shaobo; Riordan, Rob; Ulbrich, Frank & Hine, Michael J. Empowering business students: Using Web 2.0 tools in the classroom 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010 This paper discusses the design of a course to empower business students using Web 2.0 technologies. We explore the learning phenomenon as a way to bring forward a process of continuous improvement supported by social software. We develop a framework to assess the infrastructure against expectations of skill proficiency using Web 2.0 tools which must emerge as a result of registering in an introductory business information and communication technologies (ICT) course in a business school of a Canadian university. We use Friedman's (2007) thesis that "the world is flat" to discuss issues of globalization and the role of ICT. Students registered in the course are familiar with some of the tools we introduce and use in the course. The students are members of Facebook or MySpace, regularly check YouTube, and use Wikipedia in their studies. They use these tools to socialize. We broaden the students' horizons and explore the potential business benefits of such tools and empower the students to use Web 2.0 technologies within a business context.
Rao, Weixiong; Fu, Ada Wai-Chee; Chen, Lei & Chen, Hanhua Stairs: Towards efficient full-text filtering and dissemination in a DHT environment 25th IEEE International Conference on Data Engineering, ICDE 2009, March 29, 2009 - April 2, 2009 Shanghai, China 2009 [216] Nowadays, content on the Internet such as weblogs, Wikipedia and news sites has become "live". How to notify and provide users with the relevant contents becomes a challenge. Unlike conventional Web search technology or RSS feeds, this paper envisions a personalized full-text content filtering and dissemination system in a highly distributed environment such as a Distributed Hash Table (DHT). Users can subscribe to their interested contents by specifying some terms and threshold values for filtering. Then published contents will be disseminated to the associated subscribers. We propose a novel and simple framework of filter registration and content publication, STAIRS. Based on the new framework, we propose three algorithms (default forwarding, dynamic forwarding and adaptive forwarding) to reduce the forwarding cost and false dismissal rate; meanwhile, the subscriber can receive the desired contents with no duplicates. In particular, the adaptive forwarding utilizes the filter information to significantly reduce the forwarding cost. Experiments based on two real query logs and two real datasets show the effectiveness of our proposed framework.
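The subscribe-then-filter primitive at the heart of such a system (terms plus a threshold, matched at publication time) can be illustrated in a few lines; the DHT routing layer and the three forwarding strategies are beyond a short sketch, and the names and threshold values below are illustrative assumptions rather than the authors' design.

```python
# Toy illustration of term/threshold subscription matching. A real system
# would register these filters in a DHT and forward publications along it.
from collections import Counter

subscriptions = {
    "alice": {"terms": {"wikipedia", "vandalism"}, "threshold": 0.02},
    "bob":   {"terms": {"semantic", "web"},        "threshold": 0.05},
}

def matching_subscribers(document):
    words = document.lower().split()
    tf = Counter(words)
    total = max(len(words), 1)
    matched = []
    for user, sub in subscriptions.items():
        # Relative frequency of the subscribed terms in the published document.
        score = sum(tf[t] for t in sub["terms"]) / total
        if score >= sub["threshold"]:
            matched.append(user)
    return matched

print(matching_subscribers("Semantic web data on Wikipedia keeps growing fast"))
```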
Ray, Santosh Kumar; Singh, Shailendra & Joshi, B.P. World wide web based question answering system - A relevance feedback framework for automatic answer validation 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United kingdom 2009 [217] An open domain question answering system is one of the emerging information retrieval systems available on the World Wide Web that is becoming popular day by day for getting succinct and relevant answers in response to users' questions. The validation of the correctness of the answer is an important issue in the field of question answering. In this paper, we propose a World Wide Web based solution for answer validation where answers returned by open domain Question Answering Systems can be validated using online resources such as Wikipedia and Google. We have applied several heuristics for the answer validation task and tested them against some popular World Wide Web based open domain Question Answering Systems over a collection of 500 questions collected from standard sources such as TREC, the Worldbook, and the Worldfactbook. We found that the proposed method yields promising results for the automatic answer validation task.
Razmara, Majid & Kosseim, Leila A little known fact is... Answering other questions using interest-markers 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007, February 18, 2007 - February 24, 2007 Mexico City, Mexico 2007 In this paper, we present an approach to answering "Other" questions using the notion of interest-marking terms. "Other" questions have been introduced in the TREC-QA track to retrieve other interesting facts about a topic. To answer these types of questions, our system extracts from Wikipedia articles a list of interest-marking terms related to the topic and uses them to extract and score sentences from the document collection where the answer should be found. Sentences are then re-ranked using universal interest-markers that are not specific to the topic. The top sentences are then returned as possible answers. When using the 2004 TREC data for development and 2005 data for testing, the approach achieved an F-score of 0.265, placing it among the top systems. Springer-Verlag Berlin Heidelberg 2007.
Reinoso, Antonio J.; Gonzalez-Barahona, Jesus M.; Robles, Gregorio & Ortega, Felipe A quantitative approach to the use of the Wikipedia IEEE Symposium on Computers and Communications 2009, ISCC 2009, July 5, 2009 - July 8, 2009 Sousse, Tunisia 2009 [218] This paper presents a quantitative study of the use of the Wikipedia system by its users (both readers and editors), with special focus on the identification of time and kind-of-use patterns, characterization of traffic and workload, and comparative analysis of different language editions. The basis of the study is the filtering and analysis of a large sample of the requests directed to the Wikimedia systems for six weeks, one in each month from November 2007 to April 2008. In particular, we have considered the twenty most frequently visited language editions of the Wikipedia, identifying for each access to any of them the corresponding namespace (sets of resources with uniform semantics), resource name (article names, for example) and action (editions, submissions, history reviews, save operations, etc.). The results found include the identification of weekly and daily patterns, and several correlations between different actions on the articles. In summary, the study shows an overall picture of how the most visited language editions of the Wikipedia are being accessed by their users.
Ren, Reede; Misra, Hemant & Jose, Joemon M. Semantic based adaptive movie summarisation 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China 2009 [219] This paper proposes a framework for automatic video summarization by exploiting internal and external textual descriptions. The web knowledge base Wikipedia is used as a middle media layer, which bridges the gap between general user descriptions and exact film subtitles. Latent Dirichlet Allocation (LDA) detects as well as matches the distribution of content topics in Wikipedia items and movie subtitles. A saliency based summarization system then selects perceptually attractive segments from each content topic for summary composition. The evaluation collection consists of six English movies, and a high topic coverage is shown over official trailers from the Internet Movie Database. 2010 Springer-Verlag Berlin Heidelberg.
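The topic-matching step between Wikipedia text and subtitle segments can be approximated with an off-the-shelf LDA implementation such as gensim; the corpus preparation, number of topics and similarity measure below are illustrative assumptions, not the configuration used in the paper.

```python
# Sketch: fit LDA over a Wikipedia page plus subtitle segments, then rank
# segments by how closely their topic distribution matches the Wikipedia page.
from gensim import corpora, models, matutils

def topic_match(wiki_tokens, subtitle_segments, num_topics=20):
    # wiki_tokens: list of tokens; subtitle_segments: list of token lists.
    texts = [wiki_tokens] + subtitle_segments
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)

    wiki_topics = lda.get_document_topics(corpus[0], minimum_probability=0)
    scores = []
    for i, bow in enumerate(corpus[1:]):
        seg_topics = lda.get_document_topics(bow, minimum_probability=0)
        # Cosine similarity between the two sparse topic distributions.
        scores.append((i, matutils.cossim(wiki_topics, seg_topics)))
    return sorted(scores, key=lambda x: x[1], reverse=True)
```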
Riche, Nathalie Henry; Lee, Bongshin & Chevalier, Fanny IChase: Supporting exploration and awareness of editing activities on Wikipedia International Conference on Advanced Visual Interfaces, AVI '10, May 26, 2010 - May 28, 2010 Rome, Italy 2010 [220] To increase its credibility and preserve the trust of its readers, Wikipedia needs to ensure a good quality of its articles. To that end, it is critical for Wikipedia administrators to be aware of contributors' editing activity to monitor vandalism, encourage reliable contributors to work on specific articles, or find mentors for new contributors. In this paper, we present IChase, a novel interactive visualization tool to provide administrators with better awareness of editing activities on Wikipedia. Unlike the currently used visualizations that provide only page-centric information, IChase visualizes the trend of activities for two entity types, articles and contributors. IChase is based on two heatmaps (one for each entity type) synchronized to one timeline. It allows users to interactively explore the history of changes by drilling down into specific articles and contributors, or time points, to access the details of the changes. We also present a case study to illustrate how IChase can be used to monitor the editing activities of Wikipedia authors, as well as a usability study. We conclude by discussing the strengths and weaknesses of IChase.
Riedl, John Altruism, selfishness, and destructiveness on the social web 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, AH 2008, July 29, 2008 - August 1, 2008 Hannover, Germany 2008 [221] Many online communities are emerging that, like Wikipedia, bring people together to build community-maintained artifacts of lasting value (CALVs). What is the nature of people's participation in building these repositories? What are their motives? In what ways is their behavior destructive instead of constructive? Motivating people to contribute is a key problem because the quantity and quality of contributions ultimately determine a CALV's value. We pose three related research questions: 1) How does intelligent task routing, that is, matching people with work, affect the quantity of contributions? 2) How does reviewing contributions before accepting them affect the quality of contributions? 3) How do recommender systems affect the evolution of a shared tagging vocabulary among the contributors? We will explore these questions in the context of existing CALVs, including Wikipedia, Facebook, and MovieLens. 2008 Springer-Verlag Berlin Heidelberg.
Roger, Sandra; Vila, Katia; Ferrandez, Antonio; Pardino, Maria; Gomez, Jose Manuel; Puchol-Blasco, Marcel & Peral, Jesus Using AliQAn in monolingual QA@CLEF 2008 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [222] This paper describes the participation of the system AliQAn in the CLEF 2008 Spanish monolingual QA task. This time, the main goals of the current version of AliQAn were to deal with topic-related questions and to decrease the number of inexact answers. We have also explored the use of the Wikipedia corpora, which have posed some new challenges for the QA task. 2009 Springer Berlin Heidelberg.
Roth, Benjamin & Klakow, Dietrich Combining wikipedia-based concept models for cross-language retrieval 1st Information Retrieval Facility Conference, IRFC 2010, May 31, 2010 - May 31, 2010 Vienna, Austria 2010 [223] As a low-cost resource that is kept up to date, Wikipedia has recently gained attention as a means of providing cross-language bridging for information retrieval. Contrary to a previous study, we show that standard Latent Dirichlet Allocation (LDA) can extract cross-language information that is valuable for IR by simply normalizing the training data. Furthermore, we show that LDA and Explicit Semantic Analysis (ESA) complement each other, yielding significant improvements when combined. Such a combination can significantly contribute to retrieval based on machine translation, especially when query translations contain errors. The experiments were performed on the Multext JOC corpus and a CLEF dataset.
Ruiz-Casado, Maria; Alfonseca, Enrique & Castells, Pablo Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets Third International Atlantic Web Intelligence Conference on Advances in Web Intelligence, AWIC 2005, June 6, 2005 - June 9, 2005 Lodz, Poland 2005 We describe an approach taken for automatically associating entries from an on-line encyclopedia with concepts in an ontology or a lexical semantic network. It has been tested with the Simple English Wikipedia and WordNet, although it can be used with other resources. The accuracy in disambiguating the sense of the encyclopedia entries reaches 91.11% (83.89% for polysemous words). It will be applied to enriching ontologies with encyclopedic knowledge. Springer-Verlag Berlin Heidelberg 2005.
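The sense-assignment step can be pictured with a much simpler stand-in than the paper's method: choose the WordNet synset whose gloss and examples overlap most with the encyclopedia entry text. This Lesk-style heuristic is offered only as an assumption-laden sketch, not the similarity measure evaluated in the paper.

```python
# Simplified stand-in for assigning an encyclopedia entry to a WordNet synset.
# Requires NLTK with the WordNet corpus downloaded.
from nltk.corpus import wordnet as wn

STOP = {"the", "a", "an", "of", "and", "in", "to", "is", "it", "for"}

def assign_synset(title, entry_text):
    context = {w for w in entry_text.lower().split() if w not in STOP}
    best, best_overlap = None, -1
    for synset in wn.synsets(title):
        gloss_words = set(synset.definition().lower().split())
        for example in synset.examples():
            gloss_words |= set(example.lower().split())
        overlap = len(context & gloss_words)
        if overlap > best_overlap:
            best, best_overlap = synset, overlap
    return best

print(assign_synset("jaguar", "The jaguar is a large cat native to the Americas."))
```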
Ruiz-Casado, Maria; Alfonseca, Enrique & Castells, Pablo Automatic extraction of semantic relationships for wordNet by means of pattern learning from wikipedia 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005: Natural Language Processing and Information Systems, June 15, 2005 - June 17, 2005 Alicante, Spain 2005 This paper describes an automatic approach to identify lexical patterns which represent semantic relationships between concepts, from an on-line encyclopedia. Next, these patterns can be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 1200 new relationships that did not appear in WordNet originally. The precision of these relationships ranges between 0.61 and 0.69, depending on the relation. Springer-Verlag Berlin Heidelberg 2005.
Sabin, Mihaela & Leone, Jim IT education 2.0 10th ACM Special Interest Group for Information Technology Education, SIGITE 2009, October 22, 2009 - October 24, 2009 Fairfax, VA, United states 2009 [224] Today's networked computing and communications technologies have changed how information, knowledge, and culture are produced and exchanged. People around the world join online communities that are set up voluntarily and use their members' collaborative participation to solve problems, share interests, raise awareness, or simply establish social connections. Two online community examples with significant economic and cultural impact are the open source software movement and Wikipedia. The technological infrastructure of these peer production models uses current Web 2.0 tools, such as wikis, blogs, social networking, semantic tagging, and RSS feeds. With no control exercised by property-based markets or managerial hierarchies, commons-based peer production systems contribute to and serve the public domain and public good. The body of cultural, educational, and scientific work of many online communities is made available to the public for free and legal sharing, use, repurposing, and remixing. Higher education's receptiveness to these transformative trends deserves close examination. In the case of the Information Technology (IT) education community, in particular, we note that the curricular content, research questions, and professional skills the IT discipline encompasses have direct linkages with the Web 2.0 phenomenon. For that reason, IT academic programs should pioneer and lead efforts to cultivate peer production online communities. We state the case that free access and open engagement facilitated by technological infrastructures that support a peer production model benefit IT education. We advocate that these technologies be employed to strengthen IT educational programs, advance IT research, and revitalize the IT education community.
Sacarea, C.; Meza, R. & Cimpoi, M. Improving conceptual search results reorganization using term-concept mappings retrieved from wikipedia 2008 IEEE International Conference on Automation, Quality and Testing, Robotics, AQTR 2008 - THETA 16th Edition, May 22, 2008 - May 25, 2008 Cluj-Napoca, Romania 2008 [225] This paper describes a way of improving the conceptual reorganization of search engine results using formal concept analysis. This is done by using redirections to resolve conceptual redundancies, by adding preliminary disambiguation, and by expanding the concept lattice with extra navigation nodes based on Wikipedia's ontology and strong conceptual links.
Safarkhani, Banafsheh; Mohsenzadeh, Mehran & Rahmani, Amir Masoud Improving website user model automatically using a comprehensive lexical semantic resource 2009 International Conference on E-Business and Information System Security, EBISS 2009, May 23, 2009 - May 24, 2009 Wuhan, China 2009 [226] A major component in any web personalization system is its user model. Recently a number of studies have been done to incorporate the semantics of a web site in the representation of its users. All of these efforts use either a specific manually constructed taxonomy or ontology, or a general-purpose one like WordNet, to map page views into semantic elements. However, building a hierarchy of concepts manually is time consuming and expensive. On the other hand, general-purpose resources suffer from low coverage of domain-specific terms. In this paper we intend to address both these shortcomings. Our contribution is that we introduce a mechanism to automatically improve the representation of the user in the website using a comprehensive lexical semantic resource. We utilize Wikipedia, the largest encyclopedia to date, as a rich lexical resource to enhance the automatic construction of the vector model representation of user interests. We evaluate the effectiveness of the resulting model using concepts extracted from this promising resource.
Safarkhani, Banafsheh; Talabeigi, Mojde; Mohsenzadeh, Mehran & Meybodi, Mohammad Reza Deriving semantic sessions from semantic clusters 2009 International Conference on Information Management and Engineering, ICIME 2009, April 3, 2009 - April 5, 2009 Kuala Lumpur, Malaysia 2009 [227] An important phase in any web personalization system is transaction identification. Recently a number of studies have been done to incorporate the semantics of a web site in the representation of transactions. Building a hierarchy of concepts manually is time consuming and expensive. In this paper we intend to address these shortcomings. Our contribution is that we introduce a mechanism to automatically improve the representation of the user in the website using a comprehensive lexical semantic resource and semantic clusters. We utilize Wikipedia, the largest encyclopedia to date, as a rich lexical resource to enhance the automatic construction of the vector model representation of user sessions. We cluster web pages based on their content with hierarchical unsupervised fuzzy clustering algorithms, which are effective methods for exploring the structure of complex real data where grouping of overlapping and vague elements is necessary. Entries in web server logs are used to identify users and visit sessions, while web pages or resources in the site are clustered based on their content and their semantics. These clusters of web documents are used to scrutinize the discovered web sessions in order to identify what we call sub-sessions. Each sub-session has a consistent goal. This process improves the derivation of semantic sessions from web site user page views. Our experiments show that the proposed system significantly improves the quality of the web personalization process.
Saito, Kazumi; Kimura, Masahiro & Motoda, Hiroshi Discovering influential nodes for SIS models in social networks 12th International Conference on Discovery Science, DS 2009, October 3, 2009 - October 5, 2009 Porto, Portugal 2009 [228] We address the problem of efficiently discovering the influential nodes in a social network under the susceptible/infected/susceptible (SIS) model, a diffusion model where nodes are allowed to be activated multiple times. The computational complexity drastically increases because of this multiple activation property. We solve this problem by constructing a layered graph from the original social network with each layer added on top as the time proceeds, and applying the bond percolation with pruning and burnout strategies. We experimentally demonstrate that the proposed method gives much better solutions than the conventional methods that are solely based on the notion of centrality for social network analysis using two large-scale real-world networks (a blog network and a Wikipedia network). We further show that the computational complexity of the proposed method is much smaller than the conventional naive probabilistic simulation method by a theoretical analysis and confirm this by experimentation. The properties of the influential nodes discovered are substantially different from those identified by the centrality-based heuristic methods. 2009 Springer Berlin Heidelberg.
Sallaberry, Arnaud; Zaidi, Faraz; Pich, Christian & Melancon, Guy Interactive visualization and navigation of web search results revealing community structures and bridges 36th Graphics Interface Conference, GI 2010, May 31, 2010 - June 2, 2010 Ottawa, ON, Canada 2010 With the information overload on the Internet, organization and visualization of web search results so as to facilitate faster access to information is a necessity. The classical methods present search results as an ordered list of web pages ranked in terms of relevance to the searched topic. Users thus have to scan text snippets or navigate through various pages before finding the required information. In this paper we present an interactive visualization system for content analysis of web search results. The system combines a number of algorithms to present a novel layout methodology which helps users to analyze and navigate through a collection of web pages. We have tested this system with a number of data sets and have found it very useful for the exploration of data. Different case studies are presented based on searching different topics on Wikipedia through Exalead's search engine.
Santos, Diana & Cardoso, Nuno GikiP: Evaluating geographical answers from wikipedia 5th Workshop on Geographic Information Retrieval, GIR'08, Co-located with the ACM 17th Conference on Information and Knowledge Management, CIKM 2008, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states 2008 [229] This paper describes GikiP, a pilot task that took place in 2008 in CLEF. We present the motivation behind GikiP and the use of Wikipedia as the evaluation collection, detail the task, and list new ideas for its continuation.
Santos, Diana; Cardoso, Nuno; Carvalho, Paula; Dornescu, Iustin; Hartrumpf, Sven; Leveling, Johannes & Skalban, Yvonne GikiP at geoCLEF 2008: Joining GIR and QA forces for querying wikipedia 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [230] This paper reports on the GikiP pilot that took place in 2008 in GeoCLEF. This pilot task requires a combination of methods from geographical information retrieval and question answering to answer queries to Wikipedia. We start with the task description, providing details on topic choice and evaluation measures. Then we offer a brief motivation from several perspectives, and we present results in detail. A comparison of participants' approaches is then presented, and the paper concludes with improvements for the next edition. 2009 Springer Berlin Heidelberg.
Sarrafzadeh, Bahareh & Shamsfard, Mehrnoush Parallel annotation and population: A cross-language experience Proceedings - 2009 International Conference on Computer Engineering and Technology, ICCET 2009, Piscataway, NJ, United States 2009 [231] In recent years automatic Ontology Population (OP) from texts has emerged as a new field of application for knowledge acquisition techniques. In OP, the instances of ontology classes are extracted from text and added under the ontology concepts. On the other hand, semantic annotation, which is a key task in moving toward the Semantic Web, tries to tag instance data in a text with their corresponding ontology classes; so ontology population usually accompanies the generation of semantic annotations. In this paper we introduce a cross-lingual population/annotation system called POPTA which annotates Persian texts according to an English lexicalized ontology and populates the English ontology according to the input Persian texts. It exploits a hybrid approach, a combination of statistical and pattern-based methods as well as techniques founded on the web and search engines, and a novel method of resolving translation ambiguities. POPTA also uses Wikipedia as a vast natural language encyclopedia to extract new instances to populate the input ontology.
Sawaki, M.; Minami, Y.; Higashinaka, R.; Dohsaka, K. & Maeda, E. "Who is this" quiz dialogue system and users' evaluation 2008 IEEE Workshop on Spoken Language Technology, SLT 2008, December 15, 2008 - December 19, 2008 Goa, India 2008 [232] In order to design a dialogue system that users enjoy and want to be near for a long time, it is important to know the effect of the system's actions on users. This paper describes the "Who is this" quiz dialogue system and its users' evaluation. Its quiz-style information presentation has been found effective for educational tasks. In our ongoing effort to make it closer to a conversational partner, we implemented the system as a stuffed toy (or its CG equivalent). Quizzes are automatically generated from Wikipedia articles rather than from hand-crafted sets of biographical facts. Network mining is utilized to prepare adaptive system responses. Experiments showed the effectiveness of the person network and the relationship between user attributes and interest level.
Scardino, Giuseppe; Infantino, Ignazio & Gaglio, Salvatore Automated object shape modelling by clustering of web images 3rd International Conference on Computer Vision Theory and Applications, VISAPP 2008, January 22, 2008 - January 25, 2008 Funchal, Madeira, Portugal 2008 The paper deals with the description of a framework to create shape models of an object using images from the web. Results obtained from different image search engines using simple keywords are filtered, and it is possible to select images showing a single object with a well-defined contour. In order to have a large set of valid images, the implemented system uses lexical web databases (e.g. WordNet) or free web encyclopedias (e.g. Wikipedia) to get more keywords correlated to the given object. The shapes extracted from selected images are represented by Fourier descriptors and grouped by the K-means algorithm. Finally, the most representative shapes of the main clusters are considered as prototypical contours of the object. Preliminary experimental results are illustrated to show the effectiveness of the proposed approach.
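The descriptor-and-clustering stage lends itself to a compact sketch; contour extraction from the filtered web images is not shown, and the descriptor length and number of clusters below are illustrative assumptions.

```python
# Sketch: Fourier descriptors of closed contours grouped with K-means.
# Each contour is assumed to be an (N, 2) array of boundary points with
# N > n_coeffs, so all descriptors have the same length.
import numpy as np
from sklearn.cluster import KMeans

def fourier_descriptor(contour, n_coeffs=16):
    z = contour[:, 0] + 1j * contour[:, 1]        # complex boundary representation
    coeffs = np.fft.fft(z)
    mag = np.abs(coeffs[1:n_coeffs + 1])          # drop DC term (translation invariance)
    return mag / (mag[0] + 1e-12)                 # scale normalisation

def cluster_shapes(contours, k=3):
    X = np.array([fourier_descriptor(c) for c in contours])
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```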
Scarpazza, Daniele Paolo & Braudaway, Gordon W. Workload characterization and optimization of high-performance text indexing on the Cell Broadband Engine (Cell/B.E.) 2009 IEEE International Symposium on Workload Characterization, IISWC 2009, October 4, 2009 - October 6, 2009 Austin, TX, United states 2009 [233] In this paper we examine text indexing on the Cell Broadband Engine (Cell/B.E.), an emerging workload on an emerging multicore architecture. The Cell Broadband Engine is a microprocessor jointly developed by Sony Computer Entertainment, Toshiba, and IBM (herein, we refer to it simply as the "Cell"). The importance of text indexing is growing not only because it is the core task of commercial and enterprise-level search engines but also because it appears more and more frequently in desktop and mobile applications and on network appliances. Text indexing is a computationally intensive task. Multi-core processors promise a multiplicative increase in compute power, but this power is fully available only if workloads exhibit the right amount and kind of parallelism. We present the challenges and the results of mapping text indexing tasks to the Cell processor. The Cell has become known as a platform capable of impressive performance, but only when algorithms have been parallelized with attention paid to its hardware peculiarities (expensive branching, wide SIMD units, small local memories). We propose a parallel software design that provides essential text indexing features at a high throughput (161 Mbyte/s per chip on Wikipedia inputs) and we present a performance analysis that details the resources absorbed by each subtask. Not only does this result affect traditional applications, but it also enables new ones, such as live network traffic indexing for security forensics, until now believed to be too computationally demanding to be performed in real time. We conclude that, at the cost of a radical algorithmic redesign, our Cell-based solution delivers a 4x performance advantage over a recent commodity machine like the Intel Q6600. In a per-chip comparison, ours is the fastest text indexer that we are aware of.
Scheau, Cristina; Rebedea, Traian; Chiru, Costin & Trausan-Matu, Stefan Improving the relevance of search engine results by using semantic information from Wikipedia 9th RoEduNet IEEE International Conference, RoEduNet 2010, June 24, 2010 - June 26, 2010 Sibiu, Romania 2010 Depending on the user's intention, the queries processed by a search engine can be classified as transactional, informational and navigational [1]. In order to meet the three types of searches, at this moment search engines basically use algorithmic analysis of the links between pages, improved by a factor that depends on the number of occurrences of the keywords in the query and the order of these words on each web page returned as a result. For transactional and informational queries, the relevance of the results returned by the search engine may be improved by using semantic information about the query concepts when computing the order of the results presented to the user. Wikipedia is a huge thesaurus which has the advantage of already being multi-lingual and semi-structured, presenting a dense structure of internal links that can be used to extract various types of information. This paper proposes a method to extract semantic relations between concepts, considered as the names of Wikipedia articles, and then use these relations to determine the rank of the results returned by a search engine for a given query.
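One way to picture such a re-ranking is to mix the engine's original score with a Wikipedia-link-based relatedness between query concepts and the concepts found on each result page. The relatedness formula below follows the well-known link-overlap measure of Milne & Witten (2008) rather than the paper's own relations, and the mixing weight is an assumption.

```python
# Sketch of re-ranking search results with a Wikipedia-based relatedness signal.
# wiki_links maps an article title to the set of titles it links to; how that
# map is built is outside this sketch.
import math

def relatedness(a, b, wiki_links, n_articles):
    """Link-overlap relatedness in the spirit of Milne & Witten (2008)."""
    la, lb = wiki_links.get(a, set()), wiki_links.get(b, set())
    if not la or not lb or not (la & lb):
        return 0.0
    dist = (math.log(max(len(la), len(lb))) - math.log(len(la & lb))) / (
        math.log(n_articles) - math.log(min(len(la), len(lb))))
    return max(0.0, 1.0 - dist)

def rerank(results, query_concepts, wiki_links, n_articles, alpha=0.7):
    """results: list of (url, engine_score, page_concepts)."""
    rescored = []
    for url, score, concepts in results:
        sem = max((relatedness(q, c, wiki_links, n_articles)
                   for q in query_concepts for c in concepts), default=0.0)
        rescored.append((url, alpha * score + (1 - alpha) * sem))
    return sorted(rescored, key=lambda x: x[1], reverse=True)
```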
Schonberg, Christian; Pree, Helmuth & Freitag, Burkhard Rich ontology extraction and wikipedia expansion using language resources 11th International Conference on Web-Age Information Management, WAIM 2010, July 15, 2010 - July 17, 2010 Jiuzhaigou, China 2010 [234] Existing social collaboration projects contain a host of conceptual knowledge, but are often only sparsely structured and hardly machine-accessible. Using the well known Wikipedia as a showcase, we propose new and improved techniques for extracting ontology data from the wiki category structure. Applications like information extraction, data classification, or consistency checking require ontologies of very high quality and with a high number of relationships. We improve upon existing approaches by finding a host of additional relevant relationships between ontology classes, leveraging multi-lingual relations between categories and semantic relations between terms.
Schonhofen, Peter Identifying document topics using the wikipedia category network Web Intelligence and Agent Systems 2009 [235] In the last few years the size and coverage of Wikipedia, a community-edited, freely available on-line encyclopedia, has reached the point where it can be effectively used to identify topics discussed in a document, similarly to an ontology or taxonomy. In this paper we will show that even a fairly simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. We test the reliability of our method by predicting categories of Wikipedia articles themselves based on their bodies, and also by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of (or in addition to) their texts. 2009 - IOS Press.
Schonhofen, Peter Annotating documents by Wikipedia concepts 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia 2008 [236] We present a technique which is able to reliably label words or phrases of an arbitrary document with the Wikipedia articles (concepts) best describing their meaning. First it scans the document content, and when it finds a word sequence matching the title of a Wikipedia article, it attaches the article to the constituent word(s). The collected articles are then scored based on three factors: (1) how many other detected articles they semantically relate to, according to the Wikipedia link structure; (2) how specific is the concept they represent; and (3) how similar is the title by which they were detected to their "official" title. If a text location refers to multiple Wikipedia articles, only the one with the highest score is retained. Experiments on 24000 randomly selected Wikipedia article bodies showed that 81% of phrases annotated by article authors were correctly identified. Moreover, out of the 5 concepts deemed most important by our algorithm during a final ranking, on average 72% were indeed marked in the original text.
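The pipeline in this abstract (greedy title matching followed by scoring on relatedness, specificity and surface similarity) can be sketched as follows; the data structures and the unweighted sum are illustrative assumptions, not the scoring actually used in the paper.

```python
# Sketch: detect Wikipedia concepts by title matching, then score each
# candidate by the three factors named in the abstract.
import difflib

def detect_concepts(text, title_index, max_len=4):
    """title_index: dict mapping lower-cased titles/redirects to canonical titles."""
    words, found = text.lower().split(), []
    for i in range(len(words)):
        # Prefer the longest phrase starting at position i that matches a title.
        for j in range(min(len(words), i + max_len), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in title_index:
                found.append((phrase, title_index[phrase]))
                break
    return found

def score(phrase, article, detected, links, generality):
    others = {a for _, a in detected if a != article}
    related = len(links.get(article, set()) & others)                # factor (1)
    specificity = 1.0 / (1 + generality.get(article, 0))             # factor (2)
    surface = difflib.SequenceMatcher(None, phrase, article.lower()).ratio()  # (3)
    return related + specificity + surface                           # unweighted sum
```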
Schonhofen, Peter; Benczur, Andras; Biro, Istvan & Csalogany, Karoly Cross-language retrieval with wikipedia 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [237] We demonstrate a twofold use of Wikipedia for cross-lingual information retrieval. As our main contribution, we exploit Wikipedia hyperlinkage for query term disambiguation. We also use bilingual Wikipedia articles for dictionary extension. Our method is based on translation disambiguation; we combine the Wikipedia based technique with a method based on bigram statistics of pairs formed by translations of different source language terms. 2008 Springer-Verlag Berlin Heidelberg.
Shahid, Ahmad R. & Kazakov, Dimitar Automatic multilingual lexicon generation using wikipedia as a resource 1st International Conference on Agents and Artificial Intelligence, ICAART 2009, January 19, 2009 - January 21, 2009 Porto, Portugal 2009 This paper proposes a method for creating a multilingual dictionary by taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages. The creation of such multilingual dictionaries has become possible as a result of the exponential increase in the size of multilingual information on the web. Wikipedia is a prime example of such a multilingual source of information on any conceivable topic in the world, which is edited by its readers. Here, a web crawler has been used to traverse Wikipedia following the links on a given page. The crawler extracts the title along with the titles of the corresponding pages in the other targeted languages. The result is a set of words and phrases that are translations of each other. For efficiency, the URLs are organized using hash tables. A lexicon has been constructed which contains 7-tuples corresponding to 7 different languages, namely: English, German, French, Polish, Bulgarian, Greek and Chinese.
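The paper builds the lexicon by crawling article pages; the same inter-language titles are also exposed by the MediaWiki API (prop=langlinks), which gives a convenient way to reproduce the idea. The sketch below is a shortcut under that assumption, not the authors' crawler, and requires the requests package.

```python
# Fetch inter-language titles for one English article via the MediaWiki API.
import requests

API = "https://en.wikipedia.org/w/api.php"

def translations(title, languages=("de", "fr", "pl", "bg", "el", "zh")):
    params = {
        "action": "query", "format": "json", "prop": "langlinks",
        "titles": title, "lllimit": "max",
    }
    data = requests.get(API, params=params, timeout=10).json()
    page = next(iter(data["query"]["pages"].values()))
    row = {"en": title}
    for link in page.get("langlinks", []):     # absent if the page has no langlinks
        if link["lang"] in languages:
            row[link["lang"]] = link["*"]
    return row

print(translations("Computer science"))
```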
Shilman, Michael Aggregate documents: Making sense of a patchwork of topical documents 8th ACM Symposium on Document Engineering, DocEng 2008, September 16, 2008 - September 19, 2008 Sao Paulo, Brazil 2008 [238] With the dramatic increase in quantity and diversity of online content, particularly in the form of user generated content, we now have access to unprecedented amounts of information. Whether you are researching the purchase of a new cell phone, planning a vacation, or trying to assess a political candidate, there are now countless resources at your fingertips. However, finding and making sense of all this information is laborious and it is difficult to assess high-level trends in what is said. Web sites like Wikipedia and Digg democratize the process of organizing the information from countless documents into a single source where it is somewhat easier to understand what is important and interesting. In this talk, I describe a complementary set of automated alternatives to these approaches, demonstrate these approaches with a working example, the commercial web site Wize.com, and derive some basic principles for aggregating a diverse set of documents into a coherent and useful summary.
Shiozaki, Hitohiro & Eguchi, Koji Entity ranking from annotated text collections using multitype topic models 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [239] Very recently, topic model-based retrieval methods have produced good results using the Latent Dirichlet Allocation (LDA) model or its variants in a language modeling framework. However, for the task of retrieving annotated documents when using the LDA-based methods, some post-processing is required outside the model in order to make use of multiple word types that are specified by the annotations. In this paper, we explore new retrieval methods using a 'multitype topic model' that can directly handle multiple word types, such as annotated entities, category labels and other words that are typically used in Wikipedia. We investigate how to effectively apply the multitype topic model to retrieve documents from an annotated collection, and show the effectiveness of our methods through experiments on entity ranking using a Wikipedia collection. 2008 Springer-Verlag Berlin Heidelberg.
Shirakawa, Masumi; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro Concept vector extraction from Wikipedia category network 3rd International Conference on Ubiquitous Information Management and Communication, ICUIMC'09, January 15, 2009 - January 16, 2009 Suwon, Korea, Republic of 2009 [240] The availability of machine-readable taxonomies has been demonstrated by various applications such as document classification and information retrieval. One of the main topics of automated taxonomy extraction research is Web-mining-based statistical NLP, and a significant amount of research has been conducted. However, existing works on automatic dictionary building have accuracy problems due to the technical limitations of statistical NLP (Natural Language Processing) and noisy data on the WWW. To solve these problems, in this work, we focus on mining Wikipedia, a large-scale Web encyclopedia. Wikipedia has high-quality, huge-scale articles and a category system because many users around the world edit and refine these articles and the category system daily. Using Wikipedia, the decrease in accuracy deriving from NLP can be avoided. However, affiliation relations cannot be extracted by simply descending the category system automatically, since the category system in Wikipedia is not a tree structure but a network structure. We propose concept vectorization methods which are applicable to the category network structure in Wikipedia.
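The difficulty named in this abstract, that Wikipedia's category system is a network rather than a tree, can be illustrated with a depth-decayed propagation over the category graph. This is only a simplified picture of a "concept vector", not the vectorization method proposed in the paper.

```python
# Breadth-first propagation over a category graph (which may contain cycles),
# assigning each reached category a weight that decays with depth.
from collections import deque

def concept_vector(start, parents, decay=0.5, max_depth=4):
    """parents: dict mapping a category to the set of its parent categories."""
    weights, queue, seen = {}, deque([(start, 0)]), {start}
    while queue:
        node, depth = queue.popleft()
        weights[node] = max(weights.get(node, 0.0), decay ** depth)
        if depth >= max_depth:
            continue
        for parent in parents.get(node, set()):
            if parent not in seen:          # guard against cycles in the network
                seen.add(parent)
                queue.append((parent, depth + 1))
    return weights
```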
Siira, Erkki; Tuikka, Tuomo & Tormanen, Vili Location-based mobile wiki using NFC tag infrastructure 2009 1st International Workshop on Near Field Communication, NFC 2009, February 24, 2009 - February 24, 2009 Hagenberg, Austria 2009 [241] Wikipedia is a widely known encyclopedia on the web, updated by volunteers around the world. A mobile and location-based wiki with NFC, however, brings forward the idea of using Near Field Communication tags as an enabler for seeking information content from a wiki. In this paper we briefly address how an NFC infrastructure can be created in a city for the use of a location-based wiki. The users of the system can read local information from the Wikipedia system and also update the location-based content. We present an implementation of such a system. Finally, we evaluate the restrictions of the technological system and delineate further work.
Silva, Lalindra De & Jayaratne, Lakshman Semi-automatic extraction and modeling of ontologies using wikipedia XML corpus 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United kingdom 2009 [242] This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner using a preprocessed document corpus derived from Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies. Using Natural Language Processing (NLP) and other Machine Learning (ML) techniques along with a very rich document corpus, this system proposes a solution to a task that is generally considered extremely cumbersome. The initial results of the prototype suggest strong potential of the system to become highly successful in ontology extraction and modeling and also inspire further research on extracting ontologies from other semi-structured document corpora as well.
Silva, Lalindra De & Jayaratne, Lakshman WikiOnto: A system for semi-automatic extraction and modeling of ontologies using Wikipedia XML corpus ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states 2009 [243] This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner using a preprocessed document corpus of one of the largest knowledge bases in the world - the Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies. Using Natural Language Processing (NLP) and other Machine Learning (ML) techniques along with a very rich document corpus, this system proposes a solution to a task that is generally considered extremely cumbersome. The initial results of the prototype suggest strong potential of the system to become highly successful in ontology extraction and modeling and also inspire further research on extracting ontologies from other semi-structured document corpora as well.
Sipo, Ruben; Bhole, Abhijit; Fortuna, Blaz; Grobelnik, Marko & Mladenic, Dunja Demo: HistoryViz - Visualizing events and relations extracted from Wikipedia 6th European Semantic Web Conference, ESWC 2009, May 31, 2009 - June 4, 2009 Heraklion, Crete, Greece 2009 [244] HistoryViz provides a new perspective on a certain kind of textual data, in particular the data available in Wikipedia, where different entities are described and put in historical perspective. Instead of browsing through pages, each describing a certain topic, we can look at the relations between entities and the events connected with the selected entities. The presented solution implemented in HistoryViz provides the user with a graphical interface for viewing events concerning the selected person on a timeline and viewing relations to other entities as a graph that can be dynamically expanded. 2009 Springer Berlin Heidelberg.
Sjobergh, Jonas; Sjobergh, Olof & Araki, Kenji What types of translations hide in Wikipedia? 3rd International Conference on Large-Scale Knowledge Resources, LKR 2008, March 3, 2008 - March 5, 2008 Tokyo, Japan 2008 [245] We extend an automatically generated bilingual Japanese-Swedish dictionary with new translations, automatically discovered from the multi-lingual online encyclopedia Wikipedia. Over 50,000 translations, most of which are not present in the original dictionary, are generated, with very high translation quality. We analyze what types of translations can be generated by this simple method. The majority of the words are proper nouns, and other types of (usually) uninteresting translations are also generated. Not counting the less interesting words, about 15,000 new translations are still found. Checking against logs of search queries from the old dictionary shows that the new translations would significantly reduce the number of searches with no matching translation. 2008 Springer-Verlag Berlin Heidelberg.
Slattery, Shaun "Edit this page": The socio-technological infrastructure of a Wikipedia article 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United states 2009 [246] Networked environments, such as wikis, are commonly used to support work, including the collaborative authoring of information and fact-building. In networked environments the activity of fact-building is mediated not only by the technological features of the interface but also by the social conventions of the community it supports. This paper examines the social and technological features of a Wikipedia article in order to understand how these features help mediate the activity of fact-building and highlights the need for communication designers to consider the goals and needs of the communities for which they design.
Sluis, Frans Van Der & Broek, Egon L. Van Den Using complexity measures in Information Retrieval 3rd Information Interaction in Context Symposium, IIiX'10, August 18, 2010 - August 21, 2010 New Brunswick, NJ, United states 2010 [247] Although Information Retrieval (IR) is meant to serve its users, surprisingly little IR research is user-centered. In contrast, this article utilizes the concept of complexity of information as a determinant of the user's comprehension, not as a formal golden measure. Four aspects of the user's comprehension are applied to a database of simple and normal Wikipedia articles and found to distinguish between them. The results underline the feasibility of the principle of parsimony for IR: where two topical articles are available, the simpler one is preferred.
Smirnov, Alexander V. & Krizhanovsky, Andrew A. Information filtering based on wiki index database Computational Intelligence in Decision and Control - 8th International FLINS Conference, September 21, 2008 - September 24, 2008 Madrid, Spain 2008 In this paper we present a profile-based approach to information filtering based on an analysis of the content of text documents. A Wikipedia index database is created and used to automatically generate the user profile from the user's document collection. Problem-oriented Wikipedia subcorpora are created (using knowledge extracted from the user profile) for each topic of user interest. The index databases of these subcorpora are applied to filtering the information flow (e.g., mail, news). Thus, the analyzed texts are classified into several topics explicitly presented in the user profile. The paper concentrates on the indexing part of the approach. The architecture of an application implementing the Wikipedia indexing is described. The indexing method is evaluated using the Russian and Simple English Wikipedia.
Sood, Sara Owsley & Vasserman, Lucy ESSE: Exploring mood on the web 2009 ICWSM Workshop, May 20, 2009 - May 20, 2009 San Jose, CA, United states 2009 Future machines will connect with users on an emotional level in addition to performing complex computations (Norman 2004). In this article, we present a system that adds an emotional dimension to an activity that Internet users engage in frequently: search. ESSE, which stands for Emotional State Search Engine, is a web search engine that goes beyond facilitating a user's exploration of the web by topic, as search engines such as Google or Yahoo! afford. Rather, it enables users to browse their topically relevant search results by mood, providing a unique perspective on the topic at hand. Consider a user wishing to read opinions about the new president of the United States. Typing "President Obama" into a Google search box will return (among other results) a few recent news stories about Obama, the White House's website, as well as a Wikipedia article about him. Typing "President Obama" into a Google Blog Search box will bring the user a bit closer to their goal, in that all of the results are indeed blogs (typically opinions) about Obama. However, where blog search engines fall short is in providing users with a way to navigate and digest the vastness of the blogosphere: the incredible number of results for the query "President Obama" (approximately 17,335,307 as of 2/24/09) (Google Blog Search 2009). ESSE provides another dimension by which users can take in the vastness of the web or the blogosphere. This article outlines the contributions of ESSE, including a new approach to mood classification. Copyright 2009 Association for the Advancement of Artificial Intelligence (www.aaai.org).
Suh, Bongwon; Chi, Ed H.; Kittur, Aniket & Pendleton, Bryan A. Lifting the veil: Improving accountability and social transparency in Wikipedia with WikiDashboard 26th Annual CHI Conference on Human Factors in Computing Systems, CHI 2008, April 5, 2008 - April 10, 2008 Florence, Italy 2008 [248] Wikis are collaborative systems in which virtually anyone can edit anything. Although wikis have become highly popular in many domains, their mutable nature often leads them to be distrusted as a reliable source of information. Here we describe a social dynamic analysis tool called WikiDashboard which aims to improve social transparency and accountability on Wikipedia articles. Early reactions from users suggest that the increased transparency afforded by the tool can improve the interpretation, communication, and trustworthiness of Wikipedia articles.
Suh, Bongwon; Chi, Ed H.; Pendleton, Bryan A. & Kittur, Aniket Us vs. Them: Understanding social dynamics in Wikipedia with revert graph visualizations VAST IEEE Symposium on Visual Analytics Science and Technology 2007, October 30, 2007 - November 1, 2007 Sacramento, CA, United states 2007 [249] Wikipedia is a wiki-based encyclopedia that has become one of the most popular collaborative on-line knowledge systems. As in any large collaborative system, as Wikipedia has grown, conflicts and coordination costs have increased dramatically. Visual analytic tools provide a mechanism for addressing these issues by enabling users to more quickly and effectively make sense of the status of a collaborative environment. In this paper we describe a model for identifying patterns of conflict in Wikipedia articles. The model relies on users' editing history and the relationships between user edits, especially revisions that void previous edits, known as "reverts". Based on this model we constructed Revert Graph, a tool that visualizes the overall conflict patterns between groups of users. It enables visual analysis of opinion groups and rapid interactive exploration of those relationships via detail drill-downs. We present user patterns and case studies that show the effectiveness of these techniques and discuss how they could generalize to other systems.
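The revert-based conflict model described above can be illustrated with a small sketch. It assumes each revision is reduced to a content digest and treats a revision that restores an earlier digest as an identity revert (a common heuristic); the data layout and function names are illustrative, not the authors' code:

<syntaxhighlight lang="python">
from collections import defaultdict

def build_revert_graph(revisions):
    """revisions: chronological list of (editor, content_digest) pairs.

    A revision whose digest equals an earlier digest is treated as an
    identity revert; an edge (reverter -> reverted editor) is added for
    every editor whose intermediate work was undone."""
    seen = {}                      # digest -> index of its first occurrence
    edges = defaultdict(int)       # (reverter, reverted) -> weight
    for i, (editor, digest) in enumerate(revisions):
        if digest in seen:
            for j in range(seen[digest] + 1, i):
                reverted = revisions[j][0]
                if reverted != editor:
                    edges[(editor, reverted)] += 1
        else:
            seen[digest] = i
    return dict(edges)

if __name__ == "__main__":
    history = [("A", "v1"), ("B", "v2"), ("C", "v3"), ("A", "v1")]
    print(build_revert_graph(history))   # {('A', 'B'): 1, ('A', 'C'): 1}
</syntaxhighlight>

The weighted edges of such a graph are what a layout-based visualization can then cluster into opposing groups of editors.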
Swarts, Jason The collaborative construction of 'fact' on Wikipedia 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United states 2009 [250] For years Wikipedia has come to symbolize the potential of Web 2.0 for harnessing the power of mass collaboration and collective intelligence. As wikis continue to develop and move into streams of cultural, social, academic, and enterprise work activity, it is appropriate to consider how collective intelligence emerges from mass collaboration. Collective intelligence can take many forms - this paper examines one, the emergence of stable facts on Wikipedia. More specifically, this paper examines ways of participating that lead to the creation of facts. This research will show how we can be more effective consumers, producers, and managers of wiki information by understanding how collaboration shapes facts.
Szomszor, Martin; Alani, Harith; Cantador, Ivan; O'Hara, Kieron & Shadbolt, Nigel Semantic modelling of user interests based on cross-folksonomy analysis 7th International Semantic Web Conference, ISWC 2008, October 26, 2008 - October 30, 2008 Karlsruhe, Germany 2008 [251] The continued increase in Web usage, in particular participation in folksonomies, reveals a trend towards a more dynamic and interactive Web where individuals can organise and share resources. Tagging has emerged as the de-facto standard for the organisation of such resources, providing a versatile and reactive knowledge management mechanism that users find easy to use and understand. It is common nowadays for users to have multiple profiles in various folksonomies, thus distributing their tagging activities. In this paper, we present a method for the automatic consolidation of user profiles across two popular social networking sites, and subsequent semantic modelling of their interests utilising Wikipedia as a multi-domain model. We evaluate how much can be learned from such sites, and in which domains the knowledge acquired is focussed. Results show that far richer interest profiles can be generated for users when multiple tag-clouds are combined. 2008 Springer Berlin Heidelberg.
Szymanski, Julian Mining relations between Wikipedia categories 2nd International Conference on 'Networked Digital Technologies', NDT 2010, July 7, 2010 - July 9, 2010 Prague, Czech republic 2010 [252] The paper concerns the problem of automatic category system creation for a set of documents connected by references. The presented approach has been evaluated on the Polish Wikipedia, where two graphs, the Wikipedia category graph and the article graph, have been analyzed. The linkages between Wikipedia articles have been used to create a new category graph with weighted edges. We compare the created category graph with the original Wikipedia category graph, testing its quality in terms of coverage. 2010 Springer-Verlag Berlin Heidelberg.
Szymanski, Julian WordVenture - Cooperative WordNet editor: Architecture for lexical semantic acquisition 1st International Conference on Knowledge Engineering and Ontology Development, KEOD 2009, October 6, 2009 - October 8, 2009 Funchal, Madeira, Portugal 2009 This article presents an architecture for acquiring lexical semantics in a collaborative approach paradigm. The system provides functionality for editing semantic networks in a Wikipedia-like style. The core of the system is a user-friendly interface based on interactive graph navigation, which is used for semantic network presentation and simultaneously provides modification functionality.
Tan, Saravadee Sae; Kong, Tang Enya & Sodhy, Gian Chand Annotating Wikipedia articles with semantic tags for structured retrieval 2nd ACM Workshop on Social Web Search and Mining, SWSM'09, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [253] Structured retrieval aims at exploiting the structural information of documents when searching for documents. Structured retrieval makes use of both the content and the structure of documents to improve information retrieval. Therefore, the availability of semantic structure in documents is an important factor for the success of structured retrieval. However, the majority of documents on the Web still lack semantically rich structure. This motivates us to automatically identify the semantic information in web documents and explicitly annotate the information with semantic tags. Based on the well-known Wikipedia corpus, this paper describes an unsupervised learning approach to identify the conceptual information and descriptive information of an entity described in a Wikipedia article. Our approach utilizes Wikipedia link structure and Infobox information in order to learn the semantic structure of Wikipedia articles. We also describe a lazy approach used in the learning process. By utilizing the Wikipedia categories provided by the contributors, only a subset of entities in a Wikipedia category is used as training data in the learning process, and the results can be applied to the rest of the entities in the category.
Taneva, Bilyana; Kacimi, Mouna & Weikum, Gerhard Gathering and ranking photos of named entities with high precision, high recall, and diversity 3rd ACM International Conference on Web Search and Data Mining, WSDM 2010, February 3, 2010 - February 6, 2010 New York City, NY, United states 2010 [254] Knowledge-sharing communities like Wikipedia and automated extraction methods like those of DBpedia enable the construction of large machine-processible knowledge bases with relational facts about entities. These endeavors lack multimodal data like photos and videos of people and places. While photos of famous entities are abundant on the Internet, they are much harder to retrieve for less popular entities such as notable computer scientists or regionally interesting churches. Querying the entity names in image search engines yields large candidate lists, but they often have low precision and unsatisfactory recall. Our goal is to populate a knowledge base with photos of named entities, with high precision, high recall, and diversity of photos for a given entity. We harness relational facts about entities for generating expanded queries to retrieve different candidate lists from image search engines. We use a weighted voting method to determine better rankings of an entity's photos. Appropriate weights are dependent on the type of entity (e.g., scientist vs. politician) and automatically computed from a small set of training entities. We also exploit visual similarity measures based on SIFT features, for higher diversity in the final rankings. Our experiments with photos of persons and landmarks show significant improvements of ranking measures like MAP and NDCG, and also for diversity-aware ranking.
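The weighted-voting step for ranking an entity's candidate photos can be sketched as follows; the reciprocal-rank vote and the per-query weights are illustrative assumptions rather than the paper's exact scoring function:

<syntaxhighlight lang="python">
from collections import defaultdict

def weighted_vote(rankings, weights):
    """rankings: {query_name: [photo_id, ...]} candidate lists, one per expanded query.
    weights:  {query_name: float} confidence of each query (e.g. learned per entity type).
    Returns photos sorted by the weighted sum of reciprocal-rank votes."""
    scores = defaultdict(float)
    for query, photos in rankings.items():
        w = weights.get(query, 1.0)
        for rank, photo in enumerate(photos, start=1):
            scores[photo] += w / rank
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    rankings = {"name only": ["p1", "p2", "p3"],
                "name + affiliation": ["p2", "p4"]}
    weights = {"name only": 0.4, "name + affiliation": 0.6}
    print(weighted_vote(rankings, weights))
</syntaxhighlight>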
Tellez, Alberto; Juarez, Antonio; Hernandez, Gustavo; Denicia, Claudia; Villatoro, Esau; Montes, Manuel & Villasenor, Luis A lexical approach for Spanish question answering 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [255] This paper discusses our system's results at the Spanish Question Answering task of CLEF 2007. Our system is centered on a full data-driven approach that combines information retrieval and machine learning techniques. It mainly relies on the use of lexical information and avoids any complex language processing procedure. Evaluation results indicate that this approach is very effective for answering definition questions from Wikipedia. In contrast, they also reveal that it is very difficult to respond to factoid questions from this resource based solely on the use of lexical overlaps and redundancy. 2008 Springer-Verlag Berlin Heidelberg.
Theng, Yin-Leng; Li, Yuanyuan; Lim, Ee-Peng; Wang, Zhe; Goh, Dion Hoe-Lian; Chang, Chew-Hung; Chatterjea, Kalyani & Zhang, Jun Understanding user perceptions on usefulness and usability of an integrated Wiki-G-Portal 9th International Conference on Asian Digital Libraries, ICADL 2006, November 27, 2006 - November 30, 2006 Kyoto, Japan 2006 This paper describes a pilot study on Wiki-G-Portal, a project integrating Wikipedia, an online encyclopedia, into G-Portal, a Web-based digital library of geography resources. Initial findings from the pilot study seem to suggest positive perceptions of the usefulness and usability of Wiki-G-Portal, as well as subjects' attitude and intention to use it. Springer-Verlag Berlin Heidelberg 2006.
Thomas, Christopher; Mehra, Pankaj; Brooks, Roger & Sheth, Amit Growing fields of interest using an expand and reduce strategy for domain model extraction 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia 2008 [256] Domain hierarchies are widely used as models underlying information retrieval tasks. Formal ontologies and taxonomies enrich such hierarchies further with properties and relationships, but require manual effort; therefore they are costly to maintain, and often stale. Folksonomies and vocabularies lack rich category structure. Classification and extraction require the coverage of vocabularies and the alterability of folksonomies, and can largely benefit from category relationships and other properties. With Doozer, a program for building conceptual models of information domains, we want to bridge the gap between vocabularies and folksonomies on the one side and rich, expert-designed ontologies and taxonomies on the other. Doozer mines Wikipedia to produce tight domain hierarchies, starting with simple domain descriptions. It also adds relevancy scores for use in automated classification of information. The output model is described as a hierarchy of domain terms that can be used immediately for classifiers and IR systems or as a basis for manual or semi-automatic creation of formal ontologies.
Tianyi, Shi; Shidou, Jiao; Junqi, Hou & Minglu, Li Improving keyphrase extraction using Wikipedia semantics 2008 2nd International Symposium on Intelligent Information Technology Application, IITA 2008, December 21, 2008 - December 22, 2008 Shanghai, China 2008 [257] Keyphrase extraction plays a key role in various fields such as information retrieval, text classification, etc. However, most traditional keyphrase extraction methods rely on word frequency and position instead of the document's inherent semantic information, which often results in inaccurate output. In this paper, we propose a novel automatic keyphrase extraction algorithm using semantic features mined from the online Wikipedia. This algorithm first identifies candidate keyphrases based on lexical methods, and then a semantic graph which connects candidate keyphrases with document topics is constructed. Afterwards, a link analysis algorithm is applied to assign semantic feature weights to the candidate keyphrases. Finally, several statistical and semantic features are assembled by a regression model to predict the quality of the candidates. Encouraging results are achieved in our experiments, which show the effectiveness of our method.
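The link-analysis step over the semantic graph of candidate keyphrases and topics can be approximated with a plain PageRank-style iteration; this is a generic sketch under the assumption of a weighted adjacency dictionary, not the authors' exact algorithm:

<syntaxhighlight lang="python">
def rank_candidates(graph, damping=0.85, iterations=50):
    """graph: {candidate: {neighbour: edge_weight}} semantic graph linking
    candidate keyphrases (and document topics).  Returns a PageRank-style score
    per node, which can then be combined with statistical features in a
    regression model that predicts keyphrase quality."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) or 1.0 for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            incoming = sum(score[m] * w / out_weight[m]
                           for m, nbrs in graph.items()
                           for t, w in nbrs.items() if t == n)
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new
    return score

if __name__ == "__main__":
    g = {"semantic web": {"ontology": 1.0, "rdf": 0.5},
         "ontology": {"semantic web": 1.0},
         "rdf": {"ontology": 0.5}}
    print(rank_candidates(g))
</syntaxhighlight>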
Tran, Tien; Kutty, Sangeetha & Nayak, Richi Utilizing the structure and content information for XML document clustering 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [258] This paper reports on the experiments and results of a clustering approach used in the INEX 2008 document mining challenge. The clustering approach utilizes both the structure and content information of the Wikipedia XML document collection. A latent semantic kernel (LSK) is used to measure the semantic similarity between XML documents based on their content features. The construction of a latent semantic kernel involves the computation of a singular value decomposition (SVD). On a large feature space matrix, the computation of the SVD is very expensive in terms of time and memory requirements. Thus, in this clustering approach, the dimension of the document space of the term-document matrix is reduced before performing the SVD. The document space reduction is based on the common structural information of the Wikipedia XML document collection. The proposed clustering approach has shown to be effective on the Wikipedia collection in the INEX 2008 document mining challenge. 2009 Springer Berlin Heidelberg.
Tran, Tien; Nayak, Richi & Bruza, Peter Document clustering using incremental and pairwise approaches 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [259] This paper presents the experiments and results of a clustering approach for clustering the large Wikipedia dataset in the INEX 2007 Document Mining Challenge. The clustering approach employed makes use of an incremental clustering method and a pairwise clustering method. The approach enables us to perform the clustering task on a large dataset by first reducing the dimension of the dataset to an undefined number of clusters using the incremental method. The lower-dimension dataset is then clustered to the required number of clusters using the pairwise method. In this way, clustering of the large number of documents is performed successfully and good accuracy of the clustering solution is achieved. 2008 Springer-Verlag Berlin Heidelberg.
Tsikrika, Theodora & Kludas, Jana Overview of the WikipediaMM task at ImageCLEF 2008 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [260] The WikipediaMM task provides a testbed for the system-oriented evaluation of ad-hoc retrieval from a large collection of Wikipedia images. It became a part of the ImageCLEF evaluation campaign in 2008 with the aim of investigating the use of visual and textual sources in combination for improving the retrieval performance. This paper presents an overview of the task's resources, topics, assessments, participants' approaches, and main results. 2009 Springer Berlin Heidelberg.
Tsikrika, Theodora; Serdyukov, Pavel; Rode, Henning; Westerveld, Thijs; Aly, Robin; Hiemstra, Djoerd & Vries, Arjen P. De Structured document retrieval, multimedia retrieval, and entity ranking using PF/Tijah 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [261] CWI and University of Twente used PF/Tijah, a flexible XML retrieval system, to evaluate structured document retrieval, multimedia retrieval, and entity ranking tasks in the context of INEX 2007. For the retrieval of textual and multimedia elements in the Wikipedia data, we investigated various length priors and found that biasing towards longer elements than the ones retrieved by our language modelling approach can be useful. For retrieving images in isolation, we found that their associated text is a very good source of evidence in the Wikipedia collection. For the entity ranking task, we used random walks to model multi-step relevance propagation from the articles describing entities to all related entities and further, and obtained promising results. 2008 Springer-Verlag Berlin Heidelberg.
Urdaneta, Guido; Pierre, Guillaume & Steen, Maarten Van A decentralized wiki engine for collaborative Wikipedia hosting 3rd International Conference on Web Information Systems and Technologies, Webist 2007, March 3, 2007 - March 6, 2007 Barcelona, Spain 2007 This paper presents the design of a decentralized system for hosting large-scale wiki web sites like Wikipedia, using a collaborative approach. Our design focuses on distributing the pages that compose the wiki across a network of nodes provided by individuals and organizations willing to collaborate in hosting the wiki. We present algorithms for placing the pages so that the capacity of the nodes is not exceeded and the load is balanced, and algorithms for routing client requests to the appropriate nodes. We also address fault tolerance and security issues.
Vaishnavi, Vijay K.; Vandenberg, Art; Zhang, Yanqing & Duraisamy, Saravanaraj Towards design principles for effective context- and perspective-based web mining 4th International Conference on Design Science Research in Information Systems and Technology, DESRIST '09, May 7, 2009 - May 8, 2009 Philadelphia, PA, United states 2009 [262] A practical and scalable web mining solution is needed that can assist the user in processing existing web-based resources to discover specific, relevant information content. This is especially important for researcher communities where data deployed on the World Wide Web are characterized by autonomous, dynamically evolving, and conceptually diverse information sources. The paper describes a systematic design research study that is based on prototyping/evaluation and abstraction using existing and new techniques incorporated as plug-and-play components into a research workbench. The study investigates an approach, DISCOVERY, for using (1) context/perspective information and (2) social networks such as ODP or Wikipedia for designing practical and scalable human-web systems for finding web pages that are relevant and meet the needs and requirements of a user or a group of users. The paper also describes the current implementation of DISCOVERY and its initial use in finding web pages in a targeted web domain. The resulting system arguably meets the common needs and requirements of a group of people based on the information provided by the group in the form of a set of context web pages. The system is evaluated for a scenario in which the assistance of the system is sought for a group of faculty members in finding NSF research grant opportunities that they should collaboratively respond to, utilizing the context provided by their recent publications.
Vercoustre, Anne-Marie; Pehcevski, Jovan & Naumovski, Vladimir Topic difficulty prediction in entity ranking 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [263] Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction, where the goal is to tag the names of the entities in documents, entity ranking is primarily focused on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated on the INEX Wikipedia test collection. In this paper, we show that the knowledge of predicted classes of topic difficulty can be used to further improve the entity ranking performance. To predict the topic difficulty, we generate a classifier that uses features extracted from an INEX topic definition to classify the topic into an experimentally pre-determined class. This knowledge is then utilised to dynamically set the optimal values for the retrieval parameters of our entity ranking system. Our experiments suggest that topic difficulty prediction is a promising approach that could be exploited to improve the effectiveness of entity ranking. 2009 Springer Berlin Heidelberg.
Vercoustre, Anne-Marie; Pehcevski, Jovan & Thom, James A. Using Wikipedia categories and links in entity ranking 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [264] This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a query. Our approach utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the examples (when provided) to improve the effectiveness of entity ranking. Our experiments on both the training and the testing data sets demonstrate that the use of categories and the link structure of Wikipedia can significantly improve entity retrieval effectiveness. We also use our system for the ad hoc tasks by inferring target categories from the title of the query. The results were worse than when using a full-text search engine, which confirms our hypothesis that ad hoc retrieval and entity retrieval are two different tasks. 2008 Springer-Verlag Berlin Heidelberg.
Vercoustre, Anne-Marie; Thom, James A. & Pehcevski, Jovan Entity ranking in Wikipedia 23rd Annual ACM Symposium on Applied Computing, SAC'08, March 16, 2008 - March 20, 2008 Fortaleza, Ceara, Brazil 2008 [265] The traditional entity extraction problem lies in the ability to extract named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named entities; we are interested in entity ranking in the field of information retrieval. In this paper, we describe our approach to identifying and ranking entities from the INEX Wikipedia document collection. Wikipedia offers a number of interesting features for entity identification and ranking that we first introduce. We then describe the principles and the architecture of our entity ranking system, and introduce our methodology for evaluation. Our preliminary results show that the use of categories and the link structure of Wikipedia, together with entity examples, can significantly improve retrieval effectiveness.
Viegas, Fernanda B.; Wattenberg, Martin & Mckeon, Matthew M. The hidden order of Wikipedia 2nd International Conference on Online Communities and Social Computing, OCSC 2007, July 22, 2007 - July 27, 2007 Beijing, China 2007 We examine the procedural side of Wikipedia, the well-known internet encyclopedia. Despite the lack of structure in the underlying wiki technology, users abide by hundreds of rules and follow well-defined processes. Our case study is the Featured Article (FA) process, one of the best established procedures on the site. We analyze the FA process through the theoretical framework of commons governance, and demonstrate how this process blends elements of traditional workflow with peer production. We conclude that rather than encouraging anarchy, many aspects of wiki technology lend themselves to the collective creation of formalized process and policy. Springer-Verlag Berlin Heidelberg 2007.
Villarreal, Sara Elena Gaza; Elizalde, Lorena Martinez & Viveros, Adriana Canseco Clustering hyperlinks for topic extraction: An exploratory analysis 8th Mexican International Conference on Artificial Intelligence, MICAI 2009, November 9, 2009 - November 13, 2009 Guanajuato, Guanajuato, Mexico 2009 [266] In a Web of increasing size and complexity, a key issue is automatic document organization, which includes topic extraction in collections. Since we consider topics as document clusters with semantic properties, we are concerned with exploring suitable clustering techniques for their identification in hyperlinked environments (where we only regard structural information). For this purpose, three algorithms (PDDP, k-means, and graph local clustering) were executed over a document subset of an increasingly popular corpus: Wikipedia. Results were evaluated with unsupervised metrics (cosine similarity, semantic relatedness, Jaccard index) and suggest that promising results can be produced for this particular domain.
Vries, Arjen P. De; Vercoustre, Anne-Marie; Thom, James A.; Craswell, Nick & Lalmas, Mounia Overview of the INEX 2007 entity ranking track 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [267] Many realistic user tasks involve the retrieval of specific entities instead of just any type of documents. Examples of information needs include 'Countries where one can pay with the euro' or 'Impressionist art museums in The Netherlands'. The Initiative for the Evaluation of XML Retrieval (INEX) started the XML Entity Ranking track (INEX-XER) to create a test collection for entity retrieval in Wikipedia. Entities are assumed to correspond to Wikipedia entries. The goal of the track is to evaluate how well systems can rank entities in response to a query; the set of entities to be ranked is assumed to be loosely defined either by a generic category (entity ranking) or by some example entities (list completion). This track overview introduces the track setup, and discusses the implications of the new relevance notion for entity ranking in comparison to ad hoc retrieval. 2008 Springer-Verlag Berlin Heidelberg.
Vries, Christopher M. De; Geva, Shlomo & Vine, Lance De Clustering with random indexing K-tree and XML structure 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [268] This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies. 2010 Springer-Verlag Berlin Heidelberg.
Vroom, Regine W.; Vossen, Lysanne E. & Geers, Anoek M. Aspects to motivate users of a design engineering wiki to share their knowledge Proceedings of World Academy of Science, Engineering and Technology 2009 Industrial design engineering is an information- and knowledge-intensive job. Although Wikipedia offers a lot of this information, design engineers are better served by a wiki tailored to their job, offering information in a compact manner and functioning as a design tool. For that reason WikID has been developed. However, for the viability of a wiki, an active user community is essential. The main subject of this paper is a study of the influence of the communication and the contents of WikID on the users' willingness to contribute. First, the theory on a website's first impression, general usability guidelines, and user motivation in online communities is studied. Using this theory, the aspects of the current site are analyzed for their suitability. These results have been verified with a questionnaire amongst 66 industrial design engineers (or students of industrial design engineering). The main conclusion is that design engineers are enchanted with the existence of WikID and its knowledge structure (taxonomy), but this structure has not become clear without any guidance. In other words, the knowledge structure is very helpful for inspiring and guiding design engineers through their tailored knowledge domain in WikID, but this taxonomy has to be better communicated on the main page. In addition, the main page needs to be fitted more closely to the target group's preferences.
Waltinger, Ulli & Mehler, Alexander Who is it? Context sensitive named entity and instance recognition by means of Wikipedia 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia 2008 [269] This paper presents an approach for predicting context sensitive entities, exemplified in the domain of person names. Our approach is based on building a weighted context graph as well as a weighted people graph, and predicting the context entity by extracting the best-fitting subgraph using a spreading activation technique. The results of the experiments show a quite promising F-measure of 0.99.
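A spreading-activation step of the kind mentioned above can be sketched compactly; the graph layout, decay factor and stopping criteria below are illustrative assumptions rather than the authors' implementation:

<syntaxhighlight lang="python">
def spreading_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=3):
    """graph: {node: [(neighbour, edge_weight), ...]} weighted context/people graph.
    seeds: {node: initial_activation} for the context terms observed in the text.
    Activation spreads outwards with decay; the best-fitting entity is the
    highest-activated person node in the returned ranking."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbour, weight in graph.get(node, []):
                pulse = energy * weight * decay
                if pulse < threshold:
                    continue                       # prune negligible pulses
                activation[neighbour] = activation.get(neighbour, 0.0) + pulse
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + pulse
        if not next_frontier:
            break
        frontier = next_frontier
    return sorted(activation.items(), key=lambda kv: kv[1], reverse=True)
</syntaxhighlight>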
Waltinger, Ulli; Mehler, Alexander & Heyer, Gerhard Towards automatic content tagging - Enhanced web services in digital libraries using lexical chaining WEBIST 2008 - 4th International Conference on Web Information Systems and Technologies, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal 2008 This paper proposes a web-based application which combines social tagging, enhanced visual representation of a document, and alignment to an open-ended social ontology. More precisely, we introduce on the one hand an approach for automatic extraction of document-related keywords for indexing and representing document content, as an alternative to social tagging. On the other hand, a proposal for automatic classification within a social ontology based on the German Wikipedia category taxonomy is presented. This paper has two main goals: to describe the method of automatic tagging of digital documents and to provide an overview of the algorithmic patterns of lexical chaining that can be applied for topic tracking and labelling of digital documents.
Wang, Gang; Yu, Yong & Zhu, Haiping PORE: Positive-only relation extraction from Wikipedia text 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of 2007 [270] Extracting semantic relations is of great importance for the creation of Semantic Web content. It is of great benefit to semi-automatically extract relations from the free text of Wikipedia using the structured content readily available in it. Pattern matching methods that employ information redundancy cannot work well since there is not much redundant information in Wikipedia, compared to the Web. Multi-class classification methods are not reasonable since no classification of relation types is available in Wikipedia. In this paper, we propose PORE (Positive-Only Relation Extraction) for relation extraction from Wikipedia text. The core algorithm B-POL extends a state-of-the-art positive-only learning algorithm using bootstrapping, strong negative identification, and transductive inference to work with fewer positive training examples. We conducted experiments on several relations with different amounts of training data. The experimental results show that B-POL can work effectively given only a small amount of positive training examples and that it significantly outperforms the original positive-only learning approaches and a multi-class SVM. Furthermore, although PORE is applied in the context of Wikipedia, the core algorithm B-POL is a general approach for Ontology Population and can be adapted to other domains. 2008 Springer-Verlag Berlin Heidelberg.
Wang, Jun; Jin, Xin & Wu, Yun-Peng An empirical study of knowledge collaboration networks in virtual community: Based on wiki 2009 16th International Conference on Management Science and Engineering, ICMSE 2009, September 14, 2009 - September 16, 2009 Moscow, Russia 2009 [271] Wikipedia is a typical knowledge-collaboration-oriented virtual community, yet its collaboration mechanism remains unclear. This empirical study explores Wikipedia's archive data and proposes a knowledge collaboration network model. The analysis indicates that the wiki-based knowledge collaboration network is a type of BA scale-free network which obeys a power-law distribution. On the other hand, this network is characterized by a high, stable clustering coefficient and a small average distance, thus presenting an obvious small-world effect. Moreover, the network topology is non-hierarchical because the clustering coefficients and degrees do not conform to a power-law distribution. The above results profile the collaboration network and identify its key network properties. Thus we can use the model to describe how people interact with each other and to what extent they collaborate on content creation.
Wang, Juncheng; Ma, Feicheng & Cheng, Jun The impact of research design on the half-life of the Wikipedia category system 2010 International Conference on Computer Design and Applications, ICCDA 2010, June 25, 2010 - June 27, 2010 Qinhuangdao, Hebei, China 2010 [272] The Wikipedia category system has shown a phenomenon of life or obsolescence similar to that of periodical literature, so this paper investigates how factors related to study design and the research process, namely the observation points and the time span, impact the obsolescence of the Wikipedia category system. For the impact of different observation points, we make use of datasets at different time points under the same time span, and the results show that the observation points do have an obvious influence on the category cited half-life. For the impact of the time span, we use datasets with different intervals at the same time point, and the results indicate that the time span has a certain impact on the categories' obsolescence. Based on this analysis, the paper further proposes some useful suggestions for similar studies on information obsolescence in the future.
Wang, Li; Yata, Susumu; Atlam, El-Sayed; Fuketa, Masao; Morita, Kazuhiro; Bando, Hiroaki & Aoe, Jun-Ichi A method of building Chinese field association knowledge from Wikipedia 2009 International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2009, September 24, 2009 - September 27, 2009 Dalian, China 2009 [273] Field Association (FA) terms form a limited set of discriminating terms that give us the knowledge to identify document fields. The primary goal of this research is to make a system that can imitate the process whereby humans recognize the fields by looking at a few Chinese FA terms in a document. This paper proposes a new approach to build a Chinese FA term dictionary automatically from Wikipedia. 104,532 FA terms are added to the dictionary. The resulting FA terms obtained by using this dictionary are applied to recognize the fields of 5,841 documents. The average accuracy in the experiment is 92.04%. The results show that the presented method is effective in building FA terms from Wikipedia automatically.
Wang, Qiuyue; Li, Qiushi; Wang, Shan & Du, Xiaoyong Exploiting semantic tags in XML retrieval 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [274] With the new semantically annotated Wikipedia XML corpus, we attempt to investigate the following two research questions. Do the structural constraints in CAS queries help in retrieving an XML document collection containing semantically rich tags? How can the semantic tag information be exploited to improve CO queries, given that most users prefer to express the simplest forms of queries? In this paper, we describe and analyze the work done on comparing CO and CAS queries over the document collection at the INEX 2009 ad hoc track, and we propose a method to improve the effectiveness of CO queries by enriching the element content representations with semantic tags. Our results show that the approaches of enriching XML element representations with semantic tags are effective in improving early precision, while on average precision, strict interpretation of CAS queries is generally superior. 2010 Springer-Verlag Berlin Heidelberg.
Wang, Yang; Wang, Haofen; Zhu, Haiping & Yu, Yong Exploit semantic information for category annotation recommendation in Wikipedia 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France 2007 Compared with plain-text resources, the ones in "semi-semantic" web sites such as Wikipedia contain high-level semantic information which will benefit various automatic annotation tasks performed on them. In this paper, we propose a "collaborative annotating" approach to automatically recommend categories for a Wikipedia article by reusing category annotations from its most similar articles and ranking these annotations by their confidence. In this approach, four typical semantic features in Wikipedia, namely incoming link, outgoing link, section heading and template item, are investigated and exploited as the representation of articles to feed the similarity calculation. The experimental results have not only proven that these semantic features improve the performance of category annotation in comparison to the plain-text feature, but also demonstrated the strength of our approach in discovering missing annotations and proper-level ones for Wikipedia articles. Springer-Verlag Berlin Heidelberg 2007.
Wannemacher, Klaus Articles as assignments - Modalities and experiences of Wikipedia use in university courses 8th International Conference on Web Based Learning, ICWL 2009, August 19, 2009 - August 21, 2009 Aachen, Germany 2009 [275] In spite of perceived quality deficits, Wikipedia is a popular information resource among students. Instructors increasingly take advantage of this positive student attitude by actively integrating Wikipedia as a learning tool into university courses. The contribution raises the question of whether Wikipedia assignments in university courses are suited to making the complex research, editing and bibliographic processes through which scholarship is produced transparent to students, and to effectively improving their research and writing skills. 2009 Springer Berlin Heidelberg.
Wartena, Christian & Brussee, Rogier Topic detection by clustering keywords DEXA 2008, 19th International Conference on Database and Expert Systems Applications, September 1, 2008 - September 5, 2008 Turin, Italy 2008 [276] We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms the cosine similarity. In particular, a newly proposed term distribution taking co-occurrence of terms into account gives the best results.
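The Jensen-Shannon divergence used here as a distance between term distributions is a standard quantity and can be computed directly from the keyword distributions; a minimal sketch (log base 2, dictionary-based distributions that each sum to 1):

<syntaxhighlight lang="python">
import math

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two term distributions
    given as {term: probability} dictionaries."""
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms
        return sum(a[t] * math.log(a[t] / b[t], 2) for t in a if a[t] > 0)
    terms = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}
    p_full = {t: p.get(t, 0.0) for t in terms}
    q_full = {t: q.get(t, 0.0) for t in terms}
    return 0.5 * kl(p_full, m) + 0.5 * kl(q_full, m)

if __name__ == "__main__":
    d1 = {"wiki": 0.6, "edit": 0.4}
    d2 = {"wiki": 0.5, "category": 0.5}
    print(round(jensen_shannon(d1, d2), 3))
</syntaxhighlight>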
Wattenberg, Martin; Viegas, Fernanda B. & Hollenbach, Katherine Visualizing activity on Wikipedia with chromograms 11th IFIP TC 13 International Conference on Human-Computer Interaction, INTERACT 2007, September 10, 2007 - September 14, 2007 Rio de Janeiro, Brazil 2007 To investigate how participants in peer production systems allocate their time, we examine editing activity on Wikipedia, the well-known online encyclopedia. To analyze the huge edit histories of the site's administrators we introduce a visualization technique, the chromogram, that can display very long textual sequences through a simple color coding scheme. Using chromograms we describe a set of characteristic editing patterns. In addition to confirming known patterns, such as reacting to vandalism events, we identify a distinct class of organized systematic activities. We discuss how both reactive and systematic strategies shed light on self-allocation of effort in Wikipedia, and how they may pertain to other peer-production systems. IFIP International Federation for Information Processing 2007.
Wee, Leong Chee & Hassan, Samer Exploiting Wikipedia for directional inferential text similarity International Conference on Information Technology: New Generations, ITNG 2008, April 7, 2008 - April 9, 2008 Las Vegas, NV, United states 2008 [277] In natural languages, variability of semantic expression refers to the situation where the same meaning can be inferred from different words or texts. Given that many natural language processing tasks nowadays (e.g. question answering, information retrieval, document summarization) often model this variability by requiring a specific target meaning to be inferred from different text variants, it is helpful to capture text similarity in a directional manner to serve such inference needs. In this paper, we show how Wikipedia can be used as a semantic resource to build a directional inferential similarity metric between words, and subsequently, texts. Through experiments, we show that our Wikipedia-based metric performs significantly better when applied to a standard evaluation dataset, with a reduction in error rate of 16.1% over the random metric baseline.
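Directional similarity differs from symmetric similarity in that sim(T1 -> T2) asks how well T2 covers the content of T1. A generic sketch of such an asymmetric metric, assuming a Wikipedia-derived word relatedness function word_sim and idf weights (both hypothetical placeholders, not the authors' exact formulation):

<syntaxhighlight lang="python">
def directional_similarity(text_from, text_to, word_sim, idf):
    """Directional similarity sim(T1 -> T2): how well T2 covers the content of T1.
    word_sim(w1, w2) is assumed to return a Wikipedia-derived relatedness score in [0, 1];
    idf maps words to inverse document frequencies.  Asymmetry arises because only
    the words of T1 need to be matched."""
    words_from = text_from.lower().split()
    words_to = text_to.lower().split()
    num, den = 0.0, 0.0
    for w in words_from:
        best = max((word_sim(w, v) for v in words_to), default=0.0)
        num += best * idf.get(w, 1.0)
        den += idf.get(w, 1.0)
    return num / den if den else 0.0
</syntaxhighlight>

Swapping the two texts generally gives a different score, which is exactly the property needed for inference-oriented tasks.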
Weikum, Gerhard Chapter 3: Search for knowledge 1st Workshop on Search Computing Challenges and Directions, SeCo 2009, June 17, 2009 - June 19, 2009 Como, Italy 2010 [278] There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. In addition, Semantic-Web-style ontologies, structured Deep-Web sources, and Social-Web networks and tagging communities can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision. This vision and position paper discusses opportunities and challenges along this research avenue. The technical issues to be looked into include knowledge harvesting to construct large knowledge bases, searching for knowledge in terms of entities and relationships, and ranking the results of such queries.
Weikum, Gerhard Harvesting, searching, and ranking knowledge on the web 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain 2009 [279] There are major trends to advance the functionality of search engines to a more expressive semantic level (e.g., [2, 4, 6, 7, 8, 9, 13, 14, 18]). This is enabled by employing large-scale information extraction [1, 11, 20] of entities and relationships from semistructured as well as natural-language Web sources. In addition, harnessing Semantic-Web-style ontologies [22] and reaching into Deep-Web sources [16] can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision. This talk presents ongoing research towards this objective, with emphasis on our work on the YAGO knowledge base [23, 24] and the NAGA search engine [14], but also covering related projects. YAGO is a large collection of entities and relational facts that are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent RDF-style "semantic" graph. For further growing YAGO from Web sources while retaining its high quality, pattern-based extraction is combined with logic-based consistency checking in a unified framework [25]. NAGA provides graph-template-based search over this data with powerful ranking capabilities based on a statistical language model for graphs. Advanced queries and the need for ranking approximate matches pose efficiency and scalability challenges that are addressed by algorithmic and indexing techniques [15, 17]. YAGO is publicly available and has been imported into various other knowledge-management projects including DBpedia. YAGO shares many of its goals and methodologies with parallel projects along related lines. These include Avatar [19], Cimple/DBlife [10, 21], DBpedia [3], KnowItAll/TextRunner [12, 5], Kylin/KOG [26, 27], and the Libra technology [18, 28] (and more). Together they form an exciting trend towards providing comprehensive knowledge bases with semantic search capabilities.
Weiping, Wang; Peng, Chen & Bowen, Liu A self-adaptive explicit semantic analysis method for computing semantic relatedness using Wikipedia 2008 International Seminar on Future Information Technology and Management Engineering, FITME 2008, November 20, 2008 - November 20, 2008 Leicestershire, United kingdom 2008 [280] In recent years, the Explicit Semantic Analysis (ESA) method has achieved good performance in computing semantic relatedness (SR). However, the ESA method fails to consider the given context of a word pair and generates the same semantic concepts for one word in different word pairs, so it cannot exactly determine the intended sense of an ambiguous word. In this paper, we propose an improved method for computing semantic relatedness. Our technique, Self-Adaptive Explicit Semantic Analysis (SAESA), is unique in that it generates corresponding concepts to express the intended meaning of the word, according to the different words being compared and the different contexts. Experimental results on the WordSimilarity-353 benchmark dataset show that the proposed method is superior to existing methods; the correlation of the computed results with human judgments improves from r = 0.74 to 0.81.
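For reference, the underlying ESA relatedness score is the cosine between two concept vectors whose dimensions are Wikipedia articles (concepts) weighted by TF-IDF; a minimal sketch of that base step (the self-adaptive concept filtering of SAESA is not reproduced here):

<syntaxhighlight lang="python">
import math

def esa_relatedness(vec_a, vec_b):
    """Explicit Semantic Analysis relatedness: cosine similarity between two
    sparse concept vectors, where each vector maps Wikipedia article (concept)
    ids to the TF-IDF weight of the word in that article."""
    dot = sum(w * vec_b.get(c, 0.0) for c, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
</syntaxhighlight>

The self-adaptive variant described in the entry then prunes or reweights the concepts of each word depending on the word it is being compared with, before applying the same cosine.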
Welker, Andrea L. & Quintiliano, Barbara Information literacy: Moving beyond Wikipedia GeoCongress 2008: Geosustainability and Geohazard Mitigation, March 9, 2008 - March 12, 2008 New Orleans, LA, United states 2008 [281] In the past, finding information was the challenge. Today, the challenge our students face is to sift through and evaluate the incredible amount of information available. This ability to find and evaluate information is sometimes referred to as information literacy. Information literacy relates to a student's ability to communicate, but, more importantly, information literate persons are well-poised to learn throughout life because they have learned how to learn. A series of modules to address information literacy were created in a collaborative effort between faculty in the Civil and Environmental Engineering Department at Villanova and the librarians at Falvey Memorial Library. These modules were integrated throughout the curriculum, from sophomore to senior year. Assessment is based on modified ACRL (Association of College and Research Libraries) outcomes. This paper will document the lessons learned in the implementation of this program and provide concrete examples of how to incorporate information literacy into geotechnical engineering classes. Copyright ASCE 2008.
West, Andrew G.; Kannan, Sampath & Lee, Insup Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata 3rd European Workshop on System Security, EUROSEC'10, April 13, 2010 - April 13, 2010 Paris, France 2010 [282] Blatantly unproductive edits undermine the quality of the collaboratively-edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits in numerous dimensions. Crucially, none of these features requires inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside our labeled set.
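The metadata-only idea can be illustrated with a small sketch that derives spatio-temporal style features from a revision record without touching the article text. The field names and thresholds below are illustrative assumptions, not the paper's feature set; the resulting vectors would feed a standard classifier trained on rollback-tagged edits:

<syntaxhighlight lang="python">
from datetime import datetime

def revision_features(rev):
    """Extract lightweight features from revision metadata alone.
    `rev` is a dict with keys such as 'timestamp' (datetime), 'is_anonymous',
    'comment', 'editor_age_days', 'editor_prior_edits' -- hypothetical field
    names, not the paper's schema."""
    ts = rev["timestamp"]
    return {
        "anonymous": 1.0 if rev["is_anonymous"] else 0.0,
        "no_comment": 1.0 if not rev.get("comment") else 0.0,
        "local_hour": ts.hour / 23.0,              # time-of-day signal
        "weekend": 1.0 if ts.weekday() >= 5 else 0.0,
        "new_editor": 1.0 if rev.get("editor_age_days", 0) < 7 else 0.0,
        "low_activity": 1.0 if rev.get("editor_prior_edits", 0) < 10 else 0.0,
    }

if __name__ == "__main__":
    rev = {"timestamp": datetime(2010, 4, 13, 2, 30), "is_anonymous": True,
           "comment": "", "editor_age_days": 1, "editor_prior_edits": 0}
    print(revision_features(rev))
</syntaxhighlight>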
Westerveld, Thijs; Rode, Henning; Os, Roel Van; Hiemstra, Djoerd; Ramirez, Georgina; Mihajlovic, Vojkan & Vries, Arjen P. De Evaluating structured information retrieval and multimedia retrieval using PF/Tijah 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007 We used a flexible XML retrieval system for evaluating structured document retrieval and multimedia retrieval tasks in the context of the INEX 2006 benchmarks. We investigated the differences between article and element retrieval for Wikipedia data as well as the influence of an element's context on its ranking. We found that article retrieval performed well on many tasks and that pinpointing the relevant passages inside an article may hurt more than it helps. We found that for finding images in isolation the associated text is a very good descriptor in the Wikipedia collection, but we were not very successful at identifying relevant multimedia fragments consisting of a combination of text and images. Springer-Verlag Berlin Heidelberg 2007.
Winter, Judith & Kuhne, Gerold Achieving high precisions with peer-to-peer is possible 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [283] Until recently, centralized stand-alone solutions had no problem coping with the load of storing, indexing and searching the small test collections used for evaluating search results at INEX. However, searching the new large-scale Wikipedia collection of 2009 requires many more resources such as processing power, RAM, and index space. It is hence more important than ever to consider efficiency issues when performing XML-Retrieval tasks on such a big collection. On the other hand, the rich markup of the new collection is an opportunity to exploit the given structure and obtain a more efficient search. This paper describes our experiments using distributed search techniques based on XML-Retrieval. Our aim is to improve both effectiveness and efficiency; we have thus submitted search results to both the Efficiency Track and the Ad Hoc Track. In our experiments, the collection, index, and search load are split over a peer-to-peer (P2P) network to gain more efficiency in terms of load balancing when searching large-scale collections. Since the bandwidth consumption between searching peers has to be limited in order to achieve a scalable, efficient system, we exploit XML structure to reduce the number of messages sent between peers. In spite of mainly aiming at efficiency, our search engine SPIRIX achieved quite high precisions and made it into the top-10 systems (focused task). It ranked 7th at the Ad Hoc Track (59%) and came first in terms of precision at the Efficiency Track (both categories of topics). For the first time at INEX, a P2P system achieved an official search quality comparable with the top-10 centralized solutions! 2010 Springer-Verlag Berlin Heidelberg.
Witmer, Jeremy & Kalita, Jugal Extracting geospatial entities from Wikipedia ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states 2009 [284] This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recognition. For testing, we target a corpus of Wikipedia articles about battles and wars, as these have a high incidence of geospatial content. The SVM recognizes place names in the corpus with a very high recall, close to 100%, with an acceptable precision. The set of geospatial NEs is then fed into a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name. As many place names are ambiguous, and do not immediately geocode to a single location, we present a data structure and algorithm to resolve ambiguity based on sentence and article context, so the correct coordinates can be selected. We achieve an f-measure of 82%, and create a set of geospatial entities for each article, combining the place names, spatial locations, and an assumed point geometry. These entities can enable geospatial search on and geovisualization of Wikipedia.
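The context-based resolution step could look roughly like the following sketch, which simply picks, for each ambiguous name, the candidate coordinate nearest to places already resolved in the same article; this greedy rule and the data layout are assumptions, not the paper's actual data structure or algorithm.

```python
# Minimal sketch of context-based toponym resolution (an assumption about the
# approach, not the paper's exact method): pick, for each ambiguous place name,
# the candidate coordinate closest to places already resolved in the article.
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def resolve(candidates_by_name, resolved):
    """candidates_by_name: {name: [(lat, lon), ...]}; resolved: [(lat, lon), ...]"""
    result = {}
    for name, candidates in candidates_by_name.items():
        if not resolved:                       # nothing to anchor on yet
            result[name] = candidates[0]
        else:
            result[name] = min(candidates,
                               key=lambda c: min(haversine_km(c, r) for r in resolved))
        resolved.append(result[name])
    return result
```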
Wong, Wilson; Liu, Wei & Bennamoun, Mohammed Featureless similarities for terms clustering using tree-traversing ants International Symposium on Practical Cognitive Agents and Robots, PCAR 2006, November 27, 2006 - November 28, 2006 Perth, WA, Australia 2006 [285] Besides being difficult to scale between different domains and to handle knowledge fluctuations, the results of terms clustering presented by existing ontology engineering systems are far from desirable. In this paper, we propose a new version of an ant-based method for clustering terms known as Tree-Traversing Ants (TTA). With the help of the Normalized Google Distance (NGD) and n of Wikipedia (nW) as measures for similarity and distance between terms, we attempt to achieve an adaptable clustering method that is highly scalable across domains. Initial experiments with two datasets show promising results and demonstrate several advantages that are not simultaneously present in standard ant-based and other conventional clustering methods. Copyright held by author.
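For readers unfamiliar with NGD, here is a minimal sketch of the standard formula; the page counts in the example are made up and would normally come from search-engine hit statistics.

```python
# Sketch of the Normalized Google Distance (NGD) used as a term-distance
# measure; the counts below are placeholders, not real hit counts.
from math import log

def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """fx, fy: hit counts for x and y; fxy: joint hit count; n: total indexed pages."""
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(n) - min(lx, ly))

# Example with made-up counts: closely related terms yield a small distance.
print(ngd(fx=120_000, fy=90_000, fxy=60_000, n=10_000_000_000))
```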
Wongboonsin, Jenjira & Limpiyakorn, Yachai Wikipedia customization for organization's process asset management 2008 International Conference on Advanced Computer Theory and Engineering, ICACTE 2008, December 20, 2008 - December 22, 2008 Phuket, Thailand 2008 [286] Mature organizations typically establish various process assets that serve as standards for work operations in their units. Process assets include policies, guidelines, standard process definitions, life cycle models, forms and templates, etc. These assets are placed in a repository called the Organization's Process Asset Library, or OPAL. Projects then utilize these assets and tailor the organizational standard processes to suit individual project processes. This research proposes an approach to establishing an organization's process asset library by customizing open source software - Wikipedia. The system is called WikiOPAL. CMMI is used as the reference process improvement model for the establishment of the organization's process assets in this work. We also demonstrate that Wikipedia can be properly used as an approach for constructing a process asset library in a collaborative environment.
Woodley, Alan & Geva, Shlomo NLPX at INEX 2006 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007 XML information retrieval (XML-IR) systems aim to better fulfil users' information needs than traditional IR systems by returning results lower than the document level. In order to use XML-IR systems users must encapsulate their structural and content information needs in a structured query. Historically, these structured queries have been formatted using formal languages such as NEXI. Unfortunately, formal query languages are very complex and too difficult to be used by experienced - let alone casual - users, and are too closely bound to the underlying physical structure of the collection. INEX's NLP task investigates the potential of using natural language to specify structured queries. QUT has participated in the NLP task with our system NLPX since its inception. Here, we discuss the changes we've made to NLPX since last year, including our efforts to port NLPX to Wikipedia. Second, we present the results from the 2006 INEX track where NLPX was the best performing participant in the Thorough and Focused tasks. Springer-Verlag Berlin Heidelberg 2007.
Wu, Shih-Hung; Li, Min-Xiang; Yang, Ping-Che & Ku, Tsun Ubiquitous wikipedia on handheld device for mobile learning 6th IEEE International Conference on Wireless, Mobile and Ubiquitous Technologies in Education, WMUTE 2010, April 12, 2010 - April 16, 2010 Kaohsiung, Taiwan 2010 [287] Hand-held systems and wireless Internet access have become widely available in recent years. However, mobile learning with web content is still inconvenient. For example, the information is not well organized and it is difficult to browse on the small screen of a handheld device. We propose a mobile system based on the content of Wikipedia. Wikipedia is a free content resource and has abundant text and picture content. We use the Wikipedia wrapper we developed previously to build the mobile-learning interface of cross-language and cross-platform applications. Our system can present Wikipedia content on the small screens of PDAs and can be used for mobile learning. A teaching scenario of mobile learning during a museum visit is discussed in this paper.
Xavier, Clarissa Castella & Lima, Vera Lucia Strube De Construction of a domain ontological structure from Wikipedia 7th Brazilian Symposium in Information and Human Language Technology, STIL 2009, September 8, 2009 - September 11, 2009 Sao Carlos, Sao Paulo, Brazil 2010 [288] Data extraction from Wikipedia for ontology construction, enrichment and population is an emerging research field. This paper describes a study on automatic extraction of an ontological structure containing hyponymy and location relations from Wikipedia's Tourism category in Portuguese, illustrated with an experiment and an evaluation of its results.
Xu, Hongtao; Zhou, Xiangdong; Wang, Mei; Xiang, Yu & Shi, Baile Exploring Flickr's related tags for semantic annotation of web images ACM International Conference on Image and Video Retrieval, CIVR 2009, July 8, 2009 - July 10, 2009 Santorini Island, Greece 2009 [289] Exploring social media resources, such as Flickr and Wikipedia, to mitigate the difficulty of the semantic gap has attracted much attention from both academia and industry. In this paper, we first propose a novel approach to derive a semantic correlation matrix from Flickr's related tags resource. We then develop a novel conditional random field model for Web image annotation, which integrates the keyword correlations derived from Flickr, and the textual and visual features of Web images, into a unified graph model to improve the annotation performance. The experimental results on a real Web image data set demonstrate the effectiveness of the proposed keyword correlation matrix and the Web image annotation approach.
Xu, Jinsheng; Yilmaz, Levent & Zhang, Jinghua Agent simulation of collaborative knowledge processing in Wikipedia 2008 Spring Simulation Multiconference, SpringSim'08, April 14, 2008 - April 17, 2008 Ottawa, ON, Canada 2008 [290] Wikipedia, a User Innovation Community (UIC), is becoming an increasingly influential source of knowledge. The knowledge in Wikipedia is produced and processed collaboratively by the UIC. The results of this collaboration process present various seemingly complex patterns demonstrated by the update histories of different articles in Wikipedia. Agent simulation is a powerful method that is used to study the behaviors of complex systems of interacting and autonomous agents. In this paper, we study the collaborative knowledge processing in Wikipedia using a simple agent-based model. The proposed model considers factors including knowledge distribution among agents, number of agents, behavior of agents and vandalism. We use this model to explain content growth rate, number and frequency of updates, edit wars and vandalism in Wikipedia articles. The results demonstrate that the model captures the important empirical aspects of collaborative knowledge processing in Wikipedia.
Yan, Ying; Wang, Chen; Zhou, Aoying; Qian, Weining; Ma, Li & Pan, Yue Efficient indices using graph partitioning in RDF triple stores 25th IEEE International Conference on Data Engineering, ICDE 2009, March 29, 2009 - April 2, 2009 Shanghai, China 2009 [291] With the advance of the Semantic Web, varying RDF data are increasingly generated, published, queried, and reused via the Web. For example, DBpedia, a community effort to extract structured data from Wikipedia articles, broke 100 million RDF triples in its latest release. Likewise, the Linking Open Data (LOD) project, initiated by Tim Berners-Lee, has published and interlinked many open-licence datasets consisting of over 2 billion RDF triples so far. In this context, fast query response over such large-scale data is one of the challenges for existing RDF data stores. In this paper, we propose a novel triple indexing scheme to help an RDF query engine quickly locate instances within a small scope. Considering the RDF data as a graph, we partition the graph into multiple subgraph pieces and store them individually; a signature tree is then built to index the URIs. When a query arrives, the signature tree index is used to quickly locate the partitions that might include matches of the query, based on its constant URIs. Our experiments indicate that the indexing scheme dramatically reduces query processing time in most cases, because many partitions are filtered out early and the expensive exact matching is only performed over a quite small scope of the original dataset.
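A toy version of the partition-pruning idea is sketched below; the flat list of bit signatures is a simplification of the signature tree described in the abstract, and the example triples, hash, and 256-bit signature width are arbitrary illustrative choices.

```python
# Rough sketch of signature-based partition pruning (simplified: a flat list
# of partition signatures rather than a signature tree).
from hashlib import md5

SIG_BITS = 256

def uri_bit(uri: str) -> int:
    return int(md5(uri.encode()).hexdigest(), 16) % SIG_BITS

def signature(uris):
    sig = 0
    for u in uris:
        sig |= 1 << uri_bit(u)
    return sig

# Each partition is a set of triples; its signature summarizes the URIs it contains.
partitions = [
    {("ex:Berlin", "ex:capitalOf", "ex:Germany")},
    {("ex:Paris", "ex:capitalOf", "ex:France")},
]
part_sigs = [signature(u for t in p for u in t) for p in partitions]

def candidate_partitions(constant_uris):
    qsig = signature(constant_uris)
    # a partition may match only if its signature covers every query bit
    return [i for i, s in enumerate(part_sigs) if s & qsig == qsig]

print(candidate_partitions(["ex:Paris", "ex:capitalOf"]))
```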
Yang, Jingjing; Li, Yuanning; Tian, Yonghong; Duan, Lingyu & Gao, Wen A new multiple kernel approach for visual concept learning 15th International Multimedia Modeling Conference, MMM 2009, January 7, 2009 - January 9, 2009 Sophia-Antipolis, France 2009 [292] In this paper, we present a novel multiple kernel method to learn the optimal classification function for visual concepts. Although many carefully designed kernels have been proposed in the literature to measure visual similarity, little work has been done on how these kernels really affect the learning performance. We propose a Per-Sample Based Multiple Kernel Learning method (PS-MKL) to investigate the discriminative power of each training sample in different basic kernel spaces. The optimal, sample-specific kernel is learned as a linear combination of a set of basic kernels, which leads to a convex optimization problem with a unique global optimum. As illustrated in the experiments on the Caltech 101 and the Wikipedia MM datasets, the proposed PS-MKL outperforms traditional Multiple Kernel Learning methods (MKL) and achieves comparable results with the state-of-the-art methods of learning visual concepts. 2008 Springer Berlin Heidelberg.
Yang, Kai-Hsiang; Chen, Chun-Yu; Lee, Hahn-Ming & Ho, Jan-Ming EFS: Expert finding system based on wikipedia link pattern analysis 2008 IEEE International Conference on Systems, Man and Cybernetics, SMC 2008, October 12, 2008 - October 15, 2008 Singapore, Singapore 2008 [293] Building an expert finding system is very important for many applications, especially in the academic environment. Previous work uses e-mails or web pages as corpus to analyze the expertise of each expert. In this paper, we present an Expert Finding System, abbreviated as EFS, that builds experts' profiles from their journal publications. For a given proposal, the EFS first looks up the Wikipedia web site to get related link information, and then lists and ranks all associated experts by using that information. In our experiments, we use a real-world dataset which comprises 882 people and 13,654 papers, categorized into 9 expertise domains. Our experimental results show that the EFS works well on several expertise domains like "Artificial Intelligence" and "Image Pattern Recognition".
Yap, Poh-Hean; Ong, Kok-Leong & Wang, Xungai Business 2.0: A novel model for delivery of business services 5th International Conference on Service Systems and Service Management, ICSSSM'08, June 30, 2008 - July 2, 2008 Melbourne, Australia 2008 [294] Web 2.0, regardless of the exact definition, has proven to bring about significant changes to the way the Internet is used. Evidenced by key innovations such as Wikipedia, Facebook, YouTube, and Blog sites, these community-based Websites, in which content is generated and consumed by the same group of users, are changing the way businesses operate. Advertisements are no longer 'forced' upon the viewers but are instead 'intelligently' targeted based on the contents of interest. In this paper, we investigate the concept of Web 2.0 in the context of business entities. We ask if Web 2.0 concepts could potentially lead to a change of paradigm in the way businesses operate today. We conclude with a discussion of a Web 2.0 application we recently developed that we think is an indication that businesses will ultimately be affected by these community-based technologies; thus bringing about Business 2.0 - a paradigm for businesses to cooperate with one another to deliver improved products and services to their own customers.
Yuan, Pingpeng; Wang, Guoyin; Zhang, Qin & Jin, Hai SASL: A semantic annotation system for literature International Conference on Web Information Systems and Mining, WISM 2009, November 7, 2009 - November 8, 2009 Shanghai, China 2009 [295] Due to ambiguity, search engines for scientific literature may not return the right search results. One efficient solution to this problem is to automatically annotate the literature and attach semantic information to it. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviations and other reasons, it is very difficult to identify entities correctly. This paper presents a Semantic Annotation System for Literature (SASL), which utilizes Wikipedia as a knowledge base to annotate literature. SASL mainly attaches semantics to terminology, academic institutions, conferences, journals, etc. Many of these are usually abbreviations, which introduces ambiguity. SASL uses regular expressions to extract the mapping between full names of entities and their abbreviations. Since the full names of several entities may map to a single abbreviation, SASL introduces a Hidden Markov Model to implement name disambiguation. Finally, the paper presents experimental results, which confirm that SASL achieves good performance.
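As a toy illustration of the regular-expression mapping step, the sketch below extracts "Full Name (ABBR)" pairs from text; the pattern is an assumption for illustration and not the one used in SASL.

```python
# Toy sketch of extracting full-name/abbreviation pairs with a regular
# expression, in the spirit of the mapping step described above.
import re

PATTERN = re.compile(r"([A-Z][\w&-]*(?:\s+(?:of|for|on|and|[A-Z][\w&-]*))+)\s*\(([A-Z]{2,})\)")

def extract_abbreviations(text: str) -> dict[str, str]:
    """Return {abbreviation: full name} pairs found in the text."""
    return {abbr: full.strip() for full, abbr in PATTERN.findall(text)}

sample = "We attended the International Conference on Data Engineering (ICDE) last year."
print(extract_abbreviations(sample))  # {'ICDE': 'International Conference on Data Engineering'}
```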
Zacharouli, Polyxeni; Titsias, Michalis & Vazirgiannis, Michalis Web page rank prediction with PCA and em clustering 6th International Workshop on Algorithms and Models for the Web-Graph, WAW 2009, February 12, 2009 - February 13, 2009 Barcelona, Spain 2009 [296] In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Components Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages at some future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm. Clustering allows the Web pages to be grouped together by fitting a mixture of regression models. A different method combines linear regression with PCA so that dependencies between different web pages can be exploited. All the methods are evaluated using real data sets obtained from Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall the system constitutes a set of tools for high-accuracy pagerank prediction which can be used for efficient resource management by search engines. 2009 Springer Berlin Heidelberg.
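The simplest model mentioned above, an independent linear regression per page, can be sketched in a few lines; the weekly rank series in the example is hypothetical.

```python
# Bare-bones version of the first model described (independent linear
# regressions per page): fit rank as a linear function of time and
# extrapolate one step ahead with numpy's least-squares polyfit.
import numpy as np

def predict_next_rank(rank_history: list[float]) -> float:
    t = np.arange(len(rank_history), dtype=float)
    slope, intercept = np.polyfit(t, np.asarray(rank_history, dtype=float), deg=1)
    return slope * len(rank_history) + intercept

# Hypothetical weekly ranks of one page; lower is better.
print(predict_next_rank([120, 110, 95, 90, 84]))
```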
Zhang, Congle; Xue, Gui-Rong & Yu, Yong Knowledge supervised text classification with no labeled documents 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008, December 15, 2008 - December 19, 2008 Hanoi, Viet nam 2008 [297] In traditional text classification approaches, the semantic meanings of the classes are described by the labeled documents. Since labeling documents is often time consuming and expensive, it is a promising idea to ask users to provide some keywords that depict the classes, instead of labeling any documents. However, short lists of keywords may not contain enough information and may therefore lead to an unreliable classifier. Fortunately, there are large amounts of public data easily available in web directories, such as ODP, Wikipedia, etc. We are interested in exploring the enormous crowd intelligence contained in such public data to enhance text classification. In this paper, we propose a novel text classification framework called "Knowledge Supervised Learning" (KSL), which utilizes the knowledge in keywords and the crowd intelligence to learn the classifier without any labeled documents. We design a two-stage risk minimization (TSRM) approach for the KSL problem. It can optimize the expected prediction risk and build a high-quality classifier. Empirical results verify our claim: our algorithm achieves above 0.9 Micro-F1 on average, which is much better than the baselines and even comparable to an SVM classifier supervised by labeled documents. 2008 Springer Berlin Heidelberg.
Zhang, Xu; Song, Yi-Cheng; Cao, Juan; Zhang, Yong-Dong & Li, Jin-Tao Large scale incremental web video categorization 1st International Workshop on Web-Scale Multimedia Corpus, WSMC'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09, October 19, 2009 - October 24, 2009 Beijing, China 2009 [298] With the advent of video sharing websites, the amount of video on the internet grows rapidly. Web video categorization is an efficient methodology for organizing this huge amount of video. In this paper we investigate the characteristics of web videos and make two contributions to large scale incremental web video categorization. First, we develop an effective semantic feature space, Concept Collection for Web Video with Categorization Distinguishability (CCWV-CD), which consists of concepts with a small semantic gap, and whose concept correlations are diffused by a novel Wikipedia Propagation (WP) method. Second, we propose an incremental support vector machine with a fixed number of support vectors (N-ISVM) for large scale incremental learning. To evaluate the performance of CCWV-CD, WP and N-ISVM, we conduct extensive experiments on a dataset of the 80,021 most representative videos on a video sharing website. The experimental results show that CCWV-CD and WP are more representative for web videos, and the N-ISVM algorithm greatly improves efficiency in the incremental learning setting.
Zhang, Yi; Sun, Aixin; Datta, Anwitaman; Chang, Kuiyu & Lim, Ee-Peng Do wikipedians follow domain experts?: A domain-specific study on wikipedia knowledge building 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [299] Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest for a better understanding of the collaborative knowledge building process. In this paper, we performed a (terrorism) domain-specific case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records, and went ahead to study them from three aspects: creation, revision, and link evolution. We found that the knowledge building in Wikipedia had largely been independent, and did not follow the TKB - despite the open and online availability of the latter, as well as awareness of at least some of the Wikipedia contributors about the TKB source. In an attempt to identify possible reasons, we conducted a detailed analysis of contribution behavior demonstrated by Wikipedians. It was found that most Wikipedians contribute to a relatively small set of articles each. Their contribution was biased towards one or very few article(s). At the same time, each article's contributions are often championed by very few active contributors including the article's creator. We finally arrive at a conjecture that the contributions in Wikipedia are more to cover knowledge at the article level rather than at the domain level.
Zhou, Zhi; Tian, Yonghong; Li, Yuanning; Huang, Tiejun & Gao, Wen Large-scale cross-media retrieval of wikipediaMM images with textual and visual query expansion 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [300] In this paper, we present our approaches for the WikipediaMM task at ImageCLEF 2008. We first experimented with a text-based image retrieval approach with query expansion, where the expansion terms were automatically selected from a knowledge base that was semi-automatically constructed from Wikipedia. Encouragingly, the experimental results rank in first place among all submitted runs. We also implemented a content-based image retrieval approach with query-dependent visual concept detection. Cross-media retrieval was then successfully carried out by independently applying the two meta-search tools and combining the results through a weighted summation of scores. Though not submitted, this approach outperforms our text-based and content-based approaches remarkably. 2009 Springer Berlin Heidelberg.
Zirn, Cacilia; Nastase, Vivi & Strube, Michael Distinguishing between instances and classes in the wikipedia taxonomy 5th European Semantic Web Conference, ESWC 2008, June 1, 2008 - June 5, 2008 Tenerife, Canary Islands, Spain 2008 [301] This paper presents an automatic method for differentiating between instances and classes in a large scale taxonomy induced from the Wikipedia category network. The method exploits characteristics of the category names and the structure of the network. The approach we present is the first attempt to make this distinction automatically in a large scale resource. In contrast, this distinction has been made in WordNet and Cyc based on manual annotations. The result of the process is evaluated against ResearchCyc. On the subnetwork shared by our taxonomy and ResearchCyc we report 84.52% accuracy. 2008 Springer-Verlag Berlin Heidelberg.
Focused Retrieval and Evaluation - 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, Revised and Selected Papers 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 The proceedings contain 42 papers. The topics discussed include: is there something quantum-like about the human mental lexicon?; supporting for real-world tasks: producing summaries of scientific articles tailored to the citation context; semantic document processing using wikipedia as a knowledge base; a methodology for producing improved focused elements; use of language model, phrases and wikipedia forward links for INEX 2009; combining language models with NLP and interactive query expansion; exploiting semantic tags in XML retrieval; the book structure extraction competition with the resurgence software at Caen university; ranking and fusion approaches for XML book retrieval; index tuning for efficient proximity-enhanced query processing; fast and effective focused retrieval; combining term-based and category-based representations for entity search; and focused search in books and wikipedia: categories, links and relevance feedback.
IEEE Pacific Visualization Symposium 2010, PacificVis 2010 - Proceedings IEEE Pacific Visualization Symposium 2010, PacificVis 2010, March 2, 2010 - March 5, 2010 Taipei, Taiwan 2010 The proceedings contain 27 papers. The topics discussed include: quantitative effectiveness measures for direct volume rendered images; shape-based transfer functions for volume visualization; volume visualization based on statistical transfer-function spaces; volume exploration using ellipsoidal Gaussian transfer functions; stack zooming for multi-focus interaction in time-series data visualization; a layer-oriented interface for visualizing time-series data from oscilloscopes; wikipediaviz: conveying article quality for casual wikipedia readers; caleydo: design and evaluation of a visual analysis framework for gene expression data in its biological context; visualizing field-measured seismic data; seismic volume visualization for horizon extraction; and verification of the time evolution of cosmological simulations via hypothesis-driven comparative and quantitative visualization.
JCDL'10 - Digital Libraries - 10 Years Past, 10 Years Forward, a 2020 Vision 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 The proceedings contain 66 papers. The topics discussed include: making web annotations persistent over time; transferring structural markup across translations using multilingual alignment and projection; scholarly paper recommendation via user's recent research interests; effective self-training author name disambiguation in scholarly digital libraries; evaluating methods to rediscover missing web pages from the web infrastructure; exploiting time-based synonyms in searching document archives; using word sense discrimination on historic document collections; Chinese calligraphy specific style rendering system; do wikipedians follow domain experts? a domain-specific study on wikipedia knowledge building; crowdsourcing the assembly of concept hierarchies; a user-centered design of a personal digital library for music exploration; and improving mood classification in music digital libraries by combining lyrics and audio.
Proceedings of the 6th Workshop on Geographic Information Retrieval, GIR'10 6th Workshop on Geographic Information Retrieval, GIR'10, February 18, 2010 - February 19, 2010 Zurich, Switzerland 2010 The proceedings contain 24 papers. The topics discussed include: linkable geographic ontologies; unnamed locations, underspecified regions, and other linguistic phenomena in geographic annotation; an ontology of place and service types to facilitate place-affordance geographic information retrieval; Geotagging: using proximity, sibling, and prominence clues to understand comma groups; evaluation of georeferencing; a GIR architecture with semantic-flavored query reformulation; OGC catalog service for heterogeneous earth observation metadata using extensible search indices; TWinner: understanding news queries with geo-content using Twitter; geographical classification of documents using evidence from Wikipedia; a web platform for the evaluation of vernacular place names in automatically constructed gazetteers; grounding toponyms in an Italian local news corpus; and using the geographic scopes of web documents for contextual advertising.
2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009 2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009, November 11, 2009 - November 14, 2009 Washington, DC, United states 2009 The proceedings contain 68 papers. The topics discussed include: multi-user multi-account interaction in groupware supporting single-display collaboration; supporting collaborative work through flexible process execution; dynamic data services: data access for collaborative networks in a multi-agent systems architecture; integrating external user profiles in collaboration applications; a collaborative framework for enforcing server commitments, and for regulating server interactive behavior in SOA-based systems; CASTLE: a social framework for collaborative anti-phishing databases; VisGBT: visually analyzing evolving datasets for adaptive learning; an IT appliance for remote collaborative review of mechanisms of injury to children in motor vehicle crashes; user contribution and trust in Wikipedia; and a new perspective on experimental analysis of N-tier systems: evaluating database scalability, multi-bottlenecks, and economical operation.
Internet and Other Electronic Resources for Materials Education 2007 136th TMS Annual Meeting, 2007, February 25, 2007 - March 1, 2007 Orlando, FL, United states 2007 The proceedings contain 1 paper. The topics discussed include: Wikipedia in materials education.
Natural Language Processing and Information Systems - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, Proceedings 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France 2007 The proceedings contain 42 papers. The topics discussed include: an alternative approach to tagging; an efficient denotational semantics for natural language database queries; developing methods and heuristics with low time complexities for filtering spam messages; exploit semantic information for category annotation recommendation in wikipedia; a lightweight approach to semantic annotation of research papers; a new text clustering method using hidden markov model; identifying event sequences using hidden markov model; selecting labels for news document clusters; generating ontologies via language components and ontology reuse; experiences using the researchcyc upper level ontology; ontological text mining of software documents; treatment of passive voice and conjunctions in use case documents; natural language processing and the conceptual model self-organizing map; and automatic issue extraction from a focused dialogue.
Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management, WIDM '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM '07 9th Annual ACM International Workshop on Web Information and Data Management, WIDM '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM '07, November 6, 2007 - November 9, 2007 Lisboa, Portugal 2007 The proceedings contain 20 papers. The topics discussed include: evaluation of datalog extended with an XPath predicate; data allocation scheme based on term weight for P2P information retrieval; distributed monitoring of peer to peer systems; self-optimizing block transfer in web service grids; supporting personalized top-k skyline queries using partial compressed skycube; toward editable web browser: edit-and-propagate operation for web browsing; mining user navigation patterns for personalizing topic directories; an online PPM prediction model for web prefetching; extracting the discussion structure in comments on news-articles; pattern detection from web using AFA set theory; using neighbors to date web documents; on improving wikipedia search using article quality; and SATYA: a reputation-based approach for service discovery and selection in service oriented architectures.
Tamagawa, Susumu; Sakurai, Shinya; Tejima, Takuya; Morita, Takeshi; Izumi, Noriaki & Yamaguchi, Takahira Learning a Large Scale of Ontology from Japanese Wikipedia Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010 This paper discusses how to learn a large-scale ontology from the Japanese Wikipedia. The learned ontology includes the following properties: rdfs:subClassOf (IS-A relationships), rdf:type (class-instance relationships), owl:Object/DatatypeProperty (Infobox triples), rdfs:domain (property domains), and skos:altLabel (synonyms). Experimental case studies show that the learned Japanese Wikipedia Ontology compares favorably with existing general linguistic ontologies, such as EDR and Japanese WordNet, in terms of building cost and richness of structural information.
Jing, Liping; Yun, Jiali; Yu, Jian & Huang, Houkuan Text Clustering via Term Semantic Units Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010 How best to represent text data is an important problem in text mining tasks including information retrieval, clustering, classification, etc. In this paper, we propose a compact document representation with term semantic units, which are identified from implicit and explicit semantic information. The implicit semantic information is extracted from syntactic content via statistical methods such as latent semantic indexing and the information bottleneck. The explicit semantic information is mined from an external semantic resource (Wikipedia). The proposed compact representation model can map a document collection into a low-dimensional space (the number of term semantic units is much smaller than the number of unique terms). Experimental results on real data sets have shown that the compact representation efficiently improves the performance of text clustering.
Breuing, Alexa Improving Human-Agent Conversations by Accessing Contextual Knowledge from Wikipedia Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010 In order to talk to each other meaningfully, conversational partners utilize different types of conversational knowledge. Because speakers often use grammatically incomplete and incorrect sentences in spontaneous language, knowledge about conversational and terminological context turns out to be as important in language understanding as traditional linguistic analysis. In the context of the KnowCIT project we want to improve human-agent conversations by connecting the agent to an adequate representation of such contextual knowledge drawn from the online encyclopedia Wikipedia. We thereby make use of additional components provided by Wikipedia, which go beyond encyclopedic information, to identify the current dialog topic and to implement human-like look-up abilities.
Salahli, M.A.; Gasimzade, T.M. & Guliyev, A.I. Domain specific ontology on computer science Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control, 2009. ICSCCW 2009. Fifth International Conference on 2009 In this paper we introduce an application system based on a domain-specific ontology. Some design problems of the ontology are discussed. The ontology is based on WordNet's database and consists of Turkish and English terms on computer science and informatics. Second, we present a method for determining a set of words related to a given concept and for computing the degree of semantic relatedness between them. The presented method has been used for the semantic search process carried out by our application.
Yang, Kai-Hsiang; Kuo, Tai-Liang; Lee, Hahn-Ming & Ho, Jan-Ming A Reviewer Recommendation System Based on Collaborative Intelligence Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on 2009 In this paper, the expert-finding problem is transformed into a classification problem. We build a knowledge database to represent the expertise characteristics of each domain from web information constructed by collaborative intelligence, and an incremental learning method is proposed to update the database. Furthermore, results are ranked by measuring the correlation in the concept network from an online encyclopedia. In our experiments, we use a real-world dataset which comprises 2,701 experts categorized into 8 expertise domains. Our experimental results show that the expertise knowledge extracted from collaborative intelligence can improve the efficiency and effectiveness of classification and increase the precision of expert ranking by at least 20%.
Mishra, Surjeet; Gorai, Amarendra; Oberoi, Tavleen & Ghosh, Hiranmay Efficient Visualization of Content and Contextual Information of an Online Multimedia Digital Library for Effective Browsing Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010 In this paper, we present a few innovative techniques for visualization of the content and contextual information of a multimedia digital library for effective browsing. A traditional collection visualization portal often depicts some metadata or a short synopsis, which is quite inadequate for assessing the documents. We have designed a novel web portal that incorporates a few preview facilities to disclose an abstract of the contents. Moreover, we place the documents on Google Maps to make their geographical context explicit. A semantic network, created automatically around the collection, brings out other contextual information from external knowledge resources like Wikipedia, and is used for navigating the collection. This paper also reports economical hosting techniques using Amazon Cloud.
Jinwei, Fu; Jianhong, Sun & Tianqing, Xiao A FAQ online system based on wiki E-Health Networking, Digital Ecosystems and Technologies (EDT), 2010 International Conference on 2010 In this paper, we propose an online FAQ system based on a wiki engine. The goal of this system is to reduce the counseling workload in our university, and it can also be used in other counseling fields. The proposed system is built on one of the popular wiki engines, TikiWiki. In practical application, the functionality of the proposed system goes far beyond that of an FAQ platform, owing to the wiki concept and its characteristics.
Martins, A.; Rodrigues, E. & Nunes, M. Information repositories and learning environments: Creating spaces for the promotion of virtual literacy and social responsibility International Association of School Librarianship. Selected Papers from the ... Annual Conference 2007 [302] Information repositories are collections of digital information which can be built in several different ways and with different purposes. They can be collaborative and with a soft control of the contents and authority of the documents, as well as directed to the general public (Wikipedia is an example of this). But they can also have a high degree of control and be conceived in order to promote literacy and responsible learning, as well as directed to special groups of users like, for instance, school students. In the new learning environments built upon digital technologies, the need to promote quality information resources that can support formal and informal e-learning emerges as one of the greatest challenges that school libraries have to face. It is now time that school libraries, namely through their regional and national school library networks, start creating their own information repositories, oriented for school pupils and directed to their specific needs of information and learning. The creation of these repositories implies a huge work of collaboration between librarians, school teachers, pupils, families and other social agents that interact within the school community, which is, in itself, a way to promote cooperative learning and social responsibility between all members of such communities. In our presentation, we will discuss the bases and principles that are behind the construction of the proposed information repositories and learning platforms as well as the need for a constant dialogue between technical and content issues.
Lucchese, C.; Orlando, S.; Perego, R.; Silvestri, F. & Tolomei, G. Detecting Task-Based Query Sessions Using Collaborative Knowledge Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010 Our research challenge is to provide a mechanism for splitting a long-term log of queries submitted to a Web Search Engine (WSE) into user task-based sessions. The hypothesis is that some query sessions entail the concept of a user task. We present an approach that relies on a centroid-based and a density-based clustering algorithm, which consider query inter-arrival times and use a novel distance function that takes into account query lexical content and exploits the collaborative knowledge collected by Wiktionary and Wikipedia.
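A much-simplified version of the idea is sketched below as a sequential splitter rather than the centroid- or density-based clustering used in the paper: two consecutive queries stay in the same task-based session if they are close in time or lexically similar; the thresholds and the plain Jaccard measure are illustrative assumptions.

```python
# Simplified illustration: segment a query log into task-based sessions using
# inter-arrival time and lexical overlap (no Wiktionary/Wikipedia knowledge here).
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def split_sessions(queries, timestamps, max_gap_s=1500, min_sim=0.3):
    """queries: list of query strings; timestamps: seconds, same length."""
    sessions, current = [], [queries[0]]
    for prev, q, t_prev, t in zip(queries, queries[1:], timestamps, timestamps[1:]):
        same_task = (t - t_prev) <= max_gap_s or jaccard(prev, q) >= min_sim
        if same_task:
            current.append(q)
        else:
            sessions.append(current)
            current = [q]
    sessions.append(current)
    return sessions

print(split_sessions(["cheap flights rome", "rome hotels", "python list sort"],
                     [0, 300, 90000]))  # two sessions
```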
Cover Art Computational Aspects of Social Networks, 2009. CASON '09. International Conference on 2009 The following topics are dealt with: online social network; pattern clustering; Web page content; Wikipedia article; learning management system; Web database descriptor; genetic algorithm; face recognition; interactive robotics; and security of data.
Liu, Lei & Tan, Pang-Ning A Framework for Co-classification of Articles and Users in Wikipedia Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010 The massive size of Wikipedia and the ease with which its content can be created and edited has made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors.
Ohmori, K. & Kunii, T.L. Author Index Cyberworlds, 2007. CW '07. International Conference on 2007 The mathematical structure of cyberworlds is clarified based on the duality of the homology lifting property and the homotopy extension property. The duality gives bottom-up and top-down methods to model, design and analyze the structure of cyberworlds. The set of homepages representing a cyberworld is transformed into a finite state machine. As the cyberworld develops, a sequence of finite state machines is obtained. This sequence has a homotopic property, which is clarified by mapping a finite state machine to a simplicial complex. Wikipedia, bottom-up network construction and top-down network analysis are described as examples.
Missen, M.M.S. & Boughanem, M. Sentence-Level Opinion-Topic Association for Opinion Detection in Blogs Advanced Information Networking and Applications Workshops, 2009. WAINA '09. International Conference on 2009 Opinion detection from blogs has always been a challenge for researchers. One of the challenges is to find documents that specifically contain opinions on the user's information need. This requires text processing at the sentence level rather than at the document level. In this paper, we propose an opinion detection approach that tackles the problem by using some document-level heuristics and by processing documents at the sentence level, using different WordNet semantic similarity relations between sentence words and a list of weighted query terms expanded through the encyclopedia Wikipedia. According to initial results, our approach performs well, with a MAP of 0.2177, an improvement of 28.89% over baseline results obtained with the BM25 matching formula. TREC Blog 2006 data is used as the test collection.
Baeza-Yates, R. Keynote Speakers Web Congress, 2009. LE-WEB '09. Latin American 2009 There are several semantic sources that can be found on the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user-generated content (UGC), or what is today called Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. These results not only impact search performance but also the user interface, suggesting new ways of interaction. We will show live demos that find relations in Wikipedia or improve image search, already available at sandbox.yahoo.com, the demo site of Yahoo! Research. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.
Alemzadeh, Milad & Karray, Fakhri An Efficient Method for Tagging a Query with Category Labels Using Wikipedia towards Enhancing Search Engine Results Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010 This paper intends to present a straightforward, extensive, and noise resistant method for efficiently tagging a web query, submitted to a search engine, with proper category labels. These labels are intended to represent the closest categories related to the query which can ultimately be used to enhance the results of any typical search engine by either restricting the results to matching categories or enriching the query itself. The presented method effectively rules out noise words within a query, forms the optimal keyword packs using a density function, and returns a set of category labels which represent the common topics of the given query using Wikipedia category hierarchy.
Indrie, Sergiu & Groza, Adrian Towards social argumentative machines Intelligent Computer Communication and Processing (ICCP), 2010 IEEE International Conference on 2010 This research advocates the idea of combining argumentation theory with social web technology, aiming to enact large-scale or mass argumentation. The proposed framework allows mass-collaborative editing of structured arguments in the style of semantic wikipedia. The Argnet system was developed based on the Semantic MediaWiki framework and on the Argument Interchange Format ontology.
Liu, Ming-Chi; Wen, Dunwei; Kinshuk & Huang, Yueh-Min Learning Animal Concepts with Semantic Hierarchy-Based Location-Aware Image Browsing and Ecology Task Generator Wireless, Mobile and Ubiquitous Technologies in Education (WMUTE), 2010 6th IEEE International Conference on 2010 This study first observes that the lack of an overall ecological knowledge structure is one critical reason for learners' failure with keyword search. Therefore, in order to identify the sight they are currently interested in, the dynamic location-aware and semantic hierarchy (DLASH) is presented for learners to browse images. This hierarchy mainly considers that plant and animal species are discontinuously distributed around the planet; hence it combines location information with a semantic hierarchy constructed through WordNet. After learners confirm their intended information needs, this study also provides them with three kinds of image-based learning tasks: similar-image comparison, concept map fill-out, and placement map fill-out. These tasks are designed based on Ausubel's advance organizers and improve on them by integrating three new properties: displaying concept nodes with authentic images, automatically generating the knowledge structure by computer, and interactively integrating new and old knowledge.
Takemoto, M.; Yokohata, Y.; Tokunaga, T.; Hamada, M. & Nakamura, T. Demo: Implementation of Information-Provision Service with Smart Phone and Field Trial in Shopping Area Mobile and Ubiquitous Systems: Networking & Services, 2007. MobiQuitous 2007. Fourth Annual International Conference on 2007 To achieve the information-provision service, we adopted the social network concept (http://en.wikipedia.org/wiki/Social_network_service), which handles human relationships in networks. We have implemented an information recommendation mechanism, by which users may obtain suitable information from the system based on their relationships with other users in the social network service. We believe that information used by people should be handled based on their behavior. We have developed an information-provision service based on our platform. We have been studying and developing the service coordination and provision architecture - ubiquitous service-oriented network (USON) (Takemoto et al., 2002) - for services in ubiquitous computing environments. We have developed an information-provision service using the social network service based on the USON architecture. This demonstration shows the implementation of the information-provision system with the actual information used in the field trial.
Ayyasamy, Ramesh Kumar; Tahayna, Bashar; Alhashmi, Saadat; gene, Siew Eu & Egerton, Simon Mining Wikipedia Knowledge to improve document indexing and classification Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on 2010 Weblogs are an important source of information that requires automatic techniques to categorize them into "topic-based" content, to facilitate their future browsing and retrieval. In this paper we propose and illustrate the effectiveness of a new tf.idf measure. The proposed Conf.idf and Catf.idf measures are based solely on the mapping of terms-to-concepts-to-categories (TCONCAT) method, which utilizes Wikipedia. The knowledge base Wikipedia is considered a large-scale Web encyclopaedia that has a huge number of high-quality articles and categorical indexes. Using this resource, our proposed framework consists of two stages to solve the weblog classification problem. The first stage is to find the terms belonging to a unique concept (article), as well as to disambiguate the terms belonging to more than one concept. The second stage determines the categories to which the found concepts belong. Experimental results confirm that the proposed system can efficiently distinguish weblogs that belong to more than one category and has better performance than traditional statistical natural language processing (NLP) approaches.
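Since the abstract does not give the exact Conf.idf formula, the following sketch is only one plausible interpretation: terms are mapped to Wikipedia concepts through a term-to-concept dictionary and tf.idf is then computed over concepts instead of raw terms.

```python
# Hedged sketch of a concept-level tf.idf (an interpretation, not the paper's
# exact Conf.idf definition): map terms to Wikipedia concepts, then weight
# concepts by term frequency times inverse document frequency over concepts.
from collections import Counter
from math import log

def conf_idf(docs_terms, term_to_concept):
    """docs_terms: list of token lists; term_to_concept: {term: concept}."""
    docs_concepts = [Counter(term_to_concept[t] for t in doc if t in term_to_concept)
                     for doc in docs_terms]
    n_docs = len(docs_concepts)
    df = Counter(c for doc in docs_concepts for c in doc)
    return [{c: tf * log(n_docs / df[c]) for c, tf in doc.items()} for doc in docs_concepts]

mapping = {"python": "Python_(programming_language)", "django": "Django_(web_framework)"}
print(conf_idf([["python", "django"], ["python", "snake"]], mapping))
```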
Malone, T.W. Collective intelligence Collaborative Technologies and Systems, 2007. CTS 2007. International Symposium on 2007 While people have talked about collective intelligence for decades, new communication technologies - especially the Internet - now allow huge numbers of people all over the planet to work together in new ways. The recent successes of systems like Google and Wikipedia suggest that the time is now ripe for many more such systems, and this talk will examine ways to take advantage of these possibilities. Using examples from business, government, and other areas, the talk will address the fundamental question: How can people and computers be connected so that - collectively - they act more intelligently than any individuals, groups, or computers have ever done before?
Zeng, Honglei; Alhossaini, Maher A.; Fikes, Richard & McGuinness, Deborah L. Mining Revision History to Assess Trustworthiness of Article Fragments Collaborative Computing: Networking, Applications and Worksharing, 2006. CollaborateCom 2006. International Conference on 2006 Wikis are a type of collaborative repository system that enables users to create and edit shared content on the Web. The popularity and proliferation of Wikis have created a new set of challenges for trust research because the content in a Wiki can be contributed by a wide variety of users and can change rapidly. Nevertheless, most Wikis lack explicit trust management to help users decide how much they should trust an article or a fragment of an article. In this paper, we investigate the dynamic nature of revisions as we explore ways of utilizing revision history to develop an article fragment trust model. We use our model to compute the trustworthiness of articles and article fragments. We also augment Wikis with a trust view layer with which users can visually identify text fragments of an article and view trust values computed by our model.
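A minimal fragment-trust heuristic in the spirit of the abstract (not the authors' actual model) might credit a fragment for every later revision it survives, weighted by the number of distinct editors who implicitly endorsed it:

```python
# Illustrative fragment-trust heuristic (an assumption, not the paper's model):
# a text fragment earns trust for every revision in which it survives, weighted
# by how many distinct editors have implicitly endorsed it.
def fragment_trust(fragment: str, revisions: list[tuple[str, str]]) -> float:
    """revisions: chronological list of (editor, article_text) pairs."""
    endorsing_editors = {editor for editor, text in revisions if fragment in text}
    survivals = sum(1 for _, text in revisions if fragment in text)
    return survivals * len(endorsing_editors)

history = [("alice", "Rome is the capital of Italy."),
           ("bob", "Rome is the capital of Italy. It has 2.8M inhabitants."),
           ("carol", "Rome is the capital of Italy. Population: 2.8 million.")]
print(fragment_trust("Rome is the capital of Italy.", history))  # 3 survivals * 3 editors = 9
```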
qdah, Majdi Al & Falzi, Aznan An Educational Game for School Students World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [303]
Abrial, J. -R. & Hoang, Thai Son Using Design Patterns in Formal Methods: An Event-B Approach Proceedings of the 5th international colloquium on Theoretical Aspects of Computing 2008 [304] Motivation. Formal Methods users are given sophisticated languages and tools for constructing models of complex systems. But quite often they lack some systematic methodological approaches which could help them. The goal of introducing design patterns within formal methods is precisely to bridge this gap. A design pattern is a general reusable solution to a commonly occurring problem in (software) design . . . It is a description or template for how to solve a problem that can be used in many different situations (Wikipedia on "Design Pattern").
Adafre, Sisay Fissaha & de Rijke, Maarten Discovering missing links in Wikipedia Proceedings of the 3rd international workshop on Link discovery 2005 [305] In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page, and then we identify candidate links from those similar pages that might be missing on the given page. The main innovation is in the algorithm that we use for identifying similar pages, LTRank, which ranks pages using co-citation and page title information. Both LTRank and the link discovery method are manually evaluated and show acceptable results, especially given the simplicity of the methods and conservativeness of the evaluation criteria.
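The second step, candidate missing links, can be approximated with a simple heuristic like the one below: propose links that are common among the similar pages, absent from the page itself, and whose title occurs in the page text; the support threshold and plain substring matching are illustrative simplifications.

```python
# Simple sketch of the candidate-missing-links step (a simplification of the
# method described above, not its exact algorithm).
from collections import Counter

def missing_link_candidates(page_text, page_links, similar_pages_links, min_support=2):
    """similar_pages_links: list of link sets, one per similar page."""
    counts = Counter(link for links in similar_pages_links for link in links)
    return [link for link, c in counts.most_common()
            if c >= min_support and link not in page_links and link.lower() in page_text.lower()]

text = "The battle took place near Gettysburg during the American Civil War."
print(missing_link_candidates(text, {"American Civil War"},
                              [{"Gettysburg", "American Civil War"},
                               {"Gettysburg", "Union Army"}]))  # -> ['Gettysburg']
```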
Adams, Catherine Learning Management Systems as sites of surveillance, control, and corporatization: A review of the critical literature Society for Information Technology \& Teacher Education International Conference 2010 [306]
Al-Senaidi, Said Integrating Web 2.0 in Technology based learning environment World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [307]
Allen, Matthew Authentic Assessment and the Internet: Contributions within Knowledge Networks World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [308]
Allen, Nancy; Alnaimi, Tarfa Nasser & Lubaisi, Huda Ak Leadership for Technology Adoption in a Reform Community World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [309]
Allen, R.B. & Nalluru, S. Exploring history with narrative timelines Human Interface and the Management of Information. Designing Information Environments. Symposium on Human Interface 2009, 19-24 July 2009 Berlin, Germany 2009 [310] We develop novel timeline interfaces which separate the events in timelines into threads and then allow users to select among them. This interface is illustrated with five threads describing the causes of the American Civil War. In addition to selecting each of the threads, the sequence of events it describes can be played. That is, the user can step through the sequence of events and get a description of each event in the context of its thread. In addition, many of the events have links to more focused timelines and to external resources such as Wikipedia.
Amin, Mohammad Shafkat; Bhattacharjee, Anupam & Jamil, Hasan Wikipedia driven autonomous label assignment in wrapper induced tables with missing column names Proceedings of the 2010 ACM Symposium on Applied Computing 2010 [311] As the volume of information available on the internet is growing exponentially, it is clear that most of this information will have to be processed and digested by computers to produce useful information for human consumption. Unfortunately, most web content is currently designed for direct human consumption, in which it is assumed that a human will decipher the information presented in some context and will be able to connect the missing dots, if any. In particular, information presented in some tabular form often is not accompanied by descriptive titles or column names similar to attribute names in tables. While such omissions are not really an issue for humans, it is truly hard to extract information in autonomous systems in which a machine is expected to understand the meaning of the table presented and extract the right information in the context of the query. It is even more difficult when the information needed is distributed across the globe and involves semantic heterogeneity. In this paper, our goal is to address the issue of how to interpret tables with missing column names by developing a method for the assignment of attribute names in an arbitrary table extracted from the web in a fully autonomous manner. We propose a novel approach that leverages Wikipedia for the first time for column name discovery for the purpose of table annotation. We show that this leads to an improved likelihood of capturing the context and interpretation of the table accurately and producing a semantically meaningful query response.
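As a minimal, hypothetical illustration of the column-labeling idea described above (not the authors' pipeline), one can vote over the infobox attribute names under which a column's cell values appear in pre-extracted Wikipedia data; the infobox_index mapping below is an assumed, invented stand-in for such data.
<syntaxhighlight lang="python">
# Sketch: guess a missing column name by voting over attribute names under
# which the column's cell values occur in (pre-extracted) Wikipedia infoboxes.
# `infobox_index` (value -> attribute names) is an assumption, not real data.

from collections import Counter

def guess_column_name(cell_values, infobox_index):
    votes = Counter()
    for value in cell_values:
        for attribute in infobox_index.get(value.strip().lower(), []):
            votes[attribute] += 1
    return votes.most_common(1)[0][0] if votes else None

# Toy example:
index = {"paris": ["capital", "birth_place"], "berlin": ["capital"], "rome": ["capital"]}
print(guess_column_name(["Paris", "Berlin", "Rome"], index))  # -> "capital"
</syntaxhighlight>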
Ammann, Alexander & Matthies, Herbert K. K-Space DentMed/Visual Library: Generating and Presenting Dynamic Knowledge Spaces for Dental Research, Education, Clinical and Laboratory Practice World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [312]
Anderka, M.; Lipka, N. & Stein, B. Evaluating cross-language explicit semantic analysis and cross querying Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [313] This paper describes our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track. The task is to retrieve items from various multilingual collections of library catalog records, which are relevant to a user's query. Two different strategies are employed: (i) the Cross-Language Explicit Semantic Analysis, CL-ESA, where the library catalog records and the queries are represented in a multilingual concept space that is spanned by aligned Wikipedia articles, and, (ii) a Cross Querying approach, where a query is translated into all target languages using Google Translate and where the obtained rankings are combined. The evaluation shows that both strategies outperform the monolingual baseline and achieve comparable results. Furthermore, inspired by the Generalized Vector Space Model we present a formal definition and an alternative interpretation of the CL-ESA model. This interpretation is interesting for real-world retrieval applications since it reveals how the computational effort for CL-ESA can be shifted from the query phase to a preprocessing phase.
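A hedged sketch of the CL-ESA representation mentioned above, not the CLEF system itself: each text is mapped to a vector of similarities against concept articles in its own language, and texts in different languages are compared by cosine similarity in the shared concept space. The tiny "aligned" concept corpora below are invented placeholders.
<syntaxhighlight lang="python">
# Sketch of cross-language explicit semantic analysis: concept i is the same
# topic in both languages, so documents become comparable concept vectors.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

en_concepts = ["football is a team sport played with a ball",
               "a computer executes programs and stores data"]
de_concepts = ["fussball ist ein mannschaftssport mit einem ball",
               "ein computer fuehrt programme aus und speichert daten"]

def esa_vector(text, concept_articles):
    """Similarity of `text` to each concept article in its own language."""
    vec = TfidfVectorizer().fit(concept_articles + [text])
    concepts = vec.transform(concept_articles)
    doc = vec.transform([text])
    return (concepts @ doc.T).toarray().ravel()

def cross_lingual_similarity(en_text, de_text):
    a, b = esa_vector(en_text, en_concepts), esa_vector(de_text, de_concepts)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

print(round(cross_lingual_similarity("a team sport with a ball",
                                     "fussball mit einem ball"), 3))
</syntaxhighlight>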
Angel, Albert; Lontou, Chara; Pfoser, Dieter & Efentakis, Alexandros Qualitative geocoding of persistent web pages Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems 2008 [314] Information and specifically Web pages may be organized, indexed, searched, and navigated using various metadata aspects, such as keywords, categories (themes), and also space. While categories and keywords are up for interpretation, space represents an unambiguous aspect to structure information. The basic problem of providing spatial references to content is solved by geocoding; a task that relates identifiers in texts to geographic co-ordinates. This work presents a methodology for the semiautomatic geocoding of persistent Web pages in the form of collaborative human intervention to improve on automatic geocoding results. While focusing on the Greek language and related Web pages, the developed techniques are universally applicable. The specific contributions of this work are (i) automatic geocoding algorithms for phone numbers, addresses and place name identifiers and (ii) a Web browser extension providing a map-based interface for manual geocoding and updating the automatically generated results. With the geocoding of a Web page being stored as respective annotations in a central repository, this overall mechanism is especially suited for persistent Web pages such as Wikipedia. To illustrate the applicability and usefulness of the overall approach, specific geocoding examples of Greek Web pages are presented.
Anma, Fumihiko & Okamoto, Toshio Development of a Participatory Learning Support System based on Social Networking Service World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [315]
Antin, Judd & Cheshire, Coye Readers are not free-riders: reading as a form of participation on wikipedia Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [316] The success of Wikipedia as a large-scale collaborative effort has spurred researchers to examine the motivations and behaviors of Wikipedia's participants. However, this research has tended to focus on active involvement rather than more common forms of participation such as reading. In this paper we argue that Wikipedia's readers should not all be characterized as free-riders -- individuals who knowingly choose to take advantage of others' effort. Furthermore, we illustrate how readers provide a valuable service to Wikipedia. Finally, we use the notion of legitimate peripheral participation to argue that reading is a gateway activity through which newcomers learn about Wikipedia. We find support for our arguments in the results of a survey of Wikipedia usage and knowledge. Implications for future research and design are discussed.
Anzai, Yayoi Digital Trends among Japanese University Students: Focusing on Podcasting and Wikis World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [317]
Anzai, Yayoi Interactions as the key for successful Web 2.0 integrated language learning: Interactions in a planetary community World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [318]
Anzai, Yayoi Introducing a Wiki in EFL Writing Class World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [319]
Aoki, Kumiko & Molnar, Pal International Collaborative Learning using Web 2.0: Learning of Foreign Language and Intercultural Understanding Global Learn Asia Pacific 2010 [320]
Arney, David Cooperative e-Learning and other 21st Century Pedagogies World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [321]
Ashraf, Bill Teaching the Google–Eyed YouTube Generation World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [322]
Atkinson, Tom Cell-Based Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [323]
Auer, Sören & Lehmann, Jens What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content Proceedings of the 4th European conference on The Semantic Web: Research and Applications 2007 [324] Wikis are established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating the by far largest encyclopedia just on the basis of a wiki. Recently, several approaches have been proposed on how to extend wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are, although unconsciously, even used by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g. more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content, and propose strategies for quality improvements with just minor modifications of the wiki systems being currently used.
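The template-extraction idea described above can be illustrated with a rough sketch (not the authors' extractor): the key/value pairs of an infobox template instance are turned into subject-predicate-object statements. The wikitext snippet and the property naming are simplified assumptions.
<syntaxhighlight lang="python">
# Sketch: convert an infobox template instance into RDF-like triples.

import re

wikitext = """{{Infobox city
| name = Innsbruck
| country = Austria
| population = 132493
}}"""

def template_to_triples(subject, text):
    triples = []
    # Each "| key = value" line of the template becomes one statement.
    for key, value in re.findall(r"\|\s*([\w ]+?)\s*=\s*(.+)", text):
        triples.append((subject, key.strip().replace(" ", "_"), value.strip()))
    return triples

for triple in template_to_triples("Innsbruck", wikitext):
    print(triple)   # e.g. ('Innsbruck', 'population', '132493')
</syntaxhighlight>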
Avgerinou, Maria & Pettersson, Rune How Multimedia Research Can Optimize the Design of Instructional Vodcasts World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [325]
Aybar, Hector; Juell, Paul & Shanmugasundaram, Vijayakumar Increased Flexablity in Display of Course Content World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [326]
Baeza-Yates, R. Mining the Web 2.0 to improve search 2009 Latin American Web Congress. LA-WEB 2009, 9-11 Nov. 2009 Piscataway, NJ, USA 2009 [327] There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. These results not only impact the search performance but also the user interface, suggesting new ways of interaction. We will show live demos that find relations in the Wikipedia or improve image search, already available at sandbox.yahoo.com, the demo site of Yahoo! Research. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.
Baeza-Yates, Ricardo User generated content: how good is it? Proceedings of the 3rd workshop on Information credibility on the web 2009 [328] User Generated Content (UGC) is one of the main current trends in the Web. This trend has allowed all people that can access the Internet to publish content in different media, such as text (e.g. blogs), photos or video. This data can be crucial for many applications, in particular for semantic search. It is early to say which impact UGC will have and to what extent. However, the impact will be clearly related to the quality of this content. Hence, how good is the content that people generate in the so called Web 2.0? Clearly it is not as good as editorial content in the Web site of a publisher. However, success stories such as Wikipedia show that it can be quite good. In addition, the quality gap is balanced by volume, as user generated content is much larger than, say, editorial content. In fact, Ramakrishnan and Tomkins estimate that UGC generates daily from 8 to 10GB while the professional Web only generates 2GB in the same time. How can we estimate the quality of UGC? One possibility is to directly evaluate the quality, but that is not easy as it depends on the type of content and the availability of human judgments. One example of such an approach is the study of Yahoo! Answers done by Agichtein et al. In this work they start from a judged question/answer collection where good questions usually have good answers. Then they predict good questions and good answers, obtaining an AUC (area under the curve of the precision-recall graph) of 0.76 and 0.88, respectively. A second possibility is obtaining indirect evidence of the quality. For example, use UGC for a given task and then evaluate the quality of the task results. One such example is the extraction of semantic relations done by Baeza-Yates and Tiberi. To evaluate the quality of the results they used the Open Directory Project (ODP), showing that the results had a precision of over 60%. For the cases that were not found in the ODP, a manually verified sample showed that the real precision was close to 100%. What happened was that the ODP was not specific enough to contain very specific relations, and every day the problem gets worse as we have more data. This example shows the quality of ODP as well as the semantics encoded in queries. Notice that we can define queries as implicit UGC, because each query can be considered an implicit tag to Web pages that are clicked for that query, and hence we have an implicit folksonomy. A final alternative is crossing different UGC sources and inferring from them the quality of those sources. An example of this case is the work by Van Zwol et al., where they use collective knowledge (wisdom of crowds) to extend image tags, and prove that almost 70% of the tags can be semantically classified by using Wordnet and Wikipedia. This exposes the quality of both Flickr tags and Wikipedia. Our main motivation is that by being able to generate semantic resources automatically from the Web (and in particular the Web 2.0), even with noise, and coupling that with open content resources, we can create a virtuous feedback circuit. In fact, explicit and implicit folksonomies can be used to do supervised machine learning without the need of manual intervention (or at least drastically reducing it) to improve semantic tagging. After that, we can feed the results back into the process and repeat it. Under the right conditions, every iteration should improve the output, obtaining a virtuous cycle. As a side effect, we can also improve Web search, our main goal.
Baker, Peter; Xiao, Yun & Kidd, Jennifer Digital Natives and Digital Immigrants: A Comparison across Course Tasks and Delivery Methodologies Society for Information Technology & Teacher Education International Conference 2010 [329]
Bakker, A.; Petrocco, R.; Dale, M.; Gerber, J.; Grishchenko, V.; Rabaioli, D. & Pouwelse, J. Online Video Using BitTorrent And HTML5 Applied To Wikipedia 2010 IEEE Tenth International Conference on Peer-to-Peer Computing (P2P 2010), 25-27 Aug. 2010 Piscataway, NJ, USA 2010 [330] Wikipedia started a project in order to enable users to add video and audio to their wiki pages. The technical downside of this is that its bandwidth requirements will increase manifold. BitTorrent-based peer-to-peer technology from P2P-Next (a European research project) is explored to handle this bandwidth surge. We discuss the impact on the BitTorrent piece picker and outline our tribe protocol for seamless integration of P2P into the HTML5 video and audio elements. Ongoing work on libswift, which uses UDP, an enhanced transport protocol, and integrated NAT/Firewall puncturing, is also described.
Balasuriya, Dominic; Ringland, Nicky; Nothman, Joel; Murphy, Tara & Curran, James R. Named entity recognition in Wikipedia Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [331] Named entity recognition (NER) is used in many domains beyond the newswire text that comprises current gold-standard corpora. Recent work has used Wikipedia's link structure to automatically generate near gold-standard annotations. Until now, these resources have only been evaluated on newswire corpora or themselves. We present the first NER evaluation on a Wikipedia gold standard (WG) corpus. Our analysis of cross-corpus performance on WG shows that Wikipedia text may be a harder NER domain than newswire. We find that an automatic annotation of Wikipedia has high agreement with WG and, when used as training data, outperforms newswire models by up to 7.7%.
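The abstract above relies on deriving named-entity annotations from Wikipedia's link structure. As a toy sketch of that general idea (not the authors' annotation pipeline), each [[Target|surface]] link can be labelled with the entity type of its target article; the article_types mapping is an invented stand-in for whatever page classification a real system would use.
<syntaxhighlight lang="python">
# Sketch: turn wiki links into named-entity spans via a page-type lookup.

import re

article_types = {"Barack Obama": "PER", "Hawaii": "LOC", "Harvard Law School": "ORG"}

def link_annotations(wikitext):
    spans = []
    # Matches [[Target]] and [[Target|surface text]] links.
    for target, _, surface in re.findall(r"\[\[([^|\]]+)(\|([^\]]+))?\]\]", wikitext):
        label = article_types.get(target.strip())
        if label:
            spans.append((surface or target, label))
    return spans

sentence = "[[Barack Obama]] was born in [[Hawaii]] and studied at [[Harvard Law School]]."
print(link_annotations(sentence))
# -> [('Barack Obama', 'PER'), ('Hawaii', 'LOC'), ('Harvard Law School', 'ORG')]
</syntaxhighlight>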
Balmin, Andrey & Curtmola, Emiran WikiAnalytics: disambiguation of keyword search results on highly heterogeneous structured data WebDB '10 Proceedings of the 13th International Workshop on the Web and Databases 2010 [332] Wikipedia infoboxes are an example of a seemingly structured, yet extraordinarily heterogeneous dataset, where any given record has only a tiny fraction of all possible fields. Such data cannot be queried using traditional means without a massive a priori integration effort, since even for a simple request the result values span many record types and fields. On the other hand, solutions based on keyword search are too imprecise to capture the user's intent. To address these limitations, we propose a system, referred to herein as WikiAnalytics, that utilizes a novel search paradigm in order to derive tables of precise and complete results from Wikipedia infobox records. The user starts with a keyword search query that finds a superset of the result records, and then browses clusters of records, deciding which are and are not relevant. WikiAnalytics uses three categories of clustering features based on record types, fields, and values that matched the query keywords, respectively. Since the system cannot predict which combination of features will be important to the user, it efficiently generates all possible clusters of records by all sets of features. We utilize a novel data structure, the universal navigational lattice (UNL), which compactly encodes all possible clusters. WikiAnalytics provides a dynamic and intuitive interface that lets the user explore the UNL and construct homogeneous structured tables, which can be further queried and aggregated using conventional tools.
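A greatly simplified sketch of the clustering idea above (not the UNL data structure itself): heterogeneous infobox-style records can be grouped by the exact set of fields in which the query keywords matched, so the user can pick the homogeneous cluster they meant. The records and query below are invented examples.
<syntaxhighlight lang="python">
# Sketch: group records by the frozenset of fields that matched the keywords.

from collections import defaultdict

records = [
    {"type": "person", "name": "Ada Lovelace", "field": "mathematics"},
    {"type": "person", "name": "Alan Turing", "field": "mathematics"},
    {"type": "city", "name": "Cambridge", "country": "UK"},
]

def cluster_by_matched_fields(records, keywords):
    clusters = defaultdict(list)
    for record in records:
        matched = frozenset(
            field for field, value in record.items()
            if any(kw.lower() in str(value).lower() for kw in keywords)
        )
        clusters[matched].append(record)
    return clusters

for fields, group in cluster_by_matched_fields(records, ["mathematics", "cambridge"]).items():
    print(sorted(fields), [r["name"] for r in group])
</syntaxhighlight>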
Balog-Crisan, Radu; Roxin, Ioan & Smeureanu, Ion e-Learning platforms for Semantic Web World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [333]
Banek, M.; Juric, D. & Skocir, Z. Learning Semantic N-ary Relations from Wikipedia Database and Expert Systems Applications. 21st International Conference, DEXA 2010, 30 Aug.-3 Sept. 2010 Berlin, Germany 2010 [334] Automated construction of ontologies from text corpora, which saves both time and human effort, is a principal condition for realizing the idea of the Semantic Web. However, the recently proposed automated techniques are still limited in the scope of context that can be captured. Moreover, the source corpora generally lack the consensus of ontology users regarding the understanding and interpretation of ontology concepts. In this paper we introduce an unsupervised method for learning domain n-ary relations from Wikipedia articles, thus harvesting the consensus reached by the largest world community engaged in collecting and classifying knowledge. Providing ontologies with n-ary relations instead of the standard binary relations built on the subject-verb-object paradigm results in preserving the initial context of time, space, cause, reason or quantity that otherwise would be lost irreversibly. Our preliminary experiments with a prototype software tool show highly satisfactory results when extracting ternary and quaternary relations, as well as the traditional binary ones.
Barker, Philip Using Wikis and Weblogs to Enhance Human Performance World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [335]
Barker, Philip Using Wikis for Knowledge Management World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [336]
Baron, Georges-Louis & Bruillard, Eric New learners, Teaching Practices and Teacher Education: Which Synergies? The French case Society for Information Technology & Teacher Education International Conference 2008 [337]
Bart, Thurber & Pope, Jack The Humanities in the Learning Space World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [338]
Basiel, Anthony Skip The media literacy spectrum: shifting pedagogic design World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [339]
Basile, Anthony & Murphy, John The Path to Open Source in Course Management Systems Used in Distance Education Programs Society for Information Technology & Teacher Education International Conference 2010 [340]
Basile, Pierpaolo & Semeraro, Giovanni UBA: Using automatic translation and Wikipedia for cross-lingual lexical substitution Proceedings of the 5th International Workshop on Semantic Evaluation 2010 [341] This paper presents the participation of the University of Bari (UBA) at the SemEval-2010 Cross-Lingual Lexical Substitution Task. The goal of the task is to substitute a word in a language Ls, which occurs in a particular context, by providing the best synonyms in a different language Lt which fit in that context. This task is closely related to automatic machine translation, but there are some differences: cross-lingual lexical substitution targets one word at a time and the main goal is to find as many good translations as possible for the given target word. Moreover, there are some connections with Word Sense Disambiguation (WSD) algorithms. Indeed, understanding the meaning of the target word is necessary to find the best substitutions. An important aspect of this kind of task is the possibility of finding synonyms without using a particular sense inventory or a specific parallel corpus, thus allowing the participation of unsupervised approaches. UBA proposes two systems: the former is based on an automatic translation system which exploits Google Translator, the latter is based on a parallel corpus approach which relies on Wikipedia in order to find the best substitutions.
Basili, Roberto; Bos, Johan & Copestake, Ann Proceedings of the 2008 Conference on Semantics in Text Processing 2008 [342] Thanks to both statistical approaches and finite state methods, natural language processing (NLP), particularly in the area of robust, open-domain text processing, has made considerable progress in the last couple of decades. It is probably fair to say that NLP tools have reached satisfactory performance at the level of syntactic processing, be the output structures chunks, phrase structures, or dependency graphs. Therefore, the time seems ripe to extend the state of the art and consider deep semantic processing as a serious task in wide-coverage NLP. This is a step that normally requires syntactic parsing, as well as integrating named entity recognition, anaphora resolution, thematic role labelling and word sense disambiguation, and other lower levels of processing for which reasonably good methods have already been developed. The goal of the STEP workshop is to provide a forum for anyone active in semantic processing of text to discuss innovative technologies, representation issues, inference techniques, prototype implementations, and real applications. The preferred processing targets are large quantities of texts - either specialised domains, or open domains such as newswire text, blogs, and Wikipedia-like text. Implemented rather than theoretical work is emphasised in STEP. Featured in the STEP 2008 workshop is a "shared task" on comparing semantic representations output by state-of-the-art NLP systems. Participants were asked to supply a (small) text before the workshop. The test data for the shared task is composed of all the texts submitted by the participants, allowing participants to "challenge" each other. The output of these systems will be judged on a number of aspects by a panel of experts in the field during the workshop.
Bataineh, Emad & Abbar, Hend Al New Mobile-based Electronic Grade Management System World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [343]
Batista, Carlos Eduardo C. F. & Schwabe, Daniel LinkedTube: semantic information on web media objects Proceedings of the XV Brazilian Symposium on Multimedia and the Web 2009 [344] LinkedTube is a service to create semantic and non-semantic relationships between videos available on services on the Internet (such as YouTube) and external elements (such as Wikipedia, Internet Movie Database, DBPedia, etc). The relationships are defined based on semantic entities obtained through an analysis of textual elements related to the video - its metadata, tags, user comments and external related content (such as sites linking to the video). The set of data comprising the extracted entities and the video metadata is used to define semantic relations between the video and the semantic entities from the Linked Data Cloud. Those relationships are defined using a vocabulary extended from MOWL, based on an extensible set of rules for analyzing the video's related content.
Battye, Greg Turning the ship around while changing horses in mid-stream: Building a University-wide framework for Online and Blended Learning at the University of Canberra World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [345]
Baytiyeh, Hoda & Pfaffman, Jay Why be a Wikipedian Proceedings of the 9th international conference on Computer supported collaborative learning - Volume 1 2009 [346] Wikipedia is a user-edited encyclopedia. Unpaid users contribute articles, edit them, and have heated debates about what information should be included or excluded. This study is designed to learn more about why people are willing to do this work without any fiscal compensation. Wikipedia administrators (n=115) completed an online survey with Likert-scaled items of potential types of satisfaction derived from participation as well as comments that were used to check the validity of the Likert-scaled items and allow participants to say in their own words why they were Wikipedian. Results showed that contributors in Wikipedia are driven largely by motivations to learn and create.
Bechet, F. & Charton, E. Unsupervised knowledge acquisition for extracting named entities from speech 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010, 14-19 March 2010 Dallas, TX, USA 2010 [347] This paper presents a Named Entity Recognition (NER) method dedicated to processing speech transcriptions. The main principle behind this method is to collect, in an unsupervised way, lexical knowledge for all entries in the ASR lexicon. This knowledge is gathered with two methods: by automatically extracting NEs on a very large set of textual corpora and by directly exploiting the structure contained in the Wikipedia resource. This lexical knowledge is used to update the statistical models of our NER module, which is based on a mixed approach with generative models (Hidden Markov Models - HMM) and discriminative models (Conditional Random Fields - CRF). This approach has been evaluated within the French ESTER 2 evaluation program and obtained the best results at the NER task on ASR transcripts.
Becker, Katrin Teaching Teachers about Serious Games World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [348]
Belz, Anja; Kow, Eric & Viethen, Jette The GREC named entity generation challenge 2009: overview and evaluation results Proceedings of the 2009 Workshop on Language Generation and Summarisation 2009 [349] The GREC-NEG Task at Generation Challenges 2009 required participating systems to select coreference chains for all people entities mentioned in short encyclopaedic texts about people collected from Wikipedia. Three teams submitted six systems in total, and we additionally created four baseline systems. Systems were tested automatically using a range of existing intrinsic metrics. We also evaluated systems extrinsically by applying coreference resolution tools to the outputs and measuring the success of the tools. In addition, systems were tested in an intrinsic evaluation involving human judges. This report describes the GREC-NEG Task and the evaluation methods applied, gives brief descriptions of the participating systems, and presents the evaluation results.
Belz, Anja; Kow, Eric; Viethen, Jette & Gatt, Albert The GREC challenge: overview and evaluation results Proceedings of the Fifth International Natural Language Generation Conference 2008 [350] The GREC Task at REG '08 required participating systems to select coreference chains to the main subject of short encyclopaedic texts collected from Wikipedia. Three teams submitted a total of 6 systems, and we additionally created four baseline systems. Systems were tested automatically using a range of existing intrinsic metrics. We also evaluated systems extrinsically by applying coreference resolution tools to the outputs and measuring the success of the tools. In addition, systems were tested in a reading/comprehension experiment involving human subjects. This report describes the GREC Task and the evaluation methods, gives brief descriptions of the participating systems, and presents the evaluation results.
Bernardis, Daniela Education and Pervasive Computing. Didactical Use of the Mobile Phone: Create and Share Information Concerning Artistic Heritages and the Environment. World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [351]
Bhattacharya, Madhumita & Dron, Jon Mining Collective Intelligence for Creativity and Innovation: A Research proposal World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [352]
Bjelland, Tor Kristian & Nordbotten, Svein A Best Practice Online Course Architect World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [353]
Black, Aprille Noe & Falls, Jane The Use of Web 2.0 Tools for Collaboration and the Development of 21st Century Skills Society for Information Technology & Teacher Education International Conference 2009 [354]
Blocher, Michael & Tu, Chih-Hsiung Utilizing a Wiki to Construct Knowledge Society for Information Technology & Teacher Education International Conference 2008 [355]
Blok, Rasmus & Godsk, Mikkel Podcasts in Higher Education: What Students Want, What They Really Need, and How This Might be Supported World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [356]
Bocek, Thomas; Peric, Dalibor; Hecht, Fabio; Hausheer, David & Stiller, Burkhard PeerVote: A Decentralized Voting Mechanism for P2P Collaboration Systems Proceedings of the 3rd International Conference on Autonomous Infrastructure, Management and Security: Scalability of Networks and Services 2009 [357] Peer-to-peer (P2P) systems achieve scalability, fault tolerance, and load balancing with a low-cost infrastructure, characteristics from which collaboration systems, such as Wikipedia, can benefit. A major challenge in P2P collaboration systems is to maintain article quality after each modification in the presence of malicious peers. A way of achieving this goal is to allow modifications to take effect only if a majority of previous editors approve the changes through voting. The absence of a central authority makes voting a challenge in P2P systems. This paper proposes the fully decentralized voting mechanism PeerVote, which enables users to vote on modifications in articles in a P2P collaboration system. Simulations and experiments show the scalability and robustness of PeerVote, even in the presence of malicious peers.
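The voting rule sketched in the abstract above (a modification takes effect only if a majority of previous editors approve it) can be illustrated with a minimal sketch; this is not the PeerVote protocol itself, and vote collection over the P2P overlay is abstracted into a plain list of ballots.
<syntaxhighlight lang="python">
# Sketch: accept a proposed modification only on a strict majority of approvals
# from previously polled editors; `quorum` is an assumed minimum turnout.

def modification_accepted(ballots, quorum=3):
    """ballots: list of (peer_id, approves) pairs from previous editors."""
    if len(ballots) < quorum:
        return False                      # not enough voters reachable
    approvals = sum(1 for _, approves in ballots if approves)
    return approvals * 2 > len(ballots)   # strict majority

print(modification_accepted([("p1", True), ("p2", True), ("p3", False)]))  # True
</syntaxhighlight>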
Bonk, Curtis The World is Open: How Web Technology Is Revolutionizing Education World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [358]
Bouma, Gosse; Duarte, Sergio & Islam, Zahurul Cross-lingual alignment and completion of Wikipedia templates Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies 2009 [359] For many languages, the size of Wikipedia is an order of magnitude smaller than the English Wikipedia. We present a method for cross-lingual alignment of template and infobox attributes in Wikipedia. The alignment is used to add and complete templates and infoboxes in one language with information derived from Wikipedia in another language. We show that alignment between English and Dutch Wikipedia is accurate and that the result can be used to expand the number of template attribute-value pairs in Dutch Wikipedia by 50%. Furthermore, the alignment provides valuable information for normalization of template and attribute names and can be used to detect potential inconsistencies.
Bouma, G.; Fahmi, I.; Mur, J.; van Noord, G.; van der Plas, L. & Tiedemann, J. Using syntactic knowledge for QA* Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007 We describe the system of the University of Groningen for the monolingual Dutch and multilingual English to Dutch QA tasks. First, we give a brief outline of the architecture of our QA system, which makes heavy use of syntactic information. Next, we describe the modules that were improved or developed especially for the CLEF tasks, among others the incorporation of syntactic knowledge in IR, the incorporation of lexical equivalences and coreference resolution, and a baseline multilingual (English to Dutch) QA system, which uses a combination of Systran and Wikipedia (for term recognition and translation) for question translation. For non-list questions, 31% (20%) of the highest ranked answers returned by the monolingual (multilingual) system were correct.
Boyles, Michael; Frend, Chauney; Rogers, Jeff; William, Albert; Reagan, David & Wernert, Eric Leveraging Pre-Existing Resources at Institutions of Higher Education for K-12 STEM Engagement World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [360]
Bra, Paul De; Smits, David; van der Sluijs, Kees; Cristea, Alexandra & Hendrix, Maurice GRAPPLE: Personalization and Adaptation in Learning Management Systems World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [361]
Brachman, Ron Emerging Sciences of the Internet: Some New Opportunities Proceedings of the 4th European conference on The Semantic Web: Research and Applications 2007 [362] Semantic Web technologies have started to make a difference in enterprise settings and have begun to creep into use in limited parts of the World Wide Web. As is common in overview articles, it is easy to imagine scenarios in which the Semantic Web could provide important infrastructure for activities across the broader Internet. Many of these seem to be focused on improvements to what is essentially a search function (e.g., "list the prices of flat screen HDTVs larger than 40 inches with 1080p resolution at shops in the nearest town that are open until 8pm on Tuesday evenings"), and such capabilities will surely be of use to future Internet users. However, if one looks closely at the research agendas of some of the largest Internet companies, it is not clear that the staples of SW thinking will intersect the most important paths of the major broad-spectrum service providers. Some of the emerging trends in the research labs of key industry players indicate that SW goals generally taken for granted may be less central than envisioned, and that the biggest opportunities may come from some less obvious directions. Given the level of investment and the global reach of big players like Yahoo! and Google, it would pay us to look more closely at some of their fundamental investigations.
Bradshaw, Daniele; Siko, Kari Lee; Hoffman, William; Talvitie-Siple, June; Fine, Bethann; Carano, Ken; Carlson, Lynne A.; Mixon, Natalie K; Rodriguez, Patricia; Sheffield, Caroline C.; Sullens-Mullican, Carey; Bolick, Cheryl & Berson, Michael J. The Use of Videoconferencing as a Medium for Collaboration of Experiences and Dialogue Among Graduate Students: A Case Study from Two Southeastern Universities Society for Information Technology & Teacher Education International Conference 2006 [363]
Bristow, Paul The Digital Divide an age old question? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [364]
Bruckman, Amy Social Support for Creativity and Learning Online Proceedings of the 2008 Second IEEE International Conference on Digital Game and Intelligent Toy Enhanced Learning 2008 [365]
Brunetti, Korey & Townsend, Lori Extreme (Class) Makeover: Engaging Information Literacy Students with Web 2.0 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [366]
Brunvand, Stein & Bouwman, Jeffrey The Math Boot Camp Wiki: Using a Wiki to Extend the Learning Beyond June Society for Information Technology & Teacher Education International Conference 2009 [367]
Brusilovsky, Peter; Yudelson, Michael & Sosnovsky, Sergey Collaborative Paper Exchange World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [368]
Bucur, Johanna Teacher and Student Support Services for eLearning in Higher Education World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [369]
Bulkowski, Aleksander; Nawarecki, Edward & Duda, Andrzej Peer-to-Peer Dissemination of Learning Objects for Creating Collaborative Learning Communities World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [370]
Bullock, Shawn The Challenge of Digital Technologies to Educational Reform World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [371]
Buriol, Luciana S.; Castillo, Carlos; Donato, Debora; Leonardi, Stefano & Millozzi, Stefano Temporal Analysis of the Wikigraph Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence 2006 [372] Wikipedia is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc, we have a "Wikigraph", a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature by the fact that there are explicit timestamps associated with each node's events. This allows us to do a detailed analysis of the Wikipedia evolution over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraphs can be applied to large Web graphs for which temporal data is usually not available.
Buscaldi, D. & Rosso, P. A bag-of-words based ranking method for the Wikipedia question answering task Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007 This paper presents a simple approach to the Wikipedia question answering pilot task in CLEF 2006. The approach ranks the snippets, retrieved using the Lucene search engine, by means of a similarity measure based on bags of words extracted from both the snippets and the articles in Wikipedia. Our participation was in the monolingual English and Spanish tasks. We obtained the best results in the Spanish one.
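A hedged sketch of the ranking step described above, not the CLEF system: each retrieved snippet is scored by the overlap between its bag of words and the bag of words of the Wikipedia article it comes from, then the snippets are sorted. The tokenisation and sample data are deliberately simplistic.
<syntaxhighlight lang="python">
# Sketch: rank snippets by word overlap with the source Wikipedia article.

import re

def bag(text):
    return set(re.findall(r"\w+", text.lower()))

def rank_snippets(snippets, article_text):
    article_words = bag(article_text)
    def score(snippet):
        words = bag(snippet)
        return len(words & article_words) / max(len(words), 1)
    return sorted(snippets, key=score, reverse=True)

article = "The Danube is a river in Europe flowing through ten countries."
snippets = ["The Danube flows through ten countries.", "Completely unrelated text."]
print(rank_snippets(snippets, article))
</syntaxhighlight>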
Buscaldi, Davide & Rosso, Paolo A comparison of methods for the automatic identification of locations in wikipedia Proceedings of the 4th ACM workshop on Geographical information retrieval 2007 [373] In this paper we compare two methods for the automatic identification of geographical articles in encyclopedic resources such as Wikipedia. The methods are a WordNet-based method that uses a set of keywords related to geographical places, and a multinomial Naïve Bayes classifier, trained over a randomly selected subset of the English Wikipedia. This task may be included in the broader task of Named Entity classification, a well-known problem in the field of Natural Language Processing. The experiments were carried out considering both the full text of the articles and only the definition of the entity being described in the article. The obtained results show that the information contained in the page templates and the category labels is more useful than the text of the articles.
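The keyword-based method mentioned above can be illustrated with a toy sketch: an article (or just its definition sentence) is labelled as geographic if it contains enough place-related trigger words. The keyword list below is an invented stand-in for the WordNet-derived set used in the paper.
<syntaxhighlight lang="python">
# Sketch: keyword-based geographic article detection with a threshold.

GEO_KEYWORDS = {"city", "town", "river", "mountain", "country", "province",
                "island", "capital", "region", "village", "lake"}

def is_geographic(text, threshold=1):
    words = set(text.lower().split())
    return len(words & GEO_KEYWORDS) >= threshold

print(is_geographic("Innsbruck is a city in the western part of Austria."))  # True
print(is_geographic("A sonnet is a fourteen-line poem."))                    # False
</syntaxhighlight>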
Butler, Janice W. A Whodunit in Two Acts: An Online Murder Mystery that Enhances Library and Internet Search Skills Society for Information Technology & Teacher Education International Conference 2010 [374]
Butnariu, Cristina & Veale, Tony UCD-S1: a hybrid model for detecting semantic relations between noun pairs in text Proceedings of the 4th International Workshop on Semantic Evaluations 2007 [375] We describe a supervised learning approach to categorizing inter-noun relations, based on Support Vector Machines, that builds a different classifier for each of seven semantic relations. Each model uses the same learning strategy, while a simple voting procedure based on five trained discriminators with various blends of features determines the final categorization. The features that characterize each of the noun pairs are a blend of lexical-semantic categories extracted from WordNet and several flavors of syntactic patterns extracted from various corpora, including Wikipedia and the WMTS corpus.
Byron, Akilah The Use of Open Source to mitigate the costs of implementing E-Government in the Caribbean World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [376]
Bélisle, Claire Academic Use of Online Encyclopedias World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 [377]
la Calzada, Gabriel De & Dekhtyar, Alex On measuring the quality of Wikipedia articles Proceedings of the 4th workshop on Information credibility 2010 [378] This paper discusses an approach to modeling and measuring information quality of Wikipedia articles. The approach is based on the idea that the quality of Wikipedia articles with distinctly different profiles needs to be measured using different information quality models. We report on our initial study, which involved two categories of Wikipedia articles: "stabilized" (those whose content has not undergone major changes for a significant period of time) and "controversial" (the articles which have undergone vandalism, revert wars, or whose content is subject to internal discussions between Wikipedia editors). We present simple information quality models and compare their performance on a subset of Wikipedia articles with the information quality evaluations provided by human users. Our experiment shows that using special-purpose models for information quality captures user sentiment about Wikipedia articles better than using a single model for both categories of articles.
Capuano, Nicola; Pierri, Anna; Colace, Francesco; Gaeta, Matteo & Mangione, Giuseppina Rita A mash-up authoring tool for e-learning based on pedagogical templates Proceedings of the first ACM international workshop on Multimedia technologies for distance learning 2009 [379] The purpose of this paper is twofold. On the one hand, it aims at presenting the "pedagogical template" methodology for the definition of didactic activities through the aggregation of atomic learning entities on the basis of pre-defined schemas. On the other hand, it proposes a Web-based authoring tool to build learning resources applying the defined methodology. The authoring tool is inspired by mash-up principles and allows the combination of local learning entities with learning entities coming from external Web 2.0 sources such as Wikipedia, Flickr, YouTube and SlideShare. Eventually, the results of a small-scale experimentation inside a university course, intended both to define a pedagogical template for "virtual scientific experiments" and to build and deploy learning resources applying such a template, are presented.
Carano, Kenneth; Keefer, Natalie & Berson, Michael Mobilizing Social Networking Technology to Empower a New Generation of Civic Activism Among Youth Society for Information Technology & Teacher Education International Conference 2007 [380]
Cardoso, N. GikiCLEF Topics and Wikipedia Articles: Did They Blend? Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [381] This paper presents a post-hoc analysis on how the Wikipedia collections fared in providing answers and justifications to GikiCLEF topics. Based on all solutions found by all GikiCLEF participant systems, this paper measures how self-sufficient the particular Wikipedia collections were to provide answers and justifications for the topics, in order to better understand the recall limit that a GikiCLEF system specialised in one single language has.
Cardoso, N.; Batista, D.; Lopez-Pellicer, F.J. & Silva, M.J. Where In The Wikipedia Is That Answer? The XLDB At The GikiCLEF 2009 Task Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [382] We developed a new semantic question analyser for a custom prototype assembled for participating in GikiCLEF 2009, which processes grounded concepts derived from terms, and uses information extracted from knowledge bases to derive answers. We also evaluated a newly developed named-entity recognition module, based on Conditional Random Fields, and a new world geo-ontology, derived from Wikipedia, which is used in the geographic reasoning process.
Carter, B. Beyond Google: Improving learning outcomes through digital literacy International Association of School Librarianship. Selected Papers from the ... Annual Conference 2009 The internet is often students' first choice when researching school assignments; however, students' online search strategies typically consist of a basic Google search and Wikipedia. The creation of library intranet pages providing a range of search tools and the teaching of customised information literacy lessons aim to better utilise library resources and improve students' research skills and learning outcomes.
Cataltepe, Z.; Turan, Y. & Kesgin, F. Turkish document classification using shorter roots 2007 15th IEEE Signal Processing and Communications Applications, 11-13 June 2007 Piscataway, NJ, USA 2007 Stemming is one of the commonly used pre-processing steps in document categorization. Especially when fast and accurate classification of a lot of documents is needed, it is important to have as few and as short roots as possible. This would not only reduce the time it takes to train and test classifiers but would also reduce the storage requirements for each document. In this study, we analyze the performance of classifiers when the longest or shortest roots found by a stemmer are used. We also analyze the effect of using only the consonants in the roots. We use two document data sets, obtained from the Milliyet newspaper and Wikipedia, to analyze the classification accuracy of classifiers when roots obtained under these four conditions are used. We also analyze the classification accuracy when only the first 4, 3 or 2 letters or consonants of the roots are used. Using smaller roots results in smaller TF-IDF vectors. Especially for small-sized TF-IDF vectors, using only consonants in the roots gives better performance than using all letters in the roots.
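The feature variants compared above can be sketched as follows: stems are truncated to their first k characters, optionally keeping only consonants, before TF-IDF vectors are built. The toy "stemmer" (which treats the whole word as the root), the vowel set, and the two sample documents are placeholders, not the Turkish stemmer or corpora used in the paper.
<syntaxhighlight lang="python">
# Sketch: shortened-root and consonant-only features for TF-IDF vectors.

from sklearn.feature_extraction.text import TfidfVectorizer

VOWELS = set("aeiouıöü")  # simplified Turkish vowel set (assumption)

def shorten(root, k=4, consonants_only=False):
    if consonants_only:
        root = "".join(ch for ch in root if ch.lower() not in VOWELS)
    return root[:k]

def preprocess(document, k=4, consonants_only=False):
    return " ".join(shorten(w, k, consonants_only) for w in document.lower().split())

docs = ["ekonomi haberleri bugün açıklandı", "futbol maçı dün oynandı"]
short_docs = [preprocess(d, k=3, consonants_only=True) for d in docs]
vectors = TfidfVectorizer().fit_transform(short_docs)
print(short_docs, vectors.shape)
</syntaxhighlight>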
Chan, Michael; fai Chan, Stephen Chi & ki Leung, Cane Wing Online Search Scope Reconstruction by Connectivity Inference Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence 2007 [383] To cope with the continuing growth of the web, improvements should be made to the current brute-force techniques commonly used by robot-driven search engines. We propose a model that strikes a balance between robot and directory-based search engines by expanding the search scope of conventional directories to automatically include related categories. Our model makes use of a knowledge-rich and well-structured corpus to infer relationships between documents and topic categories. We show that the hyperlink structure of Wikipedia articles can be effectively exploited to identify relations among topic categories. Our experiments show the average recall rate and precision rate achieved are 91% and between 85% and 215% of Google's, respectively.
Chan, Peter & Dovchin, Tuul Evaluation Study of the Development of Multimedia Cases for Training Mongolian Medical Professionals World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [384]
Charles, Elizabeth S.; Lasry, Nathaniel & Whittaker, Chris Does scale matter: using different lenses to understand collaborative knowledge building Proceedings of the 9th International Conference of the Learning Sciences - Volume 2 2010 [385] Web-based environments for communicating, networking and sharing information, often referred to collectively as Web 2.0, have become ubiquitous - e.g., Wikipedia, Facebook, Flickr, or YouTube. Understanding how such technologies can promote participation, collaboration and co-construction of knowledge, and how such affordances could be used for educational purposes, has become a focus of research in the Learning Sciences and CSCL communities (e.g., Dohn, 2009; Greenhow et al., 2009). One important mechanism is self-organization, which includes the regulation of feedback loops and the flows of information and resources within an activity system (Holland, 1996). But the study of such mechanisms calls for new ways of thinking about the unit of analysis, and the development of analytic tools that allow us to move back and forth through levels of activity systems that are designed to promote learning. Here, we propose that content analysis can focus on the flows of resources (i.e., content knowledge, scientific artifacts, epistemic beliefs) in terms of how they are established and the factors affecting whether they are taken up by members of the community.
Charnitski, Christina W. & Harvey, Francis A. The Clash Between School and Corporate Reality Society for Information Technology & Teacher Education International Conference 2008 [386]
Chen, Irene L. & Beebe, Ronald Assessing Students’ Wiki Projects: Alternatives and Implications World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [387]
Chen, Pearl; Wan, Peiwen & Son, Jung-Eun Web 2.0 and Education: Lessons from Teachers’ Perspectives World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [388]
Chen, Jing-Ying Resource-Oriented Computing: Towards a Universal Virtual Workspace Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops - Volume 02 2007 [389] Emerging popular Web applications such as blogs and Wikipedia are transforming the Internet into a global collaborative environment where most people can participate and contribute. When resources created by and shared among people are not just content but also software artifacts, a much more accommodating, universal, and virtual workspace is foreseeable that can support people with diverse backgrounds and needs. To realize this goal, we require not only the necessary infrastructure support for resource deployment and composition, but also strategies and mechanisms to handle the implied complexity. We propose a service-oriented architecture in which arbitrary resources are associated with syntactical descriptors, called metaphors, based on which runtime services can be instantiated and managed. Furthermore, service composition can be achieved through syntactic metaphor composition. We demonstrate our approach via an E-Science workbench that allows users to access and combine distributed computing and storage resources in a flexible manner.
Cheryl, Cheryl Seals; Zhang, Lei & Gilbert, Juan Human Centered Computing Lab Web Site Redesign Effort World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [390]
Chikhi, Nacim Fateh; Rothenburger, Bernard & Aussenac-Gilles, Nathalie A Comparison of Dimensionality Reduction Techniques for Web Structure Mining Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence 2007 [391] In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. Thus, in this paper we investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in the web hyperlink connectivity. We apply and compare four DRTs, namely Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA) and Random Projection (RP). Experiments conducted on three datasets allow us to assert the following: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; the well-known WebKB dataset, used in a large number of works about the analysis of hyperlink connectivity, seems to be ill-suited for this task, and we suggest instead using the recent Wikipedia dataset, which is better suited.
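A minimal sketch of the kind of comparison described above: the same document-link matrix is reduced with PCA, NMF, ICA and random projection, and the resulting low-dimensional embeddings can then be inspected or clustered. The random binary "hyperlink matrix" below is an invented stand-in for the real datasets.
<syntaxhighlight lang="python">
# Sketch: apply four dimensionality reduction techniques to one link matrix.

import numpy as np
from sklearn.decomposition import PCA, NMF, FastICA
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
links = rng.integers(0, 2, size=(100, 40)).astype(float)  # pages x link targets

reducers = {
    "PCA": PCA(n_components=5),
    "NMF": NMF(n_components=5, init="nndsvda", max_iter=500),
    "ICA": FastICA(n_components=5, max_iter=1000),
    "RP":  GaussianRandomProjection(n_components=5),
}

for name, reducer in reducers.items():
    reduced = reducer.fit_transform(links)
    print(name, reduced.shape)   # each method yields a 100 x 5 embedding
</syntaxhighlight>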
Chin, Alvin; Hotho, Andreas & Strohmaier, Markus Proceedings of the International Workshop on Modeling Social Media 2010 [392] In recent years, social media applications such as blogs, microblogs, wikis, news aggregation sites and social tagging systems have pervaded the web and have transformed the way people communicate and interact with each other online. In order to understand and effectively design social media systems, we need to develop models that are capable of reflecting their complex, multi-faceted socio-technological nature. While progress has been made in modeling particular aspects of selected social media applications (such as the architecture of weblog conversations, the evolution of wikipedia, or the mechanics of news propagation), other aspects are less understood.
Choi, Boreum; Alexander, Kira; Kraut, Robert E. & Levine, John M. Socialization tactics in wikipedia and their effects Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [393] Socialization of newcomers is critical for both conventional and online groups. It helps groups perform effectively and helps newcomers develop commitment. However, little empirical research has investigated the impact of specific socialization tactics on newcomers' commitment to online groups. We examined WikiProjects, subgroups in Wikipedia organized around working on common topics or tasks. In study 1, we identified the seven socialization tactics used most frequently: invitations to join, welcome messages, requests to work on project-related tasks, offers of assistance, positive feedback on a new member's work, constructive criticism, and personal-related comments. In study 2, we examined their impact on newcomers' commitment to the project. Whereas most newcomers contributed fewer edits over time, the declines were slowed or reversed for those socialized with welcome messages, assistance, and constructive criticism. In contrast, invitations led to steeper declines in edits. These results suggest that different socialization tactics play different roles in socializing new members in online groups compared to offline ones.
Chong, Ng & Yamamoto, Michihiro Using Many Wikis for Collaborative Writing World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [394]
Chou, Chen-Hsiung Multimedia in Higher Education of Tourism World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [395]
Choudhury, Monojit; Hassan, Samer; Mukherjee, Animesh & Muresan, Smaranda Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing 2009 [396] The last few years have shown a steady increase in applying graph-theoretic models to computational linguistics. In many NLP applications, entities can be naturally represented as nodes in a graph and relations between them can be represented as edges. There has been extensive research showing that graph-based representations of linguistic units such as words, sentences and documents give rise to novel and efficient solutions in a variety of NLP tasks, ranging from part-of-speech tagging, word sense disambiguation and parsing, to information extraction, semantic role labeling, summarization, and sentiment analysis. More recently, complex network theory, a popular modeling paradigm in statistical mechanics and physics of complex systems, has proven to be a promising tool in understanding the structure and dynamics of languages. Complex network based models have been applied to areas as diverse as language evolution, acquisition, historical linguistics, mining and analyzing the social networks of blogs and emails, link analysis and information retrieval, information extraction, and representation of the mental lexicon. In order to make this field of research more visible, this time the workshop incorporated a special theme on Cognitive and Social Dynamics of Languages in the framework of Complex Networks. Cognitive dynamics of languages include topics focused primarily on language acquisition, which can be extended to language change (historical linguistics) and language evolution as well. Since the latter phenomena are also governed by social factors, we can further classify them under social dynamics of languages. In addition, social dynamics of languages also include topics such as mining the social networks of blogs and emails. A collection of articles pertaining to this special theme will be compiled in a special issue of the Computer Speech and Language journal. This volume contains papers accepted for presentation at the TextGraphs-4 2009 Workshop on Graph-Based Methods for Natural Language Processing. The event took place on August 7, 2009, in Suntec, Singapore, immediately following ACL/IJCNLP 2009, the joint conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing. Being the fourth workshop on this topic, we were able to build on the success of the previous TextGraphs workshops, held as part of HLT-NAACL 2006, HLT-NAACL 2007 and Coling 2008. It aimed at bringing together researchers working on problems related to the use of graph-based algorithms for NLP and on pure graph-theoretic methods, as well as those applying complex networks for explaining language dynamics. Like last year, TextGraphs-4 has also been endorsed by SIGLEX. We issued calls for both regular and short papers. Nine regular and three short papers were accepted for presentation, based on the careful reviews of our program committee. Our sincere thanks to all the program committee members for their thoughtful, high quality and elaborate reviews, especially considering our extremely tight time frame for reviewing. The papers appearing in this volume have surely benefited from their expert feedback.
This year's workshop attracted papers employing graphs in a wide range of settings and we are therefore proud to present a very diverse program. We received quite a few papers on discovering semantic similarity through random walks. Daniel Ramage et al. explore random walk based methods to discover semantic similarity in texts, while Eric Yeh et al. attempt to discover semantic relatedness through random walks on Wikipedia. Amec Herdagdelen et al. describe a method for measuring semantic relatedness with vector space models and random walks.
Choulat, Tracey Teacher Education and Internet Safety Society for Information Technology & Teacher Education International Conference 2010 [397]
Clauson, Kevin A; Polen, Hyla H; Boulos, Maged N K & Dzenowagis, Joan H Accuracy and completeness of drug information in Wikipedia AMIA ... Annual Symposium Proceedings / AMIA Symposium. AMIA Symposium 2008 [398] Web 2.0 technologies, where users participate in content production, are increasingly used as informational and educational resources. Wikipedia is frequently cited by students in the healthcare professions. This study compared the accuracy and completeness of drug information in Wikipedia to Medscape Drug Reference, a traditionally-edited resource. Wikipedia answered fewer questions (40.0% vs. 82.5%; p<0.001) and was less complete (p=0.00076) than Medscape. No gross errors were found in Wikipedia and its content has improved over time.
Clow, Doug Resource Discovery: Heavy and Light Metadata Approaches World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004 [399]
Colazzo, Luigi; Magagnino, Francesco; Molinari, Andrea & Villa, Nicola From e-learning to Social Networking: a Case Study World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [400]
Cook, John Generating New Learning Contexts: Novel Forms of Reuse and Learning on the Move World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [401]
Copeland, Nancy & Bednar, Anne Mobilizing Educational Technologists in a Collaborative Online Community to Develop a Knowledge Management System as a Wiki Society for Information Technology & Teacher Education International Conference 2010 [402]
Corbeil, Joseph Rene & Valdes-Corbeil, Maria Elena Enhance Your Online Courses by Re-Engineering The Courseware Management System Society for Information Technology & Teacher Education International Conference 2008 [403]
Cosley, Dan; Frankowski, Dan; Terveen, Loren & Riedl, John Using intelligent task routing and contribution review to help communities build artifacts of lasting value Proceedings of the SIGCHI conference on Human Factors in computing systems 2006 [404] Many online communities are emerging that, like Wikipedia, bring people together to build community-maintained artifacts of lasting value (CALVs). Motivating people to contribute is a key problem because the quantity and quality of contributions ultimately determine a CALV's value. We pose two related research questions: 1) How does intelligent task routing---matching people with work---affect the quantity of contributions? 2) How does reviewing contributions before accepting them affect the quality of contributions? A field experiment with 197 contributors shows that simple, intelligent task routing algorithms have large effects. We also model the effect of reviewing contributions on the value of CALVs. The model predicts, and experimental data shows, that value grows more slowly with review before acceptance. It also predicts, surprisingly, that a CALV will reach the same final value whether contributions are reviewed before or after they are made available to the community.
Costa, Luís Fernando Using answer retrieval patterns to answer Portuguese questions Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access 2008 [405] Esfinge is a general domain Portuguese question answering system which has been participating at QA@CLEF since 2004. It uses the information available in the "official" document collections used in QA@CLEF (newspaper text and Wikipedia) and information from the Web as an additional resource when searching for answers. As regards external tools, Esfinge uses a syntactic analyzer, a morphological analyzer and a named entity recognizer. This year an alternative approach to retrieve answers was tested: whereas in previous years search patterns were used to retrieve relevant documents, this year a new type of search pattern was also used to extract the answers themselves. We also evaluated the second and third best answers returned by Esfinge. This evaluation showed that when Esfinge answers a question correctly, it usually does so with its first answer. Furthermore, the experiments revealed that the answer retrieval patterns created for this participation improve the results, but only for definition questions.
Coursey, Kino & Mihalcea, Rada Topic identification using Wikipedia graph centrality Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers 2009 [406] This paper presents a method for automatic topic identification using a graph-centrality algorithm applied to an encyclopedic graph derived from Wikipedia. When tested on a data set with manually assigned topics, the system is found to significantly improve over a simpler baseline that does not make use of the external encyclopedic knowledge.
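A rough sketch of the graph-centrality idea in the entry above (not the authors' system): run a personalized PageRank over a concept graph, seeding the walk from concepts spotted in the input text, and take the top-ranked nodes as topics. The tiny graph, the seed concept and the use of networkx are assumptions made only for illustration.
 import networkx as nx
 # Toy encyclopedic graph: edges point from a concept to a more general one.
 G = nx.DiGraph()
 G.add_edges_from([
     ("Jaguar", "Felidae"), ("Felidae", "Mammal"), ("Mammal", "Animal"),
     ("Jaguar Cars", "Automotive industry"),
 ])
 seeds = {"Jaguar": 1.0}                    # concepts found in the document
 personalization = {n: seeds.get(n, 0.0) for n in G}
 scores = nx.pagerank(G, alpha=0.85, personalization=personalization)
 print(sorted(scores, key=scores.get, reverse=True)[:3])   # proposed topics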
Coutinho, Clara Using Blogs, Podcasts and Google Sites as Educational Tools in a Teacher Education Program World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [407]
Coutinho, Clara Web 2.0 technologies as cognitive tools: preparing future k-12 teachers Society for Information Technology & Teacher Education International Conference 2009 [408]
Coutinho, Clara & Junior, João Bottentuit Using social bookmarking to enhance cooperation/collaboration in a Teacher Education Program World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [409]
Coutinho, Clara & Junior, João Batista Bottentuit Web 2.0 in Portuguese Academic Community: An Exploratory Survey Society for Information Technology & Teacher Education International Conference 2008 [410]
Coutinho, Clara & Rocha, Aurora Screencast and Vodcast: An Experience in Secondary Education Society for Information Technology & Teacher Education International Conference 2010 [411]
Crawford, Caroline; Smith, Richard A. & Smith, Marion S. Podcasting in the Learning Environment: From Podcasts for the Learning Community, Towards the Integration of Podcasts within the Elementary Learning Environment Society for Information Technology & Teacher Education International Conference 2006 [412]
Crawford, Caroline M. & Thomson, Jennifer Graphic Novels as Visual Human Performance and Training Tools: Towards an Understanding of Information Literacy for Preservice Teachers Society for Information Technology & Teacher Education International Conference 2007 [413]
Cui, Gaoying; Lu, Qin; Li, Wenjie & Chen, Yirong Mining Concepts from Wikipedia for Ontology Construction Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03 2009 [414] An ontology is a structured knowledge base of concepts organized by the relations among them. But concepts are usually mixed with their instances in the corpora used for knowledge extraction. Concepts and their corresponding instances share similar features and are difficult to distinguish. In this paper, a novel approach is proposed to comprehensively obtain concepts with the help of definition sentences and Category Labels in Wikipedia pages. N-gram statistics and other NLP knowledge are used to help extract appropriate concepts. The proposed method identified nearly 50,000 concepts from about 700,000 Wiki pages. The precision of 78.5% makes it an effective approach for mining concepts from Wikipedia for ontology construction.
Cummings, Jeff; Massey, Anne P. & Ramesh, V. Web 2.0 proclivity: understanding how personal use influences organizational adoption Proceedings of the 27th ACM international conference on Design of communication 2009 [415] Web 2.0 represents a major shift in how individuals communicate and collaborate with others. While many of these technologies have been used for public, social interactions (e.g., Wikipedia and YouTube), organizations are just beginning to explore their use in day-to-day operations. Due to their relatively recent introduction and public popularity, Web 2.0 technologies have led to a resurgent focus on how organizations can once again leverage technology within the organization for virtual and mass collaboration. In this paper, we explore some of the key questions facing organizations with regard to Web 2.0 implementation and adoption. We develop a model of "Web 2.0 Proclivity", defined as an individual's propensity to use Web 2.0 tools within the organization. Our model and set of associated hypotheses focus on understanding an employee's internal Web 2.0 content behaviors based on non-work personal use behaviors. To test our model and hypotheses, survey-based data was collected from a global engine design and manufacturing company. Our results show that Web 2.0 Proclivity is positively influenced by an employee's external behaviors and that differences exist across both functional departments and employee work roles. We discuss the research implications of our findings as well as how our findings and model of Web 2.0 Proclivity can be used to help guide organizational practice.
Cusinato, Alberto; Mea, Vincenzo Della; Salvatore, Francesco Di & Mizzaro, Stefano QuWi: quality control in Wikipedia Proceedings of the 3rd workshop on Information credibility on the web 2009 [416] We propose and evaluate QuWi (Quality in Wikipedia), a framework for quality control in Wikipedia. We build upon a previous proposal by Mizzaro [11], who proposed a method for substituting and/or complementing peer review in scholarly publishing. Since articles in Wikipedia are never finished, and their authors change continuously, we define a modified algorithm that takes into account the different domain, with particular attention to the fact that authors contribute identifiable pieces of information that can be further modified by other authors. The algorithm assigns quality scores to articles and contributors. The scores assigned to articles can be used, e.g., to let the reader understand how reliable the articles he or she is looking at are, or to help contributors identify low quality articles to be enhanced. The scores assigned to users measure the average quality of their contributions to Wikipedia and can be used, e.g., for conflict resolution policies based on the quality of involved users. Our proposed algorithm is experimentally evaluated by analyzing the obtained quality scores on articles for deletion and featured articles, also on six temporal Wikipedia snapshots. Preliminary results demonstrate that the proposed algorithm seems to appropriately identify high and low quality articles, and that high quality authors produce more long-lived contributions than low quality authors.
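The following toy sketch illustrates only the general intuition behind such quality scores (it is not the QuWi algorithm, whose definition is in the paper): text that survives later revisions raises its author's score, and an article's score aggregates the scores of the authors whose text survives. The edit data and formulas are invented.
 from collections import defaultdict
 # (article, author, words_added, words_still_present) -- invented history data
 edits = [
     ("ArticleA", "alice", 120, 110),
     ("ArticleA", "bob",    80,  10),
     ("ArticleB", "alice",  60,  55),
 ]
 survival = defaultdict(list)
 for _, author, added, surviving in edits:
     survival[author].append(surviving / added)          # how well edits survive
 author_score = {a: sum(r) / len(r) for a, r in survival.items()}
 article_score = defaultdict(float)
 for article, author, _, surviving in edits:
     article_score[article] += author_score[author] * surviving
 print(author_score)
 print(dict(article_score))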
Cuthell, John & Preston, Christina An interactivist e-community of practice using Web 2.0 tools Society for Information Technology & Teacher Education International Conference 2007 [417]
Dale, Michael; Stern, Abram; Deckert, Mark & Sack, Warren System demonstration: Metavid.org: a social website and open archive of congressional video Proceedings of the 10th Annual International Conference on Digital Government Research: Social Networks: Making Connections between Citizens, Data and Government 2009 [418] We have developed Metavid.org, a site that archives video footage of the U.S. Senate and House floor proceedings. Visitors can search for who said what when and also download, remix, blog, edit, discuss, and annotate transcripts and metadata. The site has been built with Open Source Software (OSS) and the video is archived in an OSS codec (Ogg Theora). We highlight two aspects of the Metavid design: (1) open standards; and (2) Wiki functionality. First, open standards allow Metavid to function both as a platform, on top of which other sites can be built, and as a resource for "mashing" (i.e., semi-automatically assembling custom websites). For example, Voterwatch.org pulls its video from the Metavid archive. Second, Metavid extends the MediaWiki software (which is the foundation of Wikipedia) into the domain of collaborative video authoring. This extension allows closed-captioned text or video sequences to be collectively edited.
Dallman, Alicia & McDonald, Michael Upward Bound Success: Climbing the Collegiate Ladder with Web 2.0 Wikis Society for Information Technology & Teacher Education International Conference 2010 [419]
Danyaro, K.U.; Jaafar, J.; Lara, R.A.A. De & Downe, A.G. An evaluation of the usage of Web 2.0 among tertiary level students in Malaysia 2010 International Symposium on Information Technology (ITSim 2010), 15-17 June 2010 Piscataway, NJ, USA 2010 [420] Web 2.0 is increasingly becoming a familiar pedagogical tool in higher education, facilitating the process of teaching and learning. But this advancement in information technology has also provoked problems like plagiarism and other academic misconduct. This paper evaluates the patterns of use and behavior of tertiary level students towards the use of Web 2.0 as an alternative and supplemental E-Learning Portal. A total of 92 students' data were collected and analyzed according to 'Self-Determination Theory' (SDT). It was found that students use social websites for chatting, gaming and sharing files. Facebook, YouTube and Wikipedia are ranked as the most popular websites used by college students. It also reveals that students have an inherent desire to express ideas and opinions online openly and independently. This sense of freedom makes students feel more competent, autonomous and participative, and makes learning feel less tedious. Therefore, this report recommends that educators adopt strategies for acknowledging students' feelings and activities online to reinforce positive behavior and effective learning. Finally, we discuss the implications of Web 2.0 for education.
DeGennaro, Donna & Kress, Tricia Looking to Transform Learning: From Social Transformation in the Public Sphere to Authentic Learning in the Classroom World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [421]
Dehinbo, Johnson Strategy for progressing from in-house training into e-learning using Activity Theory at a South African university World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [422]
Dehinbo, Johnson Suitable research paradigms for social inclusion through enhancement of Web applications development in developing countries World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [423]
Desjardins, Francois & vanOostveen, Roland Collaborative Online Learning Environment: Towards a process driven approach and collective knowledge building World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [424]
Desmontils, E.; Jacquin, C. & Monceaux, L. Question types specification for the use of specialized patterns in Prodicos system Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007 We present the second version of the Prodicos query answering system, which was developed by the TALN team from the LINA institute. The main improvements made concern, on the one hand, the use of external knowledge (Wikipedia) to improve the passage selection step and, on the other hand, the answer extraction step, which is improved by the determination of four different strategies for locating the answer to a question depending on its type. Afterwards, for the passage selection and answer extraction modules, the evaluation is put forward to justify the results obtained.
Dicheva, Darina & Dichev, Christo Helping Courseware Authors to Build Ontologies: The Case of TM4L Proceeding of the 2007 conference on Artificial Intelligence in Education: Building Technology Rich Learning Contexts That Work 2007 [425] The authors of topic map-based learning resources face major difficulties in constructing the underlying ontologies. In this paper we propose two approaches to address this problem. The first one is aimed at automatic construction of a "draft" topic map for the authors to start with. It is based on a set of heuristics for extracting semantic information from HTML documents and transforming it into a topic map format. The second one is aimed at providing help to authors during the topic map creation process by mining the Wikipedia knowledge base. It suggests "standard" names for the new topics (paired with URIs), along with lists of related topics in the considered domain. The proposed approaches are implemented in the educational topic maps editor TM4L.
Diem, Richard Technology and Culture: A Conceptual Framework Society for Information Technology & Teacher Education International Conference 2007 [426]
Diplaris, S.; Kompatsiaris, I.; Flores, A.; Escriche, M.; Sigurbjornsson, B.; Garcia, L. & van Zwol, R. Collective Intelligence in Mobile Consumer Social Applications 2010 Ninth International Conference on Mobile Business & 2010 Ninth Global Mobility Roundtable. ICMB-GMR 2010, 13-15 June 2010 Piscataway, NJ, USA 2010 [427] This paper presents a mobile software application for the provision of mobile guidance, supporting functionalities which are based on automatically extracted Collective Intelligence. Collective Intelligence is the intelligence which emerges from the collaboration, competition and coordination among individuals, and it can be extracted by analyzing the mass of user-contributed data currently available in Web 2.0 applications. More specifically, services including automatic Point of Interest (POI) detection, ranking, search and aggregation with semi-structured sources (e.g. Wikipedia) are developed, based on lexical and statistical analysis of mass data coming from Wikipedia, Yahoo! Geoplanet, query logs and flickr tags. These services, together with personalization functionalities, are integrated in a travel mobile application, enabling their efficient usage while at the same time exploiting user location information. Evaluation with real users shows the application's potential for providing a higher degree of satisfaction compared to existing travel information management solutions, and also directions for future enhancements.
Dixon, Brian Reflective Video Journals and Adolescent Metacognition: An exploratory study Society for Information Technology & Teacher Education International Conference 2009 [428]
Dobrila, T.-A.; Diaconasu, M.-C.; Lungu, I.-D. & Iftene, A. Methods for Classifying Videos by Subject and Detecting Narrative Peak Points Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [429] 2009 marked UAIC's first participation at the VideoCLEF evaluation campaign. Our group built two separate systems for the "Subject Classification" and "Affect Detection" tasks. For the first task we created two resources starting from Wikipedia pages and pages identified with Google, and used two tools for classification: Lucene and Weka. For the second task we extracted the audio component from a given video file using FFmpeg. After that we computed the average amplitude for each word from the transcript by applying the Fast Fourier Transform algorithm in order to analyze the sound. A brief description of our systems' components is given in this paper.
Dodge, Bernie & Molebash, Philip Mini-Courses for Teaching with Technology: Thinking Outside the 3-Credit Box Society for Information Technology & Teacher Education International Conference 2005 [430]
Dominik, Magda The Alternate Reality Game: Learning Situated in the Realities of the 21st Century World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [431]
Dondio, P.; Barrett, S.; Weber, S. & Seigneur, J.M. Extracting trust from domain analysis: a case study on the Wikipedia project Autonomic and Trusted Computing. Third International Conference, ATC 2006. Proceedings, 3-6 Sept. 2006 Berlin, Germany 2006 The problem of identifying trustworthy information on the World Wide Web is becoming increasingly acute as new tools such as wikis and blogs simplify and democratize publication. Wikipedia is the most extraordinary example of this phenomenon and, although a few mechanisms have been put in place to improve contribution quality, trust in the quality of Wikipedia content has been seriously questioned. We thought that a deeper understanding of what in general defines high standards and expertise in domains related to Wikipedia - i.e. content quality in a collaborative environment - mapped onto Wikipedia elements would lead to a complete set of mechanisms to sustain trust in the Wikipedia context. Our evaluation, conducted on about 8,000 articles representing 65% of the overall Wikipedia editing activity, shows that the new trust evidence that we extracted from Wikipedia allows us to transparently and automatically compute trust values to isolate articles of great or low quality.
Dopichaj, P. The university of Kaiserslautern at INEX 2006 Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007 Digital libraries offer convenient access to large volumes of text, but finding the information that is relevant for a given information need is hard. The workshops of the Initiative for the Evaluation of XML retrieval (INEX) provide a forum for testing the effectiveness of retrieval strategies. In this paper, we present the current version of our search engine that was used for INEX 2006: Like at INEX 2005, our search engine exploits structural patterns - in particular, automatic detection of titles - in the retrieval results to find the appropriate results among overlapping elements. This year, we examine how we can change this method to work better with the Wikipedia collection, which is significantly larger than the IEEE collection used in previous years. We show that our optimizations both retain the retrieval quality and reduce retrieval time significantly.
Dormann, Claire & Biddle, Robert Urban expressions and experiential gaming World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [432]
Dornescu, I. Semantic QA for Encyclopaedic Questions: EQUAL in GikiCLEF Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [433] This paper presents a new question answering (QA) approach and a prototype system, EQUAL, which relies on structural information from Wikipedia to answer open-list questions. The system achieved the highest score amongst the participants in the GikiCLEF 2009 task. Unlike the standard textual QA approach, EQUAL does not rely on identifying the answer within a text snippet by using keyword retrieval. Instead, it explores the Wikipedia page graph, extracting and aggregating information from multiple documents and enforcing semantic constraints. The challenges for such an approach and an error analysis are also discussed.
Dost, Ascander & King, Tracy Holloway Using large-scale parser output to guide grammar development Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks 2009 [434] This paper reports on guiding parser development by extracting information from output of a large-scale parser applied to Wikipedia documents. Data-driven parser improvement is especially important for applications where the corpus may differ from that originally used to develop the core grammar and where efficiency concerns affect whether a new construction should be added, or existing analyses modified. The large size of the corpus in question also brings scalability concerns to the foreground.
Doucet, A. & Lehtonen, M. Unsupervised classification of text-centric XML document collections Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007 This paper addresses the problem of the unsupervised classification of text-centric XML documents. In the context of the INEX mining track 2006, we present methods to exploit the inherent structural information of XML documents in the document clustering process. Using the k-means algorithm, we have experimented with a couple of feature sets, to discover that a promising direction is to use structural information as a preliminary means to detect and put aside structural outliers. The improvement of the semantic-wise quality of clustering is significantly higher through this approach than through a combination of the structural and textual feature sets. The paper also discusses the problem of the evaluation of XML clustering. Currently, in the INEX mining track, XML clustering techniques are evaluated against semantic categories. We believe there is a mismatch between the task (to exploit the document structure) and the evaluation, which disregards structural aspects. An illustration of this fact is that, over all the clustering track submissions, our text-based runs obtained the 1st rank (Wikipedia collection, out of 7) and 2nd rank (IEEE collection, out of 13).
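A minimal sketch of the kind of structure-based clustering discussed above (not the authors' INEX system): each XML document is represented by the frequencies of its root-to-node tag paths and the documents are clustered with k-means. The toy documents, the tag-path features and the scikit-learn setup are assumptions for illustration.
 import xml.etree.ElementTree as ET
 from sklearn.feature_extraction import DictVectorizer
 from sklearn.cluster import KMeans
 docs = [
     "<article><title>A</title><sec><p>x</p></sec></article>",
     "<article><title>B</title><sec><p>y</p><p>z</p></sec></article>",
     "<letter><to>C</to><body>hi</body></letter>",
 ]
 def tag_path_counts(xml_string):
     """Count root-to-node tag paths such as 'article/sec/p'."""
     counts = {}
     def walk(node, prefix):
         path = prefix + "/" + node.tag if prefix else node.tag
         counts[path] = counts.get(path, 0) + 1
         for child in node:
             walk(child, path)
     walk(ET.fromstring(xml_string), "")
     return counts
 X = DictVectorizer().fit_transform([tag_path_counts(d) for d in docs])
 print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))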
Dovchin, Tuul & Chan, Peter Multimedia Cases for Training Mongolian Medical Professionals -- An Innovative Strategy for Overcoming Pedagogical Challenges Society for Information Technology & Teacher Education International Conference 2006 [435]
Dowling, Sherwood Adopting a Long Tail Web Publishing Strategy for Museum Educational Materials at the Smithsonian American Art Museum World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [436]
Dron, Jon & Anderson, Terry Collectives, Networks and Groups in Social Software for E-Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [437]
Dron, Jon & Bhattacharya, Madhumita A Dialogue on E-Learning and Diversity: the Learning Management System vs the Personal Learning Environment World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [438]
Désilets, Alain & Paquet, Sébastien Wiki as a Tool for Web-based Collaborative Story Telling in Primary School: a Case Study World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 [439]
Díaz, Francisco; Osorio, Maria & Amadeo, Ana Evolution of the use of Moodle in Argentina, adding Web2.0 features World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [440]
Ebner, Martin E-Learning 2.0 = e-Learning 1.0 + Web 2.0? Proceedings of the Second International Conference on Availability, Reliability and Security 2007 [441]
Ebner, Martin & Nagler, Walther Has Web2.0 Reached the Educated Top? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [442]
Ebner, Martin & Taraghi, Behnam Personal Learning Environment for Higher Education – A First Prototype World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [443]
Els, Christo J. & Blignaut, A. Seugnet Exploring Teachers’ ICT Pedagogy in the North-West Province, South Africa World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [444]
Erlenkötter, Annekatrin; Kühnle, Claas-Michael; Miu, Huey-Ru; Sommer, Franziska & Reiners, Torsten Enhancing the Class Curriculum with Virtual World Use Cases for Production and Logistics World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [445]
van Erp, Marieke; Lendvai, Piroska & van den Bosch, Antal Comparing alternative data-driven ontological vistas of natural history Proceedings of the Eighth International Conference on Computational Semantics 2009 [446] Traditionally, domain ontologies are created manually, based on human experts' views on the classes and relations of the domain at hand. We present ongoing work on two approaches to the automatic construction of ontologies from a flat database of records, and compare them to a manually constructed ontology. The latter, the CIDOC-CRM ontology, focusses on the organisation of classes and relations. In contrast, the first automatic method, based on machine learning, focuses on the mutual predictiveness between classes, while the second automatic method, created with the aid of Wikipedia, stresses meaningful relations between classes. The three ontologies show little overlap; their differences illustrate that a different focus during ontology construction can lead to radically different ontologies. We discuss the implications of these differences, and argue that the two alternative ontologies may be useful in higher-level information systems such as search engines.
Erren, Patrick & Keil, Reinhard Enabling new Learning Scenarios in the Age of the Web 2.0 via Semantic Positioning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [447]
Every, Vanessa; Garcia, Gna & Young, Michael A Qualitative Study of Public Wiki Use in a Teacher Education Program Society for Information Technology & Teacher Education International Conference 2010 [448]
Ewbank, Ann; Carter, Heather & Foulger, Teresa MySpace Dilemmas: Ethical Choices for Teachers using Social Networking Society for Information Technology & Teacher Education International Conference 2008 [449]
Eymard, Oivier; Sanchis, Eric & Selves, Jean-Louis A Peer-to-Peer Collaborative Framework Based on Perceptive Reasoning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [450]
Farhoodi, M.; Yari, A. & Mahmoudi, M. Combining content-based and context-based methods for Persian web page classification 2009 Second International Conference on the Applications of Digital Information and Web Technologies (ICADIWT), 4-6 Aug. 2009 Piscataway, NJ, USA 2009 [451] As the Internet includes millions of web pages for each and every search query, fast retrieval of the desired and related information from the Web becomes a very challenging task. Automatic classification of web pages into relevant categories is an important and effective way to deal with the difficulty of retrieving information from the Internet. There are many automatic classification methods and algorithms that have been proposed for content-based or context-based features of web pages. In this paper we analyze these features and try to exploit a combination of features to improve the categorization accuracy of Persian web page classification. We conduct various experiments on a dataset consisting of 352 pages belonging to the Persian Wikipedia, using content-based and context-based web page features. Our experiments demonstrate the usefulness of combining these features.
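A hedged sketch of the general idea of combining content-based and context-based features (not the paper's Persian setup or feature set): body text and page titles are vectorized separately and concatenated before training a single classifier. The stand-in English pages, labels and scikit-learn pipeline are assumptions for illustration only.
 from sklearn.pipeline import Pipeline, FeatureUnion
 from sklearn.feature_extraction.text import TfidfVectorizer
 from sklearn.preprocessing import FunctionTransformer
 from sklearn.linear_model import LogisticRegression
 pages = [
     {"title": "Football cup final",  "body": "the team won the match"},
     {"title": "Stock market news",   "body": "shares fell as investors sold"},
     {"title": "League results",      "body": "goals scored in the derby"},
     {"title": "Bank interest rates", "body": "the central bank raised rates"},
 ]
 labels = ["sport", "economy", "sport", "economy"]
 # helper: select one field of each page dict before vectorizing it
 pick = lambda key: FunctionTransformer(lambda X, k=key: [x[k] for x in X])
 features = FeatureUnion([
     ("content", Pipeline([("get", pick("body")),  ("tfidf", TfidfVectorizer())])),
     ("context", Pipeline([("get", pick("title")), ("tfidf", TfidfVectorizer())])),
 ])
 clf = Pipeline([("features", features), ("model", LogisticRegression())])
 clf.fit(pages, labels)
 print(clf.predict([{"title": "Cup semifinal", "body": "the match ended in goals"}]))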
Farkas, Richárd; Szarvas, György & Ormándi, Róbert Improving a state-of-the-art named entity recognition system using the world wide web Proceedings of the 7th industrial conference on Advances in data mining: theoretical aspects and applications 2007 [452] The development of highly accurate Named Entity Recognition (NER) systems can be beneficial to a wide range of Human Language Technology applications. In this paper we introduce three heuristics that exploit a variety of knowledge sources (the World Wide Web, Wikipedia and WordNet) and are capable of further improving a state-of-the-art multilingual and domain independent NER system. Moreover we describe our investigations on entity recognition in simulated speech-to-text output. Our web-based heuristics attained a slight improvement over the best results published on a standard NER task, and proved to be particularly effective in the speech-to-text scenario.
Farley, Alan & Barton, Siew Mee Developing and rewarding advanced teaching expertise in higher education - a different approach World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [453]
Feldmann, Birgit & Franzkowiak, Bettina Studying in Web 2.0 - What (Distance) Students Really Want World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [454]
Ferguson, Donald F. Autonomic business service management Proceedings of the 6th international conference on Autonomic computing 2009 [455] Medium and large enterprises think of information technology as implementing business services. Examples include online banking or Web commerce. Most systems and application management technology manages individual hardware and software systems. A business service is inherently a composite comprised of multiple HW, SW and logical entities. For example, a Web commerce system may have a Web server, Web application server, database server and messaging system to connect to mainframe inventory management. Each of the systems has various installed software. Businesses want to automate management of the business service, not the individual instances. IT management systems must manage the service, "unwind" the high level policies and operations, and apply them to individual HW and SW elements. SOA makes managing composites more difficult due to dynamic binding and request routing. This presentation describes the design and implementation of a business service management system. The core elements include: a Unified Service Model; a real-time management database that extends the concept of a Configuration Management Database (CMDB) [456] and integrates external management and monitoring systems; rule based event correlation and rule based discovery of the structure of a business service; and algorithmic analysis of the composite service to automatically detect and repair availability and end-to-end performance problems. The presentation suggests topics for additional research.
Ferres, D. & Rodriguez, H. TALP at GikiCLEF 2009 Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [457] This paper describes our experiments in Geographical Information Retrieval with the Wikipedia collection in the context of our participation in the GikiCLEF 2009 Multilingual task in English and Spanish. Our system, called GikiTALP, follows a simple approach that uses standard Information Retrieval with the Sphinx full-text search engine and some Natural Language Processing techniques without Geographical Knowledge.
Ferrés, Daniel & Rodríguez, Horacio Experiments adapting an open-domain question answering system to the geographical domain using scope-based resources Proceedings of the Workshop on Multilingual Question Answering 2006 [458] This paper describes an approach to adapt an existing multilingual Open-Domain Question Answering (ODQA) system for factoid questions to a Restricted Domain, the Geographical Domain. The adaptation of this ODQA system involved the modification of some components of our system such as: Question Processing, Passage Retrieval and Answer Extraction. The new system uses external resources like the GNS Gazetteer for Named Entity (NE) Classification and Wikipedia or Google in order to obtain relevant documents for this domain. The system focuses on a Geographical Scope: given a region, or country, and a language we can semi-automatically obtain multilingual geographical resources (e.g. gazetteers, trigger words, groups of place names, etc.) of this scope. The system has been trained and evaluated for Spanish in the scope of the Spanish Geography. The evaluation reveals that the use of scope-based Geographical resources is a good approach to deal with multilingual Geographical Domain Question Answering.
Fiaidhi, Jinan & Mohammed, Sabah Detecting Some Collaborative Academic Indicators Based on Social Networks: A DBLP Case Study World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [459]
Filatova, Elena Directions for exploiting asymmetries in multilingual Wikipedia Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies 2009 [460] Multilingual Wikipedia has been used extensively for a variety of Natural Language Processing (NLP) tasks. Many Wikipedia entries (people, locations, events, etc.) have descriptions in several languages. These descriptions, however, are not identical. On the contrary, descriptions in different languages created for the same Wikipedia entry can vary greatly in terms of description length and information choice. Keeping these peculiarities in mind is necessary while using multilingual Wikipedia as a corpus for training and testing NLP applications. In this paper we present preliminary results on quantifying Wikipedia multilinguality. Our results support the observation about the substantial variation in descriptions of Wikipedia entries created in different languages. However, we believe that asymmetries in multilingual Wikipedia do not make Wikipedia an undesirable corpus for training NLP applications. On the contrary, we outline research directions that can utilize multilingual Wikipedia asymmetries to bridge the communication gaps in multilingual societies.
Fleet, Gregory & Wallace, Peter How could Web 2.0 be shaping web-assisted learning? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [461]
Flouris, G.; Fundulaki, I.; Pediaditis, P.; Theoharis, Y. & Christophides, V. Coloring RDF triples to capture provenance Semantic Web - ISWC 2009. 8th International Semantic Web Conference, ISWC 2009, 25-29 Oct. 2009 Berlin, Germany 2009 [462] Recently, the W3C Linking Open Data effort has boosted the publication and inter-linkage of large amounts of RDF datasets on the Semantic Web. Various ontologies and knowledge bases with millions of RDF triples from Wikipedia and other sources, mostly in e-science, have been created and are publicly available. Recording provenance information of RDF triples aggregated from different heterogeneous sources is crucial in order to effectively support trust mechanisms, digital rights and privacy policies. Managing provenance becomes even more important when we consider not only explicitly stated but also implicit triples (through RDFS inference rules) in conjunction with declarative languages for querying and updating RDF graphs. In this paper we rely on colored RDF triples represented as quadruples to capture and manipulate explicit provenance information.
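As a minimal illustration of the quadruple idea (not the paper's formal model or its RDFS machinery), each statement can be stored as a quad whose fourth component records the source it came from, so provenance can be filtered on later. The example triples and source names are invented.
 from collections import namedtuple
 Quad = namedtuple("Quad", "subject predicate obj source")
 store = [
     Quad("dbpedia:Berlin", "rdf:type", "dbpedia-owl:City", "wikipedia-infobox"),
     Quad("dbpedia:Berlin", "dbpedia-owl:country", "dbpedia:Germany", "wikipedia-infobox"),
     Quad("dbpedia:Berlin", "geo:lat", "52.52", "geonames-dump"),
 ]
 def triples_from(source):
     """Return the plain triples contributed by a given source ("color")."""
     return [(q.subject, q.predicate, q.obj) for q in store if q.source == source]
 print(triples_from("wikipedia-infobox"))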
Fogarolli, Angela & Ronchetti, Marco A Web 2.0-enabled digital library World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [463]
Foley, Brian & Chang, Tae Wiki as a Professional Development Tool Society for Information Technology & Teacher Education International Conference 2008 [464]
Forrester, Bruce & Verdon, John Introducing Peer Production into the Department of National Defense World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [465]
Forte, Andrea & Bruckman, Amy From Wikipedia to the classroom: exploring online publication and learning Proceedings of the 7th international conference on Learning sciences 2006 [466] Wikipedia represents an intriguing new publishing paradigm---can it be used to engage students in authentic collaborative writing activities? How can we design wiki publishing tools and curricula to support learning among student authors? We suggest that wiki publishing environments can create learning opportunities that address four dimensions of authenticity: personal, real world, disciplinary, and assessment. We have begun a series of design studies to investigate links between wiki publishing experiences and writing-to-learn. The results of an initial study in an undergraduate government course indicate that perceived audience plays an important role in helping students monitor the quality of writing; however, students' perception of audience on the Internet is not straightforward. This preliminary iteration resulted in several guidelines that are shaping efforts to design and implement new wiki publishing tools and curricula for students and teachers.
Francke, H. & Sundin, O. An inside view: credibility in Wikipedia from the perspective of editors Information Research 2010 Introduction. The question of credibility in participatory information environments, particularly Wikipedia, has been much debated. This paper investigates how editors on Swedish Wikipedia consider credibility when they edit and read Wikipedia articles. Method. The study builds on interviews with 11 editors on Swedish Wikipedia, supported by a document analysis of policies on Swedish Wikipedia. Analysis. The interview transcripts have been coded qualitatively according to the participants' use of Wikipedia and what they take into consideration in making credibility assessments. Results. The participants use Wikipedia for purposes where it is not vital that the information is correct. Their credibility assessments are mainly based on authorship, verifiability, and the editing history of an article. Conclusions. The situations and purposes for which the editors use Wikipedia are similar to other user groups, but they draw on their knowledge as members of the network of practice of wikipedians to make credibility assessments, including knowledge of certain editors and of the MediaWiki architecture. Their assessments have more similarities to those used in traditional media than to assessments springing from the wisdom of crowds.
Freeman, Wendy Reflecting on the Culture of Research Using Weblogs Society for Information Technology & Teacher Education International Conference 2006 [467]
Futrell-Schilling, Dawn Teaching and Learning in the Conceptual Age: Integrating a Sense of Symphony into the Curriculum Society for Information Technology & Teacher Education International Conference 2009 [468]
Gagne, Claude & Fels, Deborah Learning through Weblogs World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [469]
Ganeshan, Kathiravelu A Technological Framework for Improving Education in the Developing World World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [470]
Ganeshan, Kathiravelu & Komosny, Dan Rojak: A New Paradigm in Teaching and Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [471]
Ganjisaffar, Y.; Javanmardi, S. & Lopes, C. Review-based ranking of Wikipedia articles 2009 International Conference on Computational Aspects of Social Networks (CASON), 24-27 June 2009 Piscataway, NJ, USA 2009 [472] Wikipedia, the largest encyclopedia on the Web, is often seen as the most successful example of crowdsourcing. The encyclopedic knowledge it has accumulated over the years is so large that one often uses search engines to find information in it. In contrast to regular Web pages, Wikipedia is fairly structured, and articles are usually accompanied by history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. Building on studies of the "Wisdom of Crowds" and the effectiveness of knowledge collected by a large number of people, we investigate the effect of incorporating the extent of review of an article into the ranking of search results. The extent of review is measured by the number of distinct editors who contributed to an article and is extracted by processing Wikipedia's history pages. We compare different ranking algorithms that explore combinations of text-relevancy, PageRank, and extent of review. The results show that the review-based ranking algorithm, which combines the extent of review and text-relevancy, outperforms the rest; it is more accurate and less computationally expensive than PageRank-based rankings.
Ganjisaffar, Yasser; Javanmardi, Sara & Lopes, Cristina Review-Based Ranking of Wikipedia Articles Proceedings of the 2009 International Conference on Computational Aspects of Social Networks 2009 [473] Wikipedia, the largest encyclopedia on the Web, is often seen as the most successful example of crowdsourcing. The encyclopedic knowledge it has accumulated over the years is so large that one often uses search engines to find information in it. In contrast to regular Web pages, Wikipedia is fairly structured, and articles are usually accompanied by history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. Building on studies of the "Wisdom of Crowds" and the effectiveness of knowledge collected by a large number of people, we investigate the effect of incorporating the extent of review of an article into the ranking of search results. The extent of review is measured by the number of distinct editors who contributed to an article and is extracted by processing Wikipedia's history pages. We compare different ranking algorithms that explore combinations of text-relevancy, PageRank, and extent of review. The results show that the review-based ranking algorithm, which combines the extent of review and text-relevancy, outperforms the rest; it is more accurate and less computationally expensive than PageRank-based rankings.
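An illustrative sketch of the ranking idea shared by the two entries above (the exact features and weighting in the paper differ): each candidate result gets a score that mixes a text-relevancy value with a log-scaled "extent of review", here the number of distinct editors. All numbers and the 0.5 weight are invented.
 import math
 # (article, text_relevancy, distinct_editors) -- invented numbers
 candidates = [
     ("Article A", 0.82,  35),
     ("Article B", 0.90,   3),
     ("Article C", 0.75, 210),
 ]
 max_editors = max(editors for _, _, editors in candidates)
 def combined_score(relevancy, editors, weight=0.5):
     review = math.log1p(editors) / math.log1p(max_editors)   # extent of review
     return weight * relevancy + (1 - weight) * review
 for title, rel, editors in sorted(
         candidates, key=lambda c: combined_score(c[1], c[2]), reverse=True):
     print(title, round(combined_score(rel, editors), 3))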
Gantner, Zeno & Schmidt-Thieme, Lars Automatic content-based categorization of Wikipedia articles Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [474] Wikipedia's article contents and its category hierarchy are widely used to produce semantic resources which improve performance on tasks like text classification and keyword extraction. The reverse -- using text classification methods for predicting the categories of Wikipedia articles -- has attracted less attention so far. We propose to "return the favor" and use text classifiers to improve Wikipedia. This could support the emergence of a virtuous circle between the wisdom of the crowds and machine learning/NLP methods. We define the categorization of Wikipedia articles as a multi-label classification task, describe two solutions to the task, and perform experiments that show that our approach is feasible despite the high number of labels.
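A minimal sketch of the task framing described above (not the paper's classifiers or features): multi-label classification of article text into category labels with scikit-learn. The four toy articles, their categories and the TF-IDF/logistic-regression pipeline are assumptions for the example.
 from sklearn.feature_extraction.text import TfidfVectorizer
 from sklearn.preprocessing import MultiLabelBinarizer
 from sklearn.multiclass import OneVsRestClassifier
 from sklearn.linear_model import LogisticRegression
 from sklearn.pipeline import make_pipeline
 texts = [
     "a programming language created by Guido van Rossum",
     "a city and federal state of Germany",
     "a German-born theoretical physicist",
     "an interpreted language used for scripting and data analysis",
 ]
 categories = [
     {"Programming languages"},
     {"Cities in Germany", "German states"},
     {"German physicists"},
     {"Programming languages"},
 ]
 mlb = MultiLabelBinarizer()
 Y = mlb.fit_transform(categories)                    # one column per category
 clf = make_pipeline(TfidfVectorizer(),
                     OneVsRestClassifier(LogisticRegression(max_iter=1000)))
 clf.fit(texts, Y)
 print(mlb.inverse_transform(clf.predict(["an interpreted programming language"])))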
Gaonkar, Shravan & Choudhury, Romit Roy Micro-Blog: map-casting from mobile phones to virtual sensor maps Proceedings of the 5th international conference on Embedded networked sensor systems 2007 [475] The synergy of phone sensors (microphone, camera, GPS, etc.), wireless capability, and ever-increasing device density can lead to novel people-centric applications. Unlike traditional sensor networks, the next generation networks may be participatory, interactive, and at the scale of human users. Millions of global data points can be organized on a visual platform, queried, and answered in sophisticated ways through human participation. Recent years have witnessed the isolated impacts of distributed knowledge sharing (Wikipedia), social networks, sensor networks, and mobile communication. We believe that significantly more impact is latent in their convergence, and that it can be drawn out through innovations in applications. This demonstration, called Micro-Blog, is a first step towards this goal.
Gardner, J.; Krowne, A. & Xiong, Li NNexus: towards an automatic linker for a massively-distributed collaborative corpus 2006 International Conference on Collaborative Computing: Networking, Applications and Worksharing, 17-20 Nov. 2006 Piscataway, NJ, USA 2006 Collaborative online encyclopedias such as Wikipedia and PlanetMath are becoming increasingly popular. In order to understand an article in a corpus, a user must understand the related and underlying concepts through linked articles. In this paper, we introduce NNexus, a generalization of the automatic linking component of PlanetMath.org and the first system that automates the process of linking encyclopedia entries into a semantic network of concepts. We discuss the challenges, present the conceptual models as well as specific mechanisms of the NNexus system, and discuss some of our ongoing and completed work.
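A much-simplified illustration of the automatic linking idea (not the NNexus implementation): scan an entry's text for the titles of other entries, preferring the longest title that matches at each position, and wrap matches as wiki links. The concept list and the greedy matcher are assumptions for the sketch.
 concepts = {"group", "abelian group", "homomorphism"}   # known entry titles
 def auto_link(text, max_words=3):
     """Greedy longest-match linking of known concept titles in plain text."""
     words = text.split()
     out, i = [], 0
     while i < len(words):
         for span in range(min(max_words, len(words) - i), 0, -1):
             phrase = " ".join(words[i:i + span])
             stripped = phrase.rstrip(".,;:")
             if stripped.lower() in concepts:
                 out.append("[[" + stripped + "]]" + phrase[len(stripped):])
                 i += span
                 break
         else:                               # no known title starts at word i
             out.append(words[i])
             i += 1
     return " ".join(out)
 print(auto_link("An abelian group homomorphism preserves the group operation."))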
Garvoille, Alexa & Buckner, Ginny Writing Wikipedia Pages in the Constructivist Classroom World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [476]
Garza, S.E. & Brena, R.F. Graph local clustering for topic detection in Web collections 2009 Latin American Web Congress. LA-WEB 2009, 9-11 Nov. 2009 Piscataway, NJ, USA 2009 [477] In the midst of a developing Web that increases its size at a constant rhythm, automatic document organization becomes important. One way to arrange documents is by categorizing them into topics. Even when there are different ways to consider topics and their extraction, a practical option is to view them as document groups and apply clustering algorithms. An attractive alternative that naturally copes with the Web's size and complexity is the one proposed by graph local clustering (GLC) methods. In this paper, we define a formal framework for working with topics in hyperlinked environments and analyze the feasibility of GLC for this task. We performed tests over an important Web collection, namely Wikipedia, and our results, which were validated using various kinds of methods (some of them specific to the information domain), indicate that this approach is suitable for topic discovery.
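A rough sketch of the local-clustering idea (not the paper's GLC algorithm): starting from a seed page, greedily add the neighbouring page that most lowers the conductance of the cut around the cluster, and stop when no neighbour improves it. The toy undirected link graph and the greedy rule are assumptions; networkx supplies the conductance measure.
 import networkx as nx
 G = nx.Graph()
 G.add_edges_from([
     ("Jazz", "Blues"), ("Jazz", "Saxophone"), ("Blues", "Guitar"),
     ("Saxophone", "Guitar"), ("Guitar", "Physics"), ("Physics", "Quantum"),
     ("Quantum", "Relativity"), ("Physics", "Relativity"),
 ])
 def local_cluster(graph, seed):
     cluster = {seed}
     while True:
         frontier = {n for v in cluster for n in graph[v]} - cluster
         best, best_phi = None, nx.conductance(graph, cluster)
         for candidate in frontier:
             phi = nx.conductance(graph, cluster | {candidate})
             if phi < best_phi:
                 best, best_phi = candidate, phi
         if best is None:
             return cluster                  # no neighbour lowers conductance
         cluster.add(best)
 print(local_cluster(G, "Jazz"))             # expected: the music-related pages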
Geiger, R. Stuart & Ribes, David The work of sustaining order in wikipedia: the banning of a vandal Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [478] In this paper, we examine the social roles of software tools in the English-language Wikipedia, specifically focusing on autonomous editing programs and assisted editing tools. This qualitative research builds on recent research in which we quantitatively demonstrate the growing prevalence of such software in recent years. Using trace ethnography, we show how these often-unofficial technologies have fundamentally transformed the nature of editing and administration in Wikipedia. Specifically, we analyze "vandal fighting" as an epistemic process of distributed cognition, highlighting the role of non-human actors in enabling a decentralized activity of collective intelligence. In all, this case shows that software programs are used for more than enforcing policies and standards. These tools enable coordinated yet decentralized action, independent of the specific norms currently in force.
Gentile, Anna Lisa; Basile, Pierpaolo; Iaquinta, Leo & Semeraro, Giovanni Lexical and Semantic Resources for NLP: From Words to Meanings Proceedings of the 12th international conference on Knowledge-Based Intelligent Information and Engineering Systems, Part III 2008 [479] A user expresses her information need through words with a precise meaning, but from the machine's point of view this meaning does not come with the word. A further step is needed to automatically associate it with the words. Techniques that process human language are required, as well as linguistic and semantic knowledge, stored within distinct and heterogeneous resources, which play an important role during all Natural Language Processing (NLP) steps. Resource management is a challenging problem, together with the correct association between URIs coming from the resources and the meanings of the words. This work presents a service that, given a lexeme (an abstract unit of morphological analysis in linguistics, which roughly corresponds to a set of words that are different forms of the same word), returns all syntactic and semantic information collected from a list of lexical and semantic resources. The proposed strategy consists in merging data originating from stable resources, such as WordNet, with data collected dynamically from evolving sources, such as the Web or Wikipedia. That strategy is implemented in a wrapper to a set of popular linguistic resources that provides a single point of access to them, in a way that is transparent to the user, to accomplish the computational linguistic problem of getting a rich set of linguistic and semantic annotations in a compact way.
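A very small sketch of the kind of lookup such a wrapper performs on its stable side (WordNet, here accessed through NLTK); the dynamic Web/Wikipedia side described above is omitted. It assumes the NLTK WordNet data has already been downloaded, and "bank" is just an example lexeme.
 # Requires the nltk package and a one-time nltk.download("wordnet") beforehand.
 from nltk.corpus import wordnet as wn
 lexeme = "bank"
 for synset in wn.synsets(lexeme):
     print(synset.name(), "-", synset.definition())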
Geraci, Michael Implementing a Wiki as a collaboration tool for group projects World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [480]
Ghislandi, Patrizia; Mattei, Antonio; Paolino, Daniela; Pellegrini, Alice & Pisanu, Francesco Designing Online Learning Communities for Higher Education: Possibilities and Limits of Moodle World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [481]
Gibson, David; Reynolds-Alpert, Suzanne; Doering, Aaron & Searson, Michael Participatory Media in Informal Learning Society for Information Technology \& Teacher Education International Conference 2009 [482]
Giza, Brian & McCann, Erin The Use of Free Translation Tools in the Biology Classroom Society for Information Technology \& Teacher Education International Conference 2007 [483]
Gleim, Rüdiger; Mehler, Alexander & Dehmer, Matthias Web corpus mining by instance of Wikipedia Proceedings of the 2nd International Workshop on Web as Corpus 2006 [484] In this paper we present an approach to structure learning in the area of web documents. This is done in order to approach the goal of webgenre tagging in the area of web corpus linguistics. A central outcome of the paper is that purely structure oriented approaches to web document classification provide an information gain which may be utilized in combined approaches of web content and structure analysis.
Gleim, R.; Mehler, A.; Dehmer, M. & Pustylnikov, O. Aisles through the category forest Third International Conference on Web information systems and technologies, WEBIST 2007, 3-6 March 2007 Setubal, Portugal 2007 The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods developed in order to tackle the problem of finding and organising relevant information. It has often been motivated that semantic classifications of input documents help solving this task. But while approaches of supervised text categorisation perform quite well on genres found in written text, newly evolved genres on the Web are much more demanding. In order to successfully develop approaches to Web mining, respective corpora are needed. However, the composition of genre- or domain-specific Web corpora is still an unsolved problem. It is time consuming to build large corpora of good quality because Web pages typically lack reliable meta information. Wikipedia along with similar approaches of collaborative text production offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki software, can be utilised as a source of corpus building. Further, we describe a representation format for social ontologies and present the Wikipedia category explorer, a tool which supports categorical views to browse through the Wikipedia and to construct domain specific corpora for machine learning.
Glogoff, Stuart Channeling Students and Parents: Promoting the University Through YouTube World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [485]
Glover, Ian & Oliver, Andrew Hybridisation of Social Networking and Learning Environments World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [486]
Glover, Ian; Xu, Zhijie & Hardaker, Glenn Redeveloping an eLearning Annotation System as a Web Service World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 [487]
Goh, Hui-Ngo & Kiu, Ching-Chieh Context-based term identification and extraction for ontology construction 2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2010), 21-23 Aug. 2010 Piscataway, NJ, USA 2010 [488] Ontology construction often requires a domain specific corpus in conceptualizing the domain knowledge; specifically, it is an association of terms, relations between terms and related instances. Identifying a list of significant terms is a vital task in constructing a practical ontology. In this paper, we present the use of a context-based term identification and extraction methodology for ontology construction from text documents. The methodology uses a taxonomy and Wikipedia to support automatic term identification and extraction from structured documents, under the assumption that candidate terms for a topic are often associated with its topic-specific keywords. A hierarchical relationship of super-topics and sub-topics is defined by the taxonomy, while Wikipedia is used to provide context and background knowledge for the topics defined in the taxonomy to guide the term identification and extraction. The experimental results have shown that the context-based term identification and extraction methodology is viable in defining topic concepts and their sub-concepts for constructing an ontology. The experimental results have also proven its viability in a small corpus / text size environment in supporting ontology construction.
González-Martínez, María Dolores & Herrera-Batista, Miguel Angel Habits and preferences of University Students on the use of Information and Communication Technologies in their academic activities and of socialization Society for Information Technology \& Teacher Education International Conference 2009 [489]
Gool, Luc Van; Breitenstein, Michael D.; Gammeter, Stephan; Grabner, Helmut & Quack, Till Mining from large image sets Proceeding of the ACM International Conference on Image and Video Retrieval 2009 [490] So far, most image mining was based on interactive querying. Although such querying will remain important in the future, several applications need image mining at such wide scales that it has to run automatically. This adds an additional level to the problem, namely to apply appropriate further processing to different types of images, and to decide on such processing automatically as well. This paper touches on those issues in that we discuss the processing of landmark images and of images coming from webcams. The first part deals with the automated collection of images of landmarks, which are then also automatically annotated and enriched with Wikipedia information. The target application is that users photograph landmarks with their mobile phones or PDAs, and automatically get information about them. Similarly, users can get images in their photo albums annotated automatically. The object of interest can also be automatically delineated in the images. The pipeline we propose actually retrieves more images than manual keyword input would produce. The second part of the paper deals with an entirely different source of image data, but one that also produces massive amounts (although typically not archived): webcams. They produce images at a single location, but rather continuously and over extended periods of time. We propose an approach to summarize data coming from webcams. This data handling is quite different from that applied to the landmark images.
Gore, David; Lee, Marie & Wassus, Kenny New Possibilities with IT and Print Technologies: Variable Data Printing VDP Society for Information Technology \& Teacher Education International Conference 2010 [491]
Gray, Kathleen Originality and Plagiarism Resources for Academic Staff Development in the Era of New Web Authoring Formats World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [492]
Greenberg, Valerie & Carbajal, Darlene Using Convergent Media to Engage Graduate Students in a Digital and Electronic Writing class: Some Surprising Results World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [493]
Greene, M. Epidemiological Monitoring for Emerging Infectious Diseases Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense IX, 5-8 April 2010 USA 2010 [494] The Homeland Security News Wire has been reporting on new ways to fight epidemics using digital tools such as the iPhone, social networks, Wikipedia, and other Internet sites. Instant two-way communication now gives consumers the ability to complement official reports on emerging infectious diseases from health authorities. However, there is increasing concern that these communications networks could open the door to mass panic from unreliable or false reports. There is thus an urgent need to ensure that epidemiological monitoring for emerging infectious diseases gives health authorities the capability to identify, analyze, and report disease outbreaks in as timely and efficient a manner as possible. One of the dilemmas in the global dissemination of information on infectious diseases is the possibility that information overload will create inefficiencies as the volume of Internet-based surveillance information increases. What is needed is a filtering mechanism that will retrieve relevant information for further analysis by epidemiologists, laboratories, and other health organizations so they are not overwhelmed with irrelevant information and will be able to respond quickly. This paper introduces a self-organizing ontology that could be used as a filtering mechanism to increase relevance and allow rapid analysis of disease outbreaks as they evolve in real time.
Greenhow, Christine What Teacher Education Needs to Know about Web 2.0: Preparing New Teachers in the 21st Century Society for Information Technology \& Teacher Education International Conference 2007 [495]
Greenhow, Christine; Searson, Michael & Strudler, Neal FWIW: What the Research Says About Engaging the Web 2.0 Generation Society for Information Technology \& Teacher Education International Conference 2009 [496]
Guerrero, Shannon Web 2.0 in a Preservice Math Methods Course: Teacher Candidates’ Perceptions and Predictions Society for Information Technology \& Teacher Education International Conference 2010 [497]
Guetl, Christian Context-sensitive and Personalized Concept-based Access to Knowledge for Learning and Training Purposes World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [498]
Guo, Zinan & Greer, Jim Connecting E-portfolios and Learner Models World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [499]
Gupta, Priyanka; Seals, Cheryl & Wilson, Dale-Marie Design And Evaluation of SimBuilder World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [500]
Gurevych, Iryna & Zesch, Torsten Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [501] Welcome to the proceedings of the ACL Workshop "The People's Web Meets NLP: Collaboratively Constructed Semantic Resources". The workshop attracted 21 submissions, of which 9 are included in these proceedings. We are gratified by this level of interest. This workshop was motivated by the observation that the NLP community is currently considerably influenced by online resources which are collaboratively constructed by ordinary users on the Web. In many works, such resources have been used as semantic resources overcoming the knowledge acquisition bottleneck and coverage problems pertinent to conventional lexical semantic resources. The resource that has gained the greatest popularity in this respect so far is Wikipedia. However, the scope of the workshop deliberately exceeded Wikipedia. We are happy that the proceedings include papers on resources such as Wiktionary, Mechanical Turk, or semantic resources created through online games. This encourages us in our belief that collaboratively constructed semantic resources are of growing interest for the natural language processing community. We should also add that we hoped to bring together researchers from both worlds: those using collaboratively created resources in NLP applications and those using NLP applications for improving the resources or extracting different types of semantic information from them. This is also reflected in the proceedings, although the stronger interest was in using semantic resources for NLP applications.
Guru, D. S.; Harish, B. S. & Manjunath, S. Symbolic representation of text documents Proceedings of the Third Annual ACM Bangalore Conference 2010 [502] This paper presents a novel method of representing a text document by the use of interval-valued symbolic features. A method of classification of text documents based on the proposed representation is also presented. The newly proposed model significantly reduces the dimension of feature vectors and also the time taken to classify a given document. Further, extensive experiments are conducted on the vehicles-wikipedia datasets to evaluate the performance of the proposed model. The experimental results reveal that the obtained results are on par with the existing results for the vehicles-wikipedia dataset. However, the advantage of the proposed model is that it takes relatively less time for classification, as it is based on a simple matching strategy.
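The interval-valued representation described above can be illustrated with a minimal sketch: each class is summarized by per-feature [min, max] intervals over its training documents, and a test document is assigned to the class whose intervals match the most features. The toy vectors and the threshold-free matching below are illustrative assumptions, not the authors' implementation.
<syntaxhighlight lang="python">
import numpy as np

def class_intervals(X):
    """Summarize a class by per-feature [min, max] intervals over its training vectors."""
    X = np.asarray(X, dtype=float)
    return np.stack([X.min(axis=0), X.max(axis=0)], axis=1)  # shape (n_features, 2)

def match_score(doc, intervals):
    """Simple matching: count how many feature values fall inside the class intervals."""
    doc = np.asarray(doc, dtype=float)
    return int(np.sum((doc >= intervals[:, 0]) & (doc <= intervals[:, 1])))

def classify(doc, class_models):
    """Assign the class whose intervals match the most features of the document."""
    return max(class_models, key=lambda c: match_score(doc, class_models[c]))

if __name__ == "__main__":
    # Toy term-frequency vectors for two classes (illustrative only).
    train = {
        "vehicles": [[3, 0, 1], [4, 1, 0], [2, 0, 2]],
        "wikipedia": [[0, 5, 2], [1, 4, 3], [0, 6, 1]],
    }
    models = {c: class_intervals(v) for c, v in train.items()}
    print(classify([3, 1, 1], models))   # -> "vehicles"
</syntaxhighlight>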
Gyarmati, A. & Jones, G.J.F. When to Cross Over? Cross-Language Linking Using Wikipedia for VideoCLEF 2009 Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [503] We describe Dublin City University (DCU)'s participation in the VideoCLEF 2009 Linking Task. Two approaches were implemented using the Lemur information retrieval toolkit. Both approaches first extracted a search query from the transcriptions of the Dutch TV broadcasts. One method first performed search on a Dutch Wikipedia archive, then followed links to corresponding pages in the English Wikipedia. The other method first translated the extracted query using machine translation and then searched the English Wikipedia collection directly. We found that using the original Dutch transcription query for searching the Dutch Wikipedia yielded better results.
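One of the two strategies above (search the Dutch Wikipedia, then follow cross-language links to the English edition) can be approximated with the standard MediaWiki langlinks query; the sketch below is an assumption about how such a link hop could be reproduced today, not the DCU system itself.
<syntaxhighlight lang="python">
import requests

def nl_to_en_title(dutch_title):
    """Follow the Dutch->English interlanguage link for a Wikipedia page
    via the standard MediaWiki API (prop=langlinks)."""
    resp = requests.get(
        "https://nl.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "langlinks",
            "titles": dutch_title,
            "lllang": "en",
            "format": "json",
        },
        timeout=10,
    )
    pages = resp.json()["query"]["pages"]
    for page in pages.values():
        for link in page.get("langlinks", []):
            return link["*"]          # English article title, if a link exists
    return None

if __name__ == "__main__":
    print(nl_to_en_title("Rembrandt van Rijn"))   # e.g. "Rembrandt", if the link exists
</syntaxhighlight>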
Hamilton, Margaret & Howell, Sheila Technology Options for Assessment Purposes and Quality Graduate Outcomes World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [504]
Hammond, Thomas; Friedman, Adam; Keeler, Christy; Manfra, Meghan & Metan, Demet Epistemology is elementary: Historical thinking as applied epistemology in an elementary social studies methods class Society for Information Technology \& Teacher Education International Conference 2008 [505]
Haridas, M. & Caragea, D. Exploring Wikipedia and DMoz as Knowledge Bases for Engineering a User Interests Hierarchy for Social Network Applications On the Move to Meaningful Internet Systems: OTM 2009. Confederated International Conferences CoopIS, DOA, IS, and ODBASE 2009, 1-6 Nov. 2009 Berlin, Germany 2009 [506] The outgrowth of social networks in the recent years has resulted in opportunities for interesting data mining problems, such as interest or friendship recommendations. A global ontology over the interests specified by the users of a social network is essential for accurate recommendations. We propose, evaluate and compare three approaches to engineering a hierarchical ontology over user interests. The proposed approaches make use of two popular knowledge bases, Wikipedia and Directory Mozilla, to extract interest definitions and/or relationships between interests. More precisely, the first approach uses Wikipedia to find interest definitions, the latent semantic analysis technique to measure the similarity between interests based on their definitions, and an agglomerative clustering algorithm to group similar interests into higher level concepts. The second approach uses the Wikipedia Category Graph to extract relationships between interests, while the third approach uses Directory Mozilla to extract relationships between interests. Our results show that the third approach, although the simplest, is the most effective for building a hierarchy over user interests.
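A rough sketch of the first approach described above (Wikipedia definitions, latent semantic analysis, agglomerative clustering) follows, using scikit-learn and SciPy as stand-ins; the toy interest "definitions", component count, and cluster count are assumptions for illustration only.
<syntaxhighlight lang="python">
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy texts standing in for the first paragraph of each interest's Wikipedia page.
interests = {
    "jazz": "music genre improvisation swing blues instruments",
    "rock": "music genre guitar drums bands concerts",
    "soccer": "sport ball teams goals league players",
    "tennis": "sport racket court players tournaments",
}

names = list(interests)
tfidf = TfidfVectorizer().fit_transform([interests[n] for n in names])
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Agglomerative (average-link) clustering over cosine distances between LSA vectors.
Z = linkage(pdist(lsa, metric="cosine"), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(names, labels)))   # expected grouping: music interests vs. sport interests
</syntaxhighlight>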
Harman, D.; Kando, N.; Lalmas, M. & Peters, C. The Four Ladies of Experimental Evaluation Multilingual and Multimodal Information Access Evaluation. International Conference of the Cross-Language Evaluation Forum, CLEF 2010, 20-23 Sept. 2010 Berlin, Germany 2010 [507] The goal of the panel is to present some of the main lessons that we have learned in well over a decade of experimental evaluation and to promote discussion with respect to what the future objectives in this field should be. TREC was started in 1992 in conjunction with the building of a new 2 GB test collection for the DARPA TIPSTER project. Whereas the main task in the early TRECs was the adhoc retrieval task in English, many other tasks such as question-answering, web retrieval, and retrieval within specific domains have been tried over the years. NTCIR, the Asian version of TREC, started in 1999 and has run on an 18-month cycle. Whereas NTCIR is similar to TREC, there has always been a tighter connection to the NLP community, allowing for some unique tracks. Additionally, NTCIR has done extensive pioneering work with patents, including searching, classification, and translation. The coordination of the European CLIR task moved from TREC to Europe in 2000 and CLEF (Cross-Language Evaluation Forum) was launched. The objective was to expand the European CLIR effort by including more languages and more tasks, and by encouraging more participation from Europe. The INitiative for the Evaluation of XML retrieval (INEX) started in 2002 to provide evaluation of structured document retrieval, in particular to investigate the retrieval of document components that are XML elements of varying granularity. The initiative used 12,107 full-text scientific articles from 18 IEEE Computer Society publications, with each article containing 1,532 XML nodes on average. The collection grew to 16,819 articles in 2005 and moved on to using Wikipedia articles starting in 2006.
Hartrumpf, S.; Bruck, T. Vor Der & Eichhorn, C. Detecting duplicates with shallow and parser-based methods 2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2010), 21-23 Aug. 2010 Piscataway, NJ, USA 2010 [508] Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work, however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized semantic network index. In order to detect many kinds of paraphrases the current base semantic network is varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Some important phenomena occurring in difficult-to-detect duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora like Wikipedia is explained briefly. This deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably, in comparison to traditional shallow methods. For the evaluation, a standard corpus of German plagiarisms was extended by four diverse components with an emphasis on duplicates (and not just plagiarisms), e.g., news feed articles from different web sources and two translations of the same short story.
Hartrumpf, S. & Leveling, J. Recursive Question Decomposition for Answering Complex Geographic Questions Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [509] This paper describes the GIRSA-WP system and the experiments performed for GikiCLEF 2009, the geographic information retrieval task in the question answering track at CLEF 2009. Three runs were submitted. The first one contained only results from the InSicht QA system; it showed high precision, but low recall. The combination with results from the GIR system GIRSA increased recall considerably, but reduced precision. The second run used a standard IR query, while the third run combined such queries with a Boolean query with selected keywords. The evaluation showed that the third run achieved significantly higher mean average precision (MAP) than the second run. In both cases, integrating GIR methods and QA methods was successful in combining their strengths (high precision of deep QA, high recall of GIR), resulting in the third-best performance of automatic runs in GikiCLEF. The overall performance still leaves room for improvements. For example, the multilingual approach is too simple. All processing is done in only one Wikipedia (the German one); results for the nine other languages are collected by following the translation links in Wikipedia.
Hattori, S. & Tanaka, K. Extracting concept hierarchy knowledge from the Web based on property inheritance and aggregation WI 2008. 2008 IEEE/WIC/ACM International Conference on Web Intelligence. IAT 2008. 2008 IEEE/WIC/ACM International Conference on Intelligent Agent Technology. WI-IAT Workshop 2008. 2008 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology Workshops, 9-12 Dec. 2008 Piscataway, NJ, USA 2008 [510] Concept hierarchy knowledge, such as hyponymy and meronymy, is very important for various natural language processing systems. While WordNet and Wikipedia are being manually constructed and maintained as lexical ontologies, many researchers have tackled how to extract concept hierarchies from very large corpora of text documents such as the Web not manually but automatically. However, their methods are mostly based on lexico-syntactic patterns as not necessary but sufficient conditions of hyponymy and meronymy, so they can achieve high precision but low recall when using stricter patterns or they can achieve high recall but low precision when using looser patterns. Therefore, we need necessary conditions of hyponymy and meronymy to achieve high recall and not low precision. In this paper, not only "property inheritance" from a target concept to its hyponyms but also "property aggregation" from its hyponyms to the target concept is assumed to be a necessary and sufficient condition of hyponymy, and we propose a method to extract concept hierarchy knowledge from the Web based on property inheritance and property aggregation.
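A toy sketch of the property-inheritance/aggregation test described above: a candidate is accepted as a hyponym only if it inherits enough of the target concept's properties and the properties shared by the hyponyms also attach to the target. The property sets and the 0.5 threshold are invented for illustration and are not taken from the paper.
<syntaxhighlight lang="python">
def inheritance(target_props, hyponym_props):
    """Fraction of the target concept's properties that the candidate hyponym inherits."""
    return len(target_props & hyponym_props) / len(target_props) if target_props else 0.0

def aggregation(target_props, all_hyponym_props):
    """Fraction of properties shared by the hyponyms that also attach to the target."""
    shared = set.intersection(*all_hyponym_props) if all_hyponym_props else set()
    return len(shared & target_props) / len(shared) if shared else 0.0

def is_hyponym(target_props, candidate_props, sibling_props, threshold=0.5):
    """Accept the candidate only if both directions clear the threshold."""
    return (inheritance(target_props, candidate_props) >= threshold
            and aggregation(target_props, [candidate_props] + sibling_props) >= threshold)

# Toy property sets as if mined from Web text (illustrative only).
bird = {"has_wings", "lays_eggs", "has_feathers"}
sparrow = {"has_wings", "lays_eggs", "has_feathers", "is_small"}
penguin = {"has_wings", "lays_eggs", "has_feathers", "swims"}
car = {"has_wheels", "has_engine"}

print(is_hyponym(bird, sparrow, [penguin]))   # True
print(is_hyponym(bird, car, [penguin]))       # False
</syntaxhighlight>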
Hauck, Rita Immersion in another Language and Culture through Multimedia and Web Resources World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [511]
Hecht, Brent & Gergle, Darren Measuring self-focus bias in community-maintained knowledge repositories Proceedings of the fourth international conference on Communities and technologies 2009 [512] Self-focus is a novel way of understanding a type of bias in community-maintained Web 2.0 graph structures. It goes beyond previous measures of topical coverage bias by encapsulating both node- and edge-hosted biases in a single holistic measure of an entire community-maintained graph. We outline two methods to quantify self-focus, one of which is very computationally inexpensive, and present empirical evidence for the existence of self-focus using a "hyperlingual" approach that examines 15 different language editions of Wikipedia. We suggest applications of our methods and discuss the risks of ignoring self-focus bias in technological applications.
Hecht, Brent J. & Gergle, Darren On the "localness" of user-generated content Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [513] The "localness" of participation in repositories of user-generated content (UGC) with geospatial components has been cited as one of UGC's greatest benefits. However, the degree of localness in major UGC repositories such as Flickr and Wikipedia has never been examined. We show that over 50 percent of Flickr users contribute local information on average and over 45 percent of Flickr photos are local to the photographer. Across four language editions of Wikipedia, however, we find that participation is less local. We introduce the spatial content production model (SCPM) as a possible factor in the localness of UGC, and discuss other theoretical and applied implications.
Heer, Rex My Space in College: Students Use of Virtual Communities to Define their Fit in Higher Education Society for Information Technology \& Teacher Education International Conference 2007 [514]
Hellmann, S.; Stadler, C.; Lehmann, J. & Auer, S. DBpedia Live Extraction On the Move to Meaningful Internet Systems: OTM 2009. Confederated International Conferences CoopIS, DOA, IS, and ODBASE 2009, 1-6 Nov. 2009 Berlin, Germany 2009 [515] The DBpedia project extracts information from Wikipedia, interlinks it with other knowledge bases, and makes this data available as RDF. So far the DBpedia project has succeeded in creating one of the largest knowledge bases on the Data Web, which is used in many applications and research prototypes. However, the heavy-weight extraction process has been a drawback. It requires manual effort to produce a new release and the extracted information is not up-to-date. We extended DBpedia with a live extraction framework, which is capable of processing tens of thousands of changes per day in order to consume the constant stream of Wikipedia updates. This allows direct modifications of the knowledge base and closer interaction of users with DBpedia. We also show how the Wikipedia community itself is now able to take part in the DBpedia ontology engineering process and that an interactive roundtrip engineering between Wikipedia and DBpedia is made possible.
Hengstler, Julia Exploring Open Source for Educators: We're Not in Kansas Anymore--Entering Os World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [516]
Hennis, Thieme; Veen, Wim & Sjoer, Ellen Future of Open Courseware; A Case Study World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [517]
Heo, Gyeong Mi; Lee, Romee & Park, Young Blog as a Meaningful Learning Context: Adult Bloggers as Cyworld Users in Korea World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [518]
Herbold, Katy & Hsiao, Wei-Ying Online Learning on Steroids: Combining Brain Research with Time Saving Techniques World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [519]
Herczeg, Michael Educational Media: From Canned Brain Food to Knowledge Traces World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [520]
Herring, Donna & Friery, Kathleen efolios for 21st Century Learners World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [521]
Herring, Donna; Hibbs, Roger; Morgan, Beth & Notar, Charles Show What You Know: ePortfolios for 21st Century Learners Society for Information Technology \& Teacher Education International Conference 2007 [522]
Herrington, Anthony; Kervin, Lisa & Ilias, Joanne Blogging Beginning Teaching Society for Information Technology \& Teacher Education International Conference 2006 [523]
Herrington, Jan Authentic E-Learning in Higher Education: Design Principles for Authentic Learning Environments and Tasks World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [524]
Heuer, Lars Towards converting the internet into topic maps Proceedings of the 2nd international conference on Topic maps research and applications 2006 [525] This paper describes Semants, a work-in-progress framework that uses Wikipedia as a focal point to collect information from various resources. Semants aims at developing several specialized applications (the ants) that are used to convert a resource into a topic map fragment that is merged into a bigger topic map.
Hewitt, Jim & Peters, Vanessa Using Wikis to Support Knowledge Building in a Graduate Education Course World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [526]
Hewitt, Jim; Peters, Vanessa & Brett, Clare Using Wiki Technologies as an Adjunct to Computer Conferencing in a Graduate Teacher Education Course Society for Information Technology \& Teacher Education International Conference 2006 [527]
Higdon, Jude; Miller, Sean & Paul, Nora Educational Gaming for the Rest of Us: Thinking Worlds and WYSIWYG Game Development World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [528]
Hoehndorf, R.; Prufer, K.; Backhaus, M.; Herre, H.; Kelso, J.; Loebe, F. & Visagie, J. A proposal for a gene functions wiki On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops Berlin, Germany 2006 Large knowledge bases integrating different domains can provide a foundation for new applications in biology such as data mining or automated reasoning. The traditional approach to the construction of such knowledge bases is manual and therefore extremely time consuming. The ubiquity of the Internet now makes large-scale community collaboration for the construction of knowledge bases, such as the successful online encyclopedia Wikipedia, possible. We propose an extension of this model to the collaborative annotation of molecular data. We argue that a semantic wiki provides the functionality required for this project since this can capitalize on the existing representations in biological ontologies. We discuss the use of a different relationship model than the one provided by RDF and OWL to represent the semantic data. We argue that this leads to a more intuitive and correct way to enter semantic content in the wiki. Furthermore, we show how formal ontologies could be used to increase the usability of the software through type-checking and automatic reasoning.
Holcomb, Lori & Beal, Candy Using Web 2.0 to Support Learning in the Social Studies Context Society for Information Technology \& Teacher Education International Conference 2008 [529]
Holifield, Phil Visual History Project: an Image Map Authoring Tool Assisting Students to Present Project Information World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [530]
Holmes, Bryn; Wasty, Shujaat; Hafeez, Khaled & Ahsan, Shakib The Knowledge Box: Can a technology bring schooling to children in crisis? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [531]
Honnibal, Matthew; Nothman, Joel & Curran, James R. Evaluating a statistical CCG parser on Wikipedia Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [532] The vast majority of parser evaluation is conducted on the 1984 Wall Street Journal (WSJ). In-domain evaluation of this kind is important for system development, but gives little indication about how the parser will perform on many practical problems. Wikipedia is an interesting domain for parsing that has so far been under-explored. We present statistical parsing results that for the first time provide information about what sort of performance a user parsing Wikipedia text can expect. We find that the C&C parser's standard model is 4.3% less accurate on Wikipedia text, but that a simple self-training exercise reduces the gap to 3.8%. The self-training also speeds up the parser on newswire text by 20%.
Hopson, David & Martland, David Network Web Directories: Do they deliver and to whom? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004 [533]
Hoven, Debra Networking to learn: blogging for social and collaborative purposes World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [534]
Hsu, Yu-Chang; Ching, Yu-Hui & Grabowski, Barbara Bookmarking/Tagging in the Web 2.0 Era: From an Individual Cognitive Tool to a Collaborative Knowledge Construction Tool for Educators World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [535]
Hu, Meiqun; Lim, Ee-Peng; Sun, Aixin; Lauw, Hady Wirawan & Vuong, Ba-Quy On improving wikipedia search using article quality Proceedings of the 9th annual ACM international workshop on Web information and data management 2007 [536] Wikipedia is presently the largest free-and-open online encyclopedia collaboratively edited and maintained by volunteers. While Wikipedia offers full-text search to its users, the accuracy of its relevance-based search can be compromised by poor quality articles edited by non-experts and inexperienced contributors. In this paper, we propose a framework that re-ranks Wikipedia search results considering article quality. We develop two quality measurement models, namely Basic and Peer Review, to derive article quality based on co-authoring data gathered from articles' edit history. Compared with Wikipedia's full-text search engine, Google and Wikiseek, our experimental results showed that (i) quality-only ranking produced by Peer Review gives comparable performance to that of Wikipedia and Wikiseek; (ii) Peer Review combined with relevance ranking outperforms Wikipedia's full-text search significantly, delivering search accuracy comparable to Google.
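The combination of relevance and quality ranking described above can be sketched as a simple linear interpolation of the two scores; the weight and the scores below are placeholders, and the paper's Basic and Peer Review quality models are not reproduced here.
<syntaxhighlight lang="python">
def rerank(results, quality, alpha=0.7):
    """Re-rank search results by interpolating relevance with article quality.

    results: list of (title, relevance_score) from a full-text search engine.
    quality: dict mapping title -> quality score in [0, 1], e.g. from edit-history analysis.
    alpha:   weight on relevance; (1 - alpha) goes to quality.
    """
    scored = [
        (title, alpha * rel + (1 - alpha) * quality.get(title, 0.0))
        for title, rel in results
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Illustrative numbers only.
hits = [("Stub article", 0.92), ("Well-reviewed article", 0.85)]
quality = {"Stub article": 0.10, "Well-reviewed article": 0.95}
print(rerank(hits, quality))   # the well-reviewed article moves to the top
</syntaxhighlight>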
Huang, Hsiang-ling & Hung, Yu-ju An overview of information technology on language education World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [537]
Huang, Wenhao & Yoo, Sunjoo How Do Web 2.0 Technologies Motivate Learners? A Regression Analysis based on the Motivation, Volition, and Performance Theory World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [538]
Huang, Yin-Fu & Huang, Yu-Yu A framework automating domain ontology construction WEBIST 2008. Fourth International Conference on Web Information Systems and Technologies, 4-7 May 2008 Madeira, Portugal 2008 This paper proposed a general framework that could automatically construct domain ontology on a collection of documents with the help of The Free Dictionary, WordNet, and Wikipedia Categories. Both explicit and implicit features of index terms in documents are used to evaluate word correlations and then to construct Is-A relationships in the framework. Thus, the built ontology would consist of 1) concepts, 2) Is-A and Parts-of relationships among concepts, and 3) word relationships. Besides, the built ontology could be further refined by learning from incremental documents periodically. To help users browse the built ontology, an ontology browsing system was implemented and provided different search modes and functionality to facilitate searching a variety of relationships.
Huckell, Travis The Academic Exception as Foundation for Innovation in Online Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [539]
Hussein, Ramlah; Saeed, Moona; Karim, Nor Shahriza Abdul & Mohamed, Norshidah Instructor’s Perspective on Factors influencing Effectiveness of Virtual Learning Environment (VLE) in the Malaysian Context: Proposed Framework World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [540]
Hwang, Jya-Lin University EFL Students’ Learning Strategies On Multimedia YouTube World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [541]
Höller, Harald & Reisinger, Peter Wiki Based Teaching and Learning Scenarios at the University of Vienna World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [542]
Høivik, Helge An Experimental Player/Editor for Web-based Multi-Linguistic Cooperative Lectures World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [543]
Høivik, Helge Read and Write Text and Context - Learning as Poietic Fields of Engagement World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [544]
Iftene, A. Identifying Geographical Entities in Users' Queries Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [545] In 2009 we built a system in order to compete in the LAGI task (Log Analysis and Geographic Query Identification). The system uses an external resource built into GATE in combination with Wikipedia and Tumba in order to identify geographical entities in users' queries. The results obtained with and without Wikipedia resources are comparable. The main advantage of only using GATE resources is the improved run time. In the process of system evaluation we have identified the main problem of our approach: the system has insufficient external resources for the recognition of geographic entities.
Iftene, Adrian Building a Textual Entailment System for the RTE3 Competition. Application to a QA System Proceedings of the 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing 2008 [546]
Indrie, S.M. & Groza, A. Enacting argumentative web in semantic wikipedia 2010 9th Roedunet International Conference (RoEduNet), 24-26 June 2010 Piscataway, NJ, USA 2010 This research advocates the idea of combining argumentation theory with the social web technology, aiming to enact large scale or mass argumentation. The proposed framework allows mass-collaborative editing of structured arguments in the style of semantic wikipedia. The long term goal is to apply the abstract machinery of argumentation theory to more practical applications based on human generated arguments, such as deliberative democracy, business negotiation, or self-care.
Ingram, Richard JMU/Microsoft Partnership for 21st Century Skills: Overview of Goals, Activities, and Challenges Society for Information Technology \& Teacher Education International Conference 2007 [547]
Inkpen, Kori; Gutwin, Carl & Tang, John Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [548] Welcome to the 2010 ACM Conference on Computer Supported Cooperative Work! We hope that this conference will be a place to hear exciting talks about the latest in CSCW research, an opportunity to learn new things, and a chance to connect with friends in the community. We are pleased to see such a strong and diverse program at this year's conference. We have a mix of research areas represented -- some that are traditionally part of our community, and several that have not been frequently seen at CSCW. There are sessions to suit every taste: from collaborative software development, healthcare, and groupware technologies, to studies of Wikipedia, family communications, games, and volunteering. We are particularly interested in a new kind of forum at the conference this year -- the 'CSCW Horizon' -- which will present novel and challenging ideas, and will do so in a more interactive fashion than standard paper sessions. The program is an exciting and topical mix of cutting-edge research and thought in CSCW. A major change for CSCW beginning this year is our move from being a biennial to an annual conference. This has meant a change in the time of the conference (from November to February), and subsequent changes in all of our normal deadlines and procedures. Despite these changes, the community has responded with enormous enthusiasm, and we look forward to the future of yearly meetings under the ACM CSCW banner.
Ioannou, Andri Towards a Promising Technology for Online Collaborative Learning: Wiki Threaded Discussion World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [549]
Ioannou, Andri & Artino, Anthony Incorporating Wikis in an Educational Technology Course: Ideas, Reflections and Lessons Learned … Society for Information Technology \& Teacher Education International Conference 2008 [550]
Ion, Radu; Ştefănescu, Dan; Ceauşu, Alexandru & Tufiş, Dan RACAI's QA system at the Romanian-Romanian QA@CLEF2008 main task Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access 2008 [551] This paper describes the participation of the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI) in the Multiple Language Question Answering Main Task at the CLEF 2008 competition. We present our Question Answering system, which answers Romanian questions from Romanian Wikipedia documents, focusing on the implementation details. The presentation will also emphasize the fact that question analysis, snippet selection and ranking provide a useful basis for any answer extraction mechanism.
Iqbal, Muhammad; Barton, Greg & Barton, Siew Mee Internet in the pesantren: A tool to promote or continue autonomous learning? Global Learn Asia Pacific 2010 [552]
Ireland, Alice; Kaufman, David & Sauvé, Louise Simulation and Advanced Gaming Environments (SAGE) for Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [553]
Iske, Stefan & Marotzki, Winfried Wikis: Reflexivity, Processuality and Participation World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [554]
Jackson, Allen; Gaudet, Laura; Brammer, Dawn & McDaniel, Larry Curriculum, a Change in Theoretical Thinking Theory World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [555]
Jacquin, Christine; Desmontils, Emmanuel & Monceaux, Laura French EuroWordNet Lexical Database Improvements Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing 2009 [556] Semantic knowledge is often used in the framework of Natural Language Processing (NLP) applications. However, for some languages other than English, such knowledge is not always easily available. For example, French thesauri are not numerous and not sufficiently developed. In this context, we present two modifications made to the French version of the EuroWordNet thesaurus in order to improve it. Firstly, we present the French EuroWordNet thesaurus and its limits. Then we explain the two improvements we have made. We add missing relationships by using the bilingual capability of the EuroWordNet thesaurus, and definitions by using an external multilingual resource (Wikipedia [1]).
Jadidinejad, A.H. & Mahmoudi, F. Cross-language Information Retrieval Using Meta-language Index Construction and Structural Queries Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [557] Structural query languages allow expert users to richly represent their information needs, but unfortunately the complexity of such queries makes them impractical in Web search engines. Automatically detecting the concepts in an unstructured user's information need and generating a richly structured, multilingual equivalent query is an ideal solution. We utilize Wikipedia as a large concept repository, along with some state-of-the-art algorithms for extracting Wikipedia's concepts from the user's information need. This process is called "query wikification". Our experiments on the TEL corpus at CLEF 2009 achieve +23% and +17% improvements in Mean Average Precision and Recall against the baseline. Our approach is unique in that it improves both precision and recall, two measures where improving one often hurts the other.
Jamaludin, Rozinah; Annamalai, Subashini & Abdulwahed, Mahmoud Web 1.0, Web 2.0: Implications to move from Education 1.0 to Education 2.0 to enhance collaborative intelligence towards the future of Web 3.0 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [558]
von Jan, Ute; Ammann, Alexander; Matthies, Herbert K. & von Jan, Ute Generating and Presenting Dynamic Knowledge in Medicine and Dentistry World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [559]
Jang, Soobaek & Green, T.M. Best practices on delivering a wiki collaborative solution for enterprise applications 2006 International Conference on Collaborative Computing: Networking, Applications and Worksharing, 17-20 Nov. 2006 Piscataway, NJ, USA 2006 Wikis have become a hot topic in the world of collaboration tools. Wikipedia.org, a vast, community-driven encyclopedia, has proven to be an invaluable information resource that has been developed through collaboration among thousands of people around the world. Today wikis are increasingly being employed for a wide variety of uses in business. Consequently, one of the key challenges is to enable wikis to interoperate with informational and business process applications. The ability to dynamically change the content of Web pages and reflect the changes within an enterprise application brings the power of collaboration to business applications. This paper includes general information about wikis and describes how to use a wiki solution within an enterprise application. Integrating an enterprise application with a wiki permits real-time updates of pages in the application by certain groups of experts, without deploying files from the Web application server.
Jankowski, Jacek & Decker, Stefan 2LIP: filling the gap between the current and the three-dimensional web Proceedings of the 14th International Conference on 3D Web Technology 2009 [560] In this article we present a novel approach, the 2-Layer Interface Paradigm (2LIP), for designing simple yet interactive 3D web applications, an attempt to marry the advantages of 3D experience with the advantages of the narrative structure of hypertext. The hypertext information, together with graphics and multimedia, is presented semi-transparently on the foreground layer. It overlays the 3D representation of the information displayed in the background of the interface. Hyperlinks are used for navigation in the 3D scenes (in both layers). We introduce a reference implementation of 2LIP: Copernicus - The Virtual 3D Encyclopedia, which can become a model for building a 3D Wikipedia. Based on the evaluation of Copernicus we show that designing web interfaces according to 2LIP provides users with a better experience during browsing the Web, has a positive effect on the visual and associative memory, improves spatial cognition of presented information, and increases overall user satisfaction without harming the interaction.
Jansche, Martin & Sproat, Richard Named entity transcription with pair n-gram models Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration 2009 [561] We submitted results for each of the eight shared tasks. Except for Japanese name kanji restoration, which uses a noisy channel model, our Standard Run submissions were produced by generative long-range pair n-gram models, which we mostly augmented with publicly available data (either from LDC datasets or mined from Wikipedia) for the Non-Standard Runs.
Javanmardi, S. & Lopes, C.V. Modeling trust in collaborative information systems 2007 International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2007), 12 Nov.-15 Nov. 2007 Piscataway, NJ, USA 2007 [562] Collaborative systems available on the Web allow millions of users to share information through a growing collection of tools and platforms such as wikis, blogs and shared forums. All of these systems contain information and resources with different degrees of sensitivity. However, the open nature of such infrastructures makes it difficult for users to determine the reliability of the available information and the trustworthiness of information providers. Hence, integrating trust management systems into open collaborative systems can play a crucial role in the growth and popularity of open information repositories. In this paper, we present a trust model for collaborative systems, namely for platforms based on Wiki technology. This model, based on hidden Markov models, estimates the reputation of the contributors and the reliability of the content dynamically. The focus of this paper is on reputation estimation. Evaluation results based on a subset of Wikipedia show that the model can effectively be used for identifying vandals, and users with high quality contributions.
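As a rough illustration of an HMM-based reputation estimate, the sketch below runs the forward algorithm over a two-state model (reliable vs. vandal contributor) with observations of whether each edit was kept or reverted; all probabilities are invented placeholders and the paper's actual model is considerably richer.
<syntaxhighlight lang="python">
def forward(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: unnormalized probability mass of each hidden state
    after observing the whole edit sequence."""
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return alpha

states = ("reliable", "vandal")
start_p = {"reliable": 0.8, "vandal": 0.2}
trans_p = {"reliable": {"reliable": 0.9, "vandal": 0.1},
           "vandal": {"reliable": 0.2, "vandal": 0.8}}
emit_p = {"reliable": {"kept": 0.9, "reverted": 0.1},
          "vandal": {"kept": 0.3, "reverted": 0.7}}

edits = ["kept", "reverted", "reverted", "reverted"]   # observed edit outcomes
alpha = forward(edits, states, start_p, trans_p, emit_p)
total = sum(alpha.values())
print({s: round(a / total, 3) for s, a in alpha.items()})  # high "vandal" probability
</syntaxhighlight>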
Jijkoun, Valentin; Khalid, Mahboob Alam; Marx, Maarten & de Rijke, Maarten Named entity normalization in user generated content Proceedings of the second workshop on Analytics for noisy unstructured text data 2008 [563] Named entity recognition is important for semantically oriented retrieval tasks, such as question answering, entity retrieval, biomedical retrieval, trend detection, and event and entity tracking. In many of these tasks it is important to be able to accurately normalize the recognized entities, i.e., to map surface forms to unambiguous references to real world entities. Within the context of structured databases, this task (known as record linkage and data de-duplication) has been a topic of active research for more than five decades. For edited content, such as news articles, the named entity normalization (NEN) task is one that has recently attracted considerable attention. We consider the task in the challenging context of user generated content (UGC), where it forms a key ingredient of tracking and media-analysis systems. A baseline NEN system from the literature (that normalizes surface forms to Wikipedia pages) performs considerably worse on UGC than on edited news: accuracy drops from 80% to 65% for a Dutch language data set and from 94% to 77% for English. We identify several sources of errors: entity recognition errors, multiple ways of referring to the same entity and ambiguous references. To address these issues we propose five improvements to the baseline NEN algorithm, to arrive at a language independent NEN system that achieves overall accuracy scores of 90% on the English data set and 89% on the Dutch data set. We show that each of the improvements contributes to the overall score of our improved NEN algorithm, and conclude with an error analysis on both Dutch and English language UGC. The NEN system is computationally efficient and runs with very modest computational requirements.
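The baseline idea of normalizing a surface form to a Wikipedia page can be sketched as a lookup in an alias table followed by context-overlap disambiguation; the alias table, page descriptions, and scoring below are toy assumptions rather than the authors' system.
<syntaxhighlight lang="python">
def normalize(surface, context, alias_table, page_text):
    """Map a recognized surface form to a Wikipedia page title.

    alias_table: surface form -> list of candidate page titles (redirects, disambiguations).
    page_text:   page title -> bag of words describing the page.
    Disambiguation picks the candidate whose description overlaps the mention context most.
    """
    candidates = alias_table.get(surface.lower(), [])
    if not candidates:
        return None
    ctx = set(context.lower().split())
    return max(candidates, key=lambda title: len(ctx & page_text.get(title, set())))

# Toy alias table and page descriptions (illustrative only).
aliases = {"paris": ["Paris", "Paris Hilton"]}
pages = {"Paris": {"france", "capital", "city", "seine"},
         "Paris Hilton": {"celebrity", "heiress", "television"}}

print(normalize("Paris", "weekend trip to the capital of France", aliases, pages))  # -> "Paris"
</syntaxhighlight>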
Jitkrittum, Wittawat; Haruechaiyasak, Choochart & Theeramunkong, Thanaruk QAST: question answering system for Thai Wikipedia Proceedings of the 2009 Workshop on Knowledge and Reasoning for Answering Questions 2009 [564] We propose an open-domain question answering system using Thai Wikipedia as the knowledge base. Two types of information are used for answering a question: (1) structured information extracted and stored in the form of the Resource Description Framework (RDF), and (2) unstructured texts stored as a search index. For the structured information, a SPARQL-transformed query is applied to retrieve a short answer from the RDF base. For the unstructured information, a keyword-based query is used to retrieve the shortest text span containing the question's key terms. From the experimental results, the system which integrates both approaches could achieve an average MRR of 0.47 based on 215 test questions.
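The structured half of the approach (transforming a factoid question into a SPARQL query over extracted RDF) can be sketched with rdflib; the triples, property names, and figures below are illustrative stand-ins, not the system's actual schema or data.
<syntaxhighlight lang="python">
from rdflib import Graph, Literal, Namespace

# Toy RDF facts standing in for structured information extracted from Thai Wikipedia
# (the property names and the population figure are illustrative assumptions).
EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Bangkok, EX.isCapitalOf, EX.Thailand))
g.add((EX.Bangkok, EX.population, Literal(10539000)))

# A factoid question such as "What is the population of Bangkok?" is mapped to a
# SPARQL query over the RDF base and answered with the retrieved short value.
query = """
    PREFIX ex: <http://example.org/>
    SELECT ?answer WHERE { ex:Bangkok ex:population ?answer . }
"""
for row in g.query(query):
    print(row.answer)   # short answer retrieved from the structured store
</syntaxhighlight>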
Johnson, Peter C.; Kapadia, Apu; Tsang, Patrick P. & Smith, Sean W. Nymble: anonymous IP-address blocking Proceedings of the 7th international conference on Privacy enhancing technologies 2007 [565] Anonymizing networks such as Tor allow users to access Internet services privately using a series of routers to hide the client's IP address from the server. Tor's success, however, has been limited by users employing this anonymity for abusive purposes, such as defacing Wikipedia. Website administrators rely on IP-address blocking for disabling access to misbehaving users, but this is not practical if the abuser routes through Tor. As a result, administrators block all Tor exit nodes, denying anonymous access to honest and dishonest users alike. To address this problem, we present a system in which (1) honest users remain anonymous and their requests unlinkable; (2) a server can complain about a particular anonymous user and gain the ability to blacklist the user for future connections; (3) this blacklisted user's accesses before the complaint remain anonymous; and (4) users are aware of their blacklist status before accessing a service. As a result of these properties, our system is agnostic to different servers' definitions of misbehavior.
Jordan, C.; Watters, C. & Toms, E. Using Wikipedia to make academic abstracts more readable Proceedings of the American Society for Information Science and Technology 2008 [566]
Junior, João Batista Bottentuit; Coutinho, Clara & Junior, João Batista Bottentuit The use of mobile technologies in Higher Education in Portugal: an exploratory survey World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [567]
Kabisch, Thomas; Padur, Ronald & Rother, Dirk Using Web Knowledge to Improve the Wrapping of Web Sources Proceedings of the 22nd International Conference on Data Engineering Workshops 2006 [568] During the wrapping of web interfaces ontological knowledge is important in order to support an automated interpretation of information. The development of ontologies is a time consuming issue and not realistic in global contexts. On the other hand, the web provides a huge amount of knowledge, which can be used instead of ontologies. Three common classes of web knowledge sources are: Web Thesauri, search engines and Web encyclopedias. The paper investigates how Web knowledge can be utilized to solve the three semantic problems Parameter Finding for Query Interfaces, Labeling of Values and Relabeling after interface evolution. For the solution of the parameter finding problem an algorithm has been implemented using the web encyclopedia Wikipedia for the initial identification of parameter value candidates and the search engine Google for a validation of label-value relationships. The approach has been integrated into a wrapper definition framework.
Kallis, John R. & Patti, Christine Creating an Enhanced Podcast with Section 508 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [569]
Kameyama, Shumei; Uchida, Makoto & Shirayama, Susumu A New Method for Identifying Detected Communities Based on Graph Substructure Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops 2007 [570] Many methods have been developed that can detect community structures in complex networks. The detection methods can be classified into three groups based on their characteristic properties. In this study, the inherent features of the detection methods were used to develop a method that identifies communities extracted using a given community detection method. Initially, a common detection method is used to divide a network into communities. The communities are then identified using another detection method from a different class. In this paper, the community structures are first extracted from a network using the method proposed by Newman and Girvan. The extracted communities are then identified using the proposed detection method, which is an extension of the vertex similarity method proposed by Leicht et al. The proposed method was used to identify communities in a blog network (blogosphere) and in a Wikipedia word network.
Kaminishi, Hidekazu & Murota, Masao Development of Multi-Screen Presentation Software World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [571]
Kapur, Manu; Hung, David; Jacobson, Michael; Voiklis, John; Kinzer, Charles K. & Victor, Chen Der-Thanq Emergence of learning in computer-supported, large-scale collective dynamics: a research agenda Proceedings of the 8th international conference on Computer supported collaborative learning 2007 [572] Seen through the lens of complexity theory, past CSCL research may largely be characterized as small-scale (i.e., small-group) collective dynamics. While this research tradition is substantive and meaningful in its own right, we propose a line of inquiry that seeks to understand computer-supported, large-scale collective dynamics: how large groups of interacting people leverage technology to create emergent organizations (knowledge, structures, norms, values, etc.) at the collective level that are not reducible to any individual, e.g., Wikipedia, online communities etc. How does learning emerge in such large-scale collectives? Understanding the interactional dynamics of large-scale collectives is a critical and open research question, especially in an increasingly participatory, inter-connected, media-convergent culture of today. Recent CSCL research has alluded to this; we, however, develop the case further in terms of what it means for how one conceives learning, as well as methodologies for seeking understandings of how learning emerges in these large-scale networks. In the final analysis, we leverage complexity theory to advance computational agent-based models (ABMs) as part of an integrated, iteratively-validated Phenomenological-ABM inquiry cycle to understand emergent phenomena from the bottom up.
Karadag, Zekeriya & McDougall, Douglas E-contests in Mathematics: Technological Challenges versus Technological Innovations World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [573]
Karakus, Turkan; Sancar, Hatice & Cagiltay, Kursat An Eye Tracking Study: The Effects of Individual Differences on Navigation Patterns and Recall Performance on Hypertext Environments World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [574]
Karlsson, Mia Teacher Educators Moving from Learning the Office Package to Learning About Digital Natives' Use of ICT World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [575]
Karsenti, Thierry; Goyer, Sophie; Villeneuve, Stephane & Raby, Carole The efficacy of eportfolios : an experiment with pupils and student teachers from Canada Society for Information Technology \& Teacher Education International Conference 2007 [576]
Karsenti, Thierry; Villeneuve, Stephane & Goyer, Sophie The Development of an Eportfolio for Student Teachers World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [577]
Kasik, Maribeth Montgomery & Kasik, Maribeth Montgomery Been there done that: emerged, evolved and ever changing face of e-learning and emerging technologies. Society for Information Technology \& Teacher Education International Conference 2008 [578]
Kasik, Maribeth Montgomery; Mott, Michael; Wasowski, Robert & Kasik, Maribeth Montgomery Cyber Bullies Among the Digital Natives and Emerging Technologies World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [579]
Keengwe, Jared Enhancing e-learning through Technology and Constructivist Pedagogy World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [580]
Kennard, Carl Differences in Male and Female Wiki Participation during Educational Group Projects World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [581]
Kennard, Carl Wiki Productivity and Discussion Forum Activity in a Postgraduate Online Distance Learning Course World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [582]
Kennedy, Ian One Encyclopedia Per Child (OEPC) in Simple English World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [583]
Ketterl, Markus & Morisse, Karsten User Generated Web Lecture Snippets to Support a Blended Learning Approach World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [584]
Khalid, Mahboob Alam & Verberne, Suzan Passage retrieval for question answering using sliding windows Proceedings of the 2nd workshop on Information Retrieval for Question Answering 2008 [585] The information retrieval (IR) community has investigated many different techniques to retrieve passages from large collections of documents for question answering (QA). In this paper, we specifically examine and quantitatively compare the impact of passage retrieval for QA using sliding windows and disjoint windows. We consider two different data sets, the TREC 2002--2003 QA data set, and 93 why-questions against INEX Wikipedia. We discovered that, compared to disjoint windows, using sliding windows results in improved performance of TREC-QA in terms of TDRR, and in improved performance of Why-QA in terms of success@n and MRR.
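For readers unfamiliar with the two passage-building strategies compared in the entry above, the following is a minimal illustrative sketch, not the authors' implementation; the window size and step are hypothetical parameters chosen only for the example.
<syntaxhighlight lang="python">
# Illustrative sketch: how disjoint versus sliding passage windows can be built
# from a tokenised document. Window size and step are assumed example values.

def disjoint_windows(tokens, size=50):
    """Split tokens into consecutive, non-overlapping passages."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def sliding_windows(tokens, size=50, step=25):
    """Produce overlapping passages; each window starts `step` tokens after the previous one."""
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - size, 0) + 1, step)]

if __name__ == "__main__":
    doc = ("the free encyclopedia that anyone can edit " * 30).split()
    print(len(disjoint_windows(doc, 50)), "disjoint passages")
    print(len(sliding_windows(doc, 50, 25)), "sliding passages")
</syntaxhighlight>
Sliding windows produce overlapping passages, so an answer-bearing sentence is less likely to be cut in half at a passage boundary, which is the intuition behind the reported improvement.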
Kidd, Jennifer; Baker, Peter; Kaufman, Jamie; Hall, Tiffany; O'Shea, Patrick & Allen, Dwight Wikitextbooks: Pedagogical Tool for Student Empowerment Society for Information Technology \& Teacher Education International Conference 2009 [586]
Kidd, Jennifer; O'Shea, Patrick; Baker, Peter; Kaufman, Jamie & Allen, Dwight Student-authored Wikibooks: Textbooks of the Future? Society for Information Technology \& Teacher Education International Conference 2008 [587]
Kidd, Jennifer; O'Shea, Patrick; Kaufman, Jamie; Baker, Peter; Hall, Tiffany & Allen, Dwight An Evaluation of Web 2.0 Pedagogy: Student-authored Wikibook vs Traditional Textbook Society for Information Technology \& Teacher Education International Conference 2009 [588]
Kim, Daesang; Rueckert, Daniel & Hwang, Yeiseon Let’s create a podcast! Society for Information Technology \& Teacher Education International Conference 2008 [589]
Kim, Youngjun & Baek, Youngkyun Educational uses of HUD in Second Life Society for Information Technology \& Teacher Education International Conference 2010 [590]
Kimmerle, Joachim; Moskaliuk, Johannes & Cress, Ulrike Learning and knowledge building with social software Proceedings of the 9th international conference on Computer supported collaborative learning - Volume 1 2009 [591] The progress of the Internet in recent years has led to the emergence of so-called social software. This technology concedes users a more active role in creating Web content. This has important effects both on individual learning and collaborative knowledge building. In this paper we will present an integrative framework model to describe and explain learning and knowledge building with social software on the basis of systems theoretical and equilibration theoretical considerations. This model assumes that knowledge progress emerges from cognitive conflicts that result from incongruities between an individual's prior knowledge and the information which is contained in a shared digital artifact. This paper will provide empirical support for the model by applying it to Wikipedia articles and by examining knowledge-building processes using network analyses. Finally, this paper will present a review of a series of experimental studies.
Kimmons, Royce Digital Play, Ludology, and the Future of Educational Games World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [592]
Kimmons, Royce What Does Open Collaboration on Wikipedia Really Look Like? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [593]
Kinney, Lance Evidence of Engineering Education in Virtual Worlds World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [594]
Kiran, G.V.R.; Shankar, R. & Pudi, V. Frequent Itemset Based Hierarchical Document Clustering Using Wikipedia as External Knowledge Knowledge-Based and Intelligent Information and Engineering Systems. 14th International Conference, KES 2010, 8-10 Sept. 2010 Berlin, Germany 2010 High dimensionality is a major challenge in document clustering. Some of the recent algorithms address this problem by using frequent itemsets for clustering. But most of these algorithms neglect the semantic relationship between the words. On the other hand, there are algorithms that take care of the semantic relations between the words by making use of external knowledge contained in WordNet, MeSH, Wikipedia, etc., but they do not handle the high dimensionality. In this paper we present an efficient solution that addresses both these problems. We propose a hierarchical clustering algorithm using closed frequent itemsets that uses Wikipedia as external knowledge to enhance the document representation. We evaluate our methods based on F-Score on standard datasets and show our results to be better than existing approaches.
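The core intuition of frequent-itemset-based document grouping can be illustrated with the toy sketch below; it is a deliberately simplified assumption-laden version (plain term pairs instead of closed itemsets, and no Wikipedia-based enrichment of the document representation).
<syntaxhighlight lang="python">
# Toy sketch of frequent-itemset-based grouping: documents sharing a frequent
# term pair form an initial (possibly overlapping) cluster candidate.
from itertools import combinations
from collections import defaultdict

def frequent_term_pairs(docs, min_support=2):
    """Count term pairs and keep those appearing in at least `min_support` documents."""
    counts = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair].add(doc_id)
    return {pair: ids for pair, ids in counts.items() if len(ids) >= min_support}

def initial_clusters(docs, min_support=2):
    """Group documents by the frequent term pairs they contain."""
    return {pair: sorted(ids) for pair, ids in frequent_term_pairs(docs, min_support).items()}

docs = [["wiki", "cluster", "text"], ["wiki", "cluster", "graph"], ["image", "retrieval"]]
print(initial_clusters(docs))
</syntaxhighlight>
A full implementation would mine closed frequent itemsets of arbitrary size and then merge or prune the overlapping candidate clusters hierarchically.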
Kobayashi, Michiko Creating Wikis in the technology class: How do we use Wikis in K-12 classrooms? Society for Information Technology \& Teacher Education International Conference 2010 [595]
Koh, Elizabeth & Lim, John An Integrated Collaboration System to Manage Student Team Projects Global Learn Asia Pacific 2010 [596]
Kohlhase, Andrea MS PowerPoint Use from a Micro-Perspective World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [597]
Kohlhase, Andrea What if PowerPoint became emPowerPoint (through CPoint)? Society for Information Technology \& Teacher Education International Conference 2006 [598]
Kolias, C.; Demertzis, S. & Kambourakis, G. Design and implementation of a secure mobile wiki system Seventh IASTED International Conference on Web-Based Education, 17-19 March 2008 Anaheim, CA, USA 2008 During the last few years wikis have emerged as one of the most popular tool shells. Wikipedia has boosted their popularity, but they also keep a significant share in e-learning and intranet-based applications such as defect tracking, requirements management, test-case management, and project portals. However, existing wiki systems cannot fully support mobile clients due to several incompatibilities that exist. On top of that, an effective secure mobile wiki system must be lightweight enough to support low-end mobile devices having several limitations. In this paper we analyze the requirements for a novel multi-platform secure wiki implementation. The XML Encryption and Signature specifications are employed to realize end-to-end confidentiality and integrity services. Our scheme can be applied selectively and only to sensitive wiki content, thus diminishing by far the computational resources needed at both ends, the server and the client. To address authentication of wiki clients, a simple one-way authentication and session key agreement protocol is also introduced. The proposed solution can be easily applied to both centralized and forthcoming P2P wiki implementations.
Kondo, Mitsumasa; Tanaka, Akimichi & Uchiyama, Tadasu Search your interests everywhere!: wikipedia-based keyphrase extraction from web browsing history Proceedings of the 21st ACM conference on Hypertext and hypermedia 2010 [599] This paper proposes a method that can extract user interests from the user's Web browsing history. Our method allows easy access to multiple content domains such as blogs, movies, QA sites, etc., since the user does not need to input a separate search query in each domain/site. To extract user interests, the method first extracts candidate keyphrases from the user's browsed web documents. Second, important keyphrases, obtained from a link structure analysis of Wikipedia content, are extracted from the main contents of the web documents. This technique is based on the idea that important keyphrases in Wikipedia are important keyphrases in the real world. Finally, keyphrases contained in the documents important to the user are ranked as user interests. An experiment shows that our method offers improvements over a conventional method and can recommend interests attractive to the user.
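As a rough illustration of the ranking idea described in this entry, the sketch below boosts candidate keyphrases that also appear in a hypothetical in-memory set of Wikipedia titles; the real method derives keyphrase importance from Wikipedia's link structure, which is not reproduced here.
<syntaxhighlight lang="python">
# Toy sketch: rank candidate keyphrases from browsed documents, boosting those
# that match a (hypothetical, hard-coded) set of Wikipedia article titles.
from collections import Counter

WIKIPEDIA_TITLES = {"information retrieval", "semantic web", "hypertext"}  # assumed sample

def rank_keyphrases(candidate_phrases, boost=2.0):
    """Score each phrase by frequency, multiplied by a boost if it is a Wikipedia title."""
    freq = Counter(p.lower() for p in candidate_phrases)
    scores = {p: c * (boost if p in WIKIPEDIA_TITLES else 1.0) for p, c in freq.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

history = ["Semantic Web", "semantic web", "holiday photos", "hypertext", "holiday photos"]
print(rank_keyphrases(history))
</syntaxhighlight>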
Koolen, Marijn; Kazai, Gabriella & Craswell, Nick Wikipedia pages as entry points for book search Proceedings of the Second ACM International Conference on Web Search and Data Mining 2009 [600] A lot of the world's knowledge is stored in books, which, as a result of recent mass-digitisation efforts, are increasingly available online. Search engines, such as Google Books, provide mechanisms for searchers to enter this vast knowledge space using queries as entry points. In this paper, we view Wikipedia as a summary of this world knowledge and aim to use this resource to guide users to relevant books. Thus, we investigate possible ways of using Wikipedia as an intermediary between the user's query and a collection of books being searched. We experiment with traditional query expansion techniques, exploiting Wikipedia articles as rich sources of information that can augment the user's query. We then propose a novel approach based on link distance in an extended Wikipedia graph: we associate books with Wikipedia pages that cite these books and use the link distance between these nodes and the pages that match the user query as an estimation of a book's relevance to the query. Our results show that a) classical query expansion using terms extracted from query pages leads to increased precision, and b) link distance between query and book pages in Wikipedia provides a good indicator of relevance that can boost the retrieval score of relevant books in the result ranking of a book search engine.
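The link-distance signal described in the entry above can be illustrated with a breadth-first search over a tiny, hypothetical Wikipedia link graph; the graph, the page names and the use of raw hop counts as the relevance proxy are assumptions for the example only.
<syntaxhighlight lang="python">
# Sketch: shortest link distance between a page matching the query and a page
# citing a book, used as a (toy) relevance indicator for the book.
from collections import deque

def link_distance(graph, start, goal):
    """Breadth-first search returning the number of hops from start to goal (None if unreachable)."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

graph = {"History": ["World War II"], "World War II": ["Winston Churchill"], "Winston Churchill": []}
print(link_distance(graph, "History", "Winston Churchill"))  # 2 hops
</syntaxhighlight>
In the paper's setting, a shorter distance between a query-matching page and a book-citing page would raise that book's retrieval score.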
Kowase, Yasufumi; Kaneko, Keiichi & Ishikawa, Masatoshi A Learning System for Related Words based on Thesaurus and Image Retrievals World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [601]
Krauskopf, Karsten Developing a psychological framework for teachers’ constructive implementation of digital media in the classroom – media competence from the perspective of socio-cognitive functions of digital tools. Society for Information Technology \& Teacher Education International Conference 2009 [602]
Krishnan, S. & Bieszczad, A. SEW: the semantic Extensions to Wikipedia 2007 International Conference on Semantic Web \& Web Services (SWWS'07), 25-28 June 2007 Las Vegas, NV, USA 2007 The Semantic Web represents the next step in the evolution of the Web. The goal of the Semantic Web initiative is to create a universal medium for data exchange where data can be shared and processed by people as well as by automated tools. The paper presents the research and implementation of an application, SEW (Semantic Extensions to Wikipedia), that uses Semantic Web technologies to extract information from the user and to store the data along with its semantics. SEW addresses the shortcomings of the existing portal, Wikipedia, through its knowledge extraction and representation techniques. The paper focuses on applying SEW to solving a problem in a real-world domain.
Krotzsch, M.; Vrandecic, D. & Volkel, M. Semantic MediaWiki The Semantic Web - ISWC 2006. OTM 2006 Workshops. 5th International Semantic Web Conference, ISWC 2006. Proceedings, 5-9 Nov. 2006 Berlin, Germany 2006 Semantic MediaWiki is an extension of MediaWiki, a widely used wiki engine that also powers Wikipedia. Its aim is to make semantic technologies available to a broad community by smoothly integrating them with the established usage of MediaWiki. The software is already used on a number of productive installations world-wide, but the main target remains to establish a "Semantic Wikipedia" as an early adopter of semantic technologies on the Web. Thus usability and scalability are as important as powerful semantic features.
Krupa, Y.; Vercouter, L.; Hubner, J.F. & Herzig, A. Trust based Evaluation of Wikipedia's Contributors Engineering Societies in the Agents World X. 10th International Workshop, ESAW 2009, 18-20 Nov. 2009 Berlin, Germany 2009 [603] Wikipedia is an encyclopedia whose content anybody can change. Some users, self-proclaimed "patrollers", regularly check recent changes in order to delete or correct those which are ruining articles' integrity. The huge quantity of updates leads some articles to remain polluted for a certain time before being corrected. In this work we show how a multiagent trust model can help patrollers in their task of controlling Wikipedia. To direct the patrollers' verification towards suspicious contributors, our work relies on a formalisation of Castelfranchi and Falcone's social trust theory to assist them by representing their trust model in a cognitive way.
Kulathuramaiyer, Narayanan & Maurer, Hermann Current Development of Mashups in Shaping Web Applications World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [604]
Kulathuramaiyer, Narayanan; Zaka, Bilal & Helic, Denis Integrating Copy-Paste Checking into an E-Learning Ecosystem World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [605]
Kumar, Swapna Building a Learning Community using Wikis in Educational Technology Courses Society for Information Technology \& Teacher Education International Conference 2009 [606]
Kumar, Swapna Can We Model Wiki Use in Technology Courses to Help Teachers Use Wikis in their Classrooms? Society for Information Technology \& Teacher Education International Conference 2008 [607]
Kumaran, A.; Khapra, Mitesh M. & Li, Haizhou Report of NEWS 2010 transliteration mining shared task Proceedings of the 2010 Named Entities Workshop 2010 [608] This report documents the details of the Transliteration Mining Shared Task that was run as a part of the Named Entities Workshop (NEWS 2010), an ACL 2010 workshop. The shared task featured mining of name transliterations from the paired Wikipedia titles in 5 different language pairs, specifically, between English and one of Arabic, Chinese, Hindi, Russian and Tamil. In total, 5 groups took part in this shared task, participating in multiple mining tasks in different language pairs. The methodology and the data sets used in this shared task are published in the Shared Task White Paper [Kumaran et al., 2010]. We measure and report 3 metrics on the submitted results to calibrate the performance of individual systems on a commonly available Wikipedia dataset. We believe that the significant contribution of this shared task is in (i) assembling a diverse set of participants working in the area of transliteration mining, (ii) creating a baseline performance of transliteration mining systems in a set of diverse languages using commonly available Wikipedia data, and (iii) providing a basis for meaningful comparison and analysis of trade-offs between various algorithmic approaches used in mining. We believe that this shared task would complement the NEWS 2010 transliteration generation shared task, in enabling development of practical systems with a small amount of seed data in a given pair of languages.
Kumaran, A.; Khapra, Mitesh M. & Li, Haizhou Whitepaper of NEWS 2010 shared task on transliteration mining Proceedings of the 2010 Named Entities Workshop 2010 [609] Transliteration is generally defined as phonetic translation of names across languages. Machine Transliteration is a critical technology in many domains, such as machine translation, cross-language information retrieval/extraction, etc. Recent research has shown that high quality machine transliteration systems may be developed in a language-neutral manner, using a reasonably sized good quality corpus (~15--25K parallel names) between a given pair of languages. In this shared task, we focus on acquisition of such good quality names corpora in many languages, thus complementing the machine transliteration shared task that is concurrently conducted in the same NEWS 2010 workshop. Specifically, this task focuses on mining the Wikipedia paired entities data (aka, inter-wiki-links) to produce high-quality transliteration data that may be used for transliteration tasks.
Kunnath, Maria Lorna MLAKedusoln eLearnovate's Unified E-Learning Strategy For the Semantic Web World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [610]
Kupatadze, Ketevan Conducting chemistry lessons in Georgian schools with computer-educational programs (exemplificative one concrete programe) World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [611]
Kurhila, Jaakko "Unauthorized" Use of Social Software to Support Formal Higher Education World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [612]
Kutty, S.; Tran, Tien; Nayak, R. & Li, Yuefeng Clustering XML documents using frequent subtrees Advances in Focused Retrieval. 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, 15-18 Dec. 2008 Berlin, Germany 2009 This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.
Lahti, Lauri Guided Generation of Pedagogical Concept Maps from the Wikipedia World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [613]
Lahti, L. Educational tool based on topology and evolution of hyperlinks in the Wikipedia 2010 IEEE 10th International Conference on Advanced Learning Technologies (ICALT 2010), 5-7 July 2010 Los Alamitos, CA, USA 2010 [614] We propose a new method to support educational exploration in the hyperlink network of the Wikipedia online encyclopedia. The learner is provided with alternative parallel ranking lists, each one promoting hyperlinks that represent a different pedagogical perspective to the desired learning topic. The learner can browse the conceptual relations between the latest versions of articles or the conceptual relations belonging to consecutive temporal versions of an article, or a mixture of both approaches. Based on her needs and intuition, the learner explores the hyperlink network and meanwhile the method automatically builds concept maps that reflect her conceptualization process and can be used for varied educational purposes. Initial experiments with a prototype tool based on the method indicate enhancement to ordinary learning results and suggest further research.
Lai, Alice An Examination of Technology-Mediated Feminist Consciousness-raising in Art Education Society for Information Technology \& Teacher Education International Conference 2010 [615]
Lapadat, Judith; Atkinson, Maureen & Brown, Willow The Electronic Lives of Teens: Negotiating Access, Producing Digital Narratives, and Recovering From Internet Addiction World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [616]
Lara, Sonia & Naval, Concepción Educative proposal of web 2.0 for the encouragement of social and citizenship competence World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [617]
Larson, M.; Newman, E. & Jones, G.J.F. Overview of VideoCLEF 2009: new perspectives on speech-based multimedia content enrichment Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [618] VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided. The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the "Beeldenstorm" collection containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity and radical visual changes. The Linking Task, also called 'Finding Related Resources Across Languages', involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language "Beeldenstorm" collection and were expected to return target pages drawn from English-language Wikipedia. The best performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation and methods that targeted proper names.
Lau, C.; Tjondronegoro, D.; Zhang, J.; Geva, S. & Liu, Y. Fusing visual and textual retrieval techniques to effectively search large collections of Wikipedia images Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007 This paper presents an experimental study that examines the performance of various combination techniques for content-based image retrieval using a fusion of visual and textual search results. The evaluation is comprehensively benchmarked using more than 160,000 samples from the INEX-MM2006 images dataset and the corresponding XML documents. For visual search, we have successfully combined the Hough transform, the object's color histogram, and texture (H.O.T). For comparison purposes, we used the provided UvA features. Based on the evaluation, our submissions show that the UvA+Text combination performs most effectively, but it is closely followed by our H.O.T (visual only) feature. Moreover, H.O.T+Text performance is still better than UvA (visual) only. These findings show that the combination of effective text and visual search results can improve the overall performance of CBIR in Wikipedia collections, which contain a heterogeneous (i.e. wide) range of genres and topics.
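A minimal sketch of late fusion of textual and visual retrieval scores is given below; the min-max normalisation and the 0.6/0.4 weights are illustrative assumptions, not the combination scheme evaluated in the paper.
<syntaxhighlight lang="python">
# Toy sketch: combine per-document scores from a text index and a visual index
# into one fused ranking using normalisation and a weighted sum.

def min_max_normalise(scores):
    """Rescale scores to [0, 1]; a constant score list maps to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(text_scores, visual_scores, w_text=0.6, w_visual=0.4):
    """Weighted linear fusion; documents missing from one list get a zero contribution."""
    text_n, visual_n = min_max_normalise(text_scores), min_max_normalise(visual_scores)
    docs = set(text_n) | set(visual_n)
    fused = {d: w_text * text_n.get(d, 0.0) + w_visual * visual_n.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

print(fuse({"img1": 3.0, "img2": 1.0}, {"img1": 0.2, "img3": 0.9}))
</syntaxhighlight>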
Leake, David & Powell, Jay Mining Large-Scale Knowledge Sources for Case Adaptation Knowledge Proceedings of the 7th international conference on Case-Based Reasoning: Case-Based Reasoning Research and Development 2007 [619] Making case adaptation practical is a longstanding challenge for case-based reasoning. One of the impediments to widespread use of automated case adaptation is the adaptation knowledge bottleneck: the adaptation process may require extensive domain knowledge, which may be difficult or expensive for system developers to provide. This paper advances a new approach to addressing this problem, proposing that systems mine their adaptation knowledge as needed from pre-existing large-scale knowledge sources available on the World Wide Web. The paper begins by discussing the case adaptation problem, opportunities for adaptation knowledge mining, and issues for applying the approach. It then presents an initial illustration of the method in a case study of the testbed system WebAdapt. WebAdapt applies the approach in the travel planning domain, using OpenCyc, Wikipedia, and the Geonames GIS database as knowledge sources for generating substitutions. Experimental results suggest the promise of the approach, especially when information from multiple sources is combined.
Lee, Jennifer Fads and Facts in Technology-Based Learning Environments Society for Information Technology \& Teacher Education International Conference 2009 [620]
Lee, Stella & Dron, Jon Giving Learners Control through Interaction Design World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [621]
Lee, Zeng-Han Attitude Changes Toward Applying Technology (A case study of Meiho Institute of Technology in Taiwan) Society for Information Technology \& Teacher Education International Conference 2008 [622]
Lemay, Philippe Game and flow concepts for learning: some considerations Society for Information Technology \& Teacher Education International Conference 2008 [623]
Leong, Peter; Joseph, Samuel; Ho, Curtis & Fulford, Catherine Learning to learn in a virtual world: An exploratory qualitative study Global Learn Asia Pacific 2010 [624]
Li, Haizhou & Kumaran, A. Proceedings of the 2010 Named Entities Workshop 2010 [625] Named Entities play a significant role in Natural Language Processing and Information Retrieval. While identifying and analyzing named entities in a given natural language is a challenging research problem by itself, the phenomenal growth in the Internet user population, especially among the non-English-speaking parts of the world, has extended this problem to the crosslingual arena. We specifically focus on research on all aspects of Named Entities in our workshop series, the Named Entities Workshop (NEWS). The first of the NEWS workshops (NEWS 2009) was held as a part of the ACL-IJCNLP 2009 conference in Singapore, and the current edition (NEWS 2010) is being held as a part of ACL 2010, in Uppsala, Sweden. The purpose of the NEWS workshop is to bring together researchers across the world interested in identification, analysis, extraction, mining and transformation of named entities in monolingual or multilingual natural language text. The workshop scope includes many interesting specific research areas pertaining to named entities, such as orthographic and phonetic characteristics, corpus analysis, unsupervised and supervised named entity extraction in monolingual or multilingual corpora, transliteration modelling, and evaluation methodologies, to name a few. For this year's edition, 11 research papers were submitted, each of which was reviewed by at least 3 reviewers from the program committee. 7 papers were chosen for publication, covering main research areas, from named entity recognition, extraction and categorization, to distributional characteristics of named entities, and finally a novel evaluation metric for co-reference resolution. All accepted research papers are published in the workshop proceedings. This year, as part of the NEWS workshop, we organized two shared tasks: one on Machine Transliteration Generation, and another on Machine Transliteration Mining, with participation by research teams from around the world, including industry, government laboratories and academia. The transliteration generation task was introduced in NEWS 2009. While the focus of the 2009 shared task was on establishing the quality metrics and on baselining the transliteration quality based on those metrics, the 2010 shared task expanded the scope of the transliteration generation task to about a dozen languages, and explored the quality depending on the direction of transliteration between the languages. We collected significantly large, hand-crafted parallel named entity corpora in a dozen different languages from 8 language families, and made them available as a common dataset for the shared task. We published the details of the shared task and the training and development data six months ahead of the conference, which attracted an overwhelming response from the research community. In total, 7 teams participated in the transliteration generation task. The approaches ranged from traditional unsupervised learning methods (such as phrasal SMT-based and Conditional Random Fields approaches) to somewhat unique approaches (such as the DirectTL approach), combined with several model combinations for results re-ranking. A report of the shared task that summarizes all submissions and the original whitepaper are also included in the proceedings, and will be presented in the workshop.
The participants in the shared task were asked to submit short system papers (4 pages each) describing their approach, and each of these papers was reviewed by at least two members of the program committee to help improve the quality of the content and presentation of the papers. 6 of them were finally accepted to be published in the workshop proceedings (one participating team did not submit their system paper in time). NEWS 2010 also featured a second shared task this year, on Transliteration Mining; in this shared task we focus specifically on mining transliterations from the commonly available resource of Wikipedia titles. The objective of this shared task is to identify transliterations from linked Wikipedia titles between English and another language in a non-Latin script. 5 teams participated in the mining task, each participating in multiple languages. The shared task was conducted in 5 language pairs, and the paired Wikipedia titles between English and each of the languages were provided to the participants. The participating systems' output was measured using three specific metrics. All the results are reported in the shared task report. We hope that NEWS 2010 will provide an exciting and productive forum for researchers working in this research area. The technical programme includes 7 research papers and 9 system papers (3 as oral papers, and 6 as poster papers) to be presented in the workshop. Further, we are pleased to have Dr Dan Roth, Professor at University of Illinois and The Beckman Institute, delivering the keynote speech at the workshop.
Li, Yun; Tian, Fang; Ren, F.; Kuroiwa, S. & Zhong, Yixin A method of semantic dictionary construction from online encyclopedia classifications 2007 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE '07), 30 Aug.-1 Sept. 2007 Piscataway, NJ, USA 2007 This paper introduces a method of constructing a semantic dictionary automatically from the keywords and classification relations of the web encyclopedia Chinese Wikipedia. Semantic units, which are affixes (core/modifier) shared between many phrased keywords, are selected using statistical methods and string affix matching, together with other units to explain the semantic meanings. The results are then used to mark the semantic explanations for most Wikipedia keywords by analyzing surface text or upper classes. The form and structure of the features, and their advantages compared to other semantic resources, are also discussed.
Liao, Ching-Jung & Sun, Cheng-Chieh A RIA-Based Collaborative Learning System for E-Learning 2.0 World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [626]
Liao, Ching-Jung & Yang, Jin-Tan The Development of a Pervasive Collaborative LMS 2.0 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [627]
Lim, Keol & Park, So Youn An Exploratory Approach to Understanding the Purposes of Computer and Internet Use in Web 2.0 Trends World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [628]
Lim, Ee-Peng; Vuong, Ba-Quy; Lauw, Hady Wirawan & Sun, Aixin Measuring Qualities of Articles Contributed by Online Communities Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence 2006 [629] Using open source Web editing software (e.g., wiki), online community users can now easily edit, review and publish articles collaboratively. While much useful knowledge can be derived from these articles, content users and critics are often concerned about their qualities. In this paper, we develop two models, namely the basic model and the peer review model, for measuring the qualities of these articles and the authorities of their contributors. We represent collaboratively edited articles and their contributors in a bipartite graph. While the basic model measures an article's quality using both the authorities of contributors and the amount of contribution from each contributor, the peer review model extends the former by considering the review aspect of article content. We present results of experiments conducted on some Wikipedia pages and their contributors. Our results show that the two models can effectively determine the articles' qualities and contributors' authorities using the collaborative nature of online communities.
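The mutual reinforcement between article quality and contributor authority over the bipartite graph can be sketched roughly as below; the update rules, the normalisation and the toy contribution weights are simplifying assumptions in the spirit of the basic model, not the paper's exact formulation.
<syntaxhighlight lang="python">
# Illustrative sketch: iterate quality/authority updates over an article-contributor
# bipartite graph. Articles gain quality from authoritative contributors (weighted by
# contribution amount); contributors gain authority from the quality of their articles.

def mutual_reinforcement(contributions, iterations=20):
    """contributions: dict mapping (article, contributor) -> amount of contribution."""
    articles = {a for a, _ in contributions}
    contributors = {c for _, c in contributions}
    quality = {a: 1.0 for a in articles}
    authority = {c: 1.0 for c in contributors}
    for _ in range(iterations):
        quality = {a: sum(w * authority[c] for (a2, c), w in contributions.items() if a2 == a)
                   for a in articles}
        norm = sum(quality.values()) or 1.0
        quality = {a: q / norm for a, q in quality.items()}
        authority = {c: sum(w * quality[a] for (a, c2), w in contributions.items() if c2 == c)
                     for c in contributors}
        norm = sum(authority.values()) or 1.0
        authority = {c: v / norm for c, v in authority.items()}
    return quality, authority

contribs = {("ArticleA", "alice"): 0.7, ("ArticleA", "bob"): 0.3, ("ArticleB", "bob"): 1.0}
print(mutual_reinforcement(contribs))
</syntaxhighlight>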
Lin, Hong & Kelsey, Kathleen Do Traditional and Online Learning Environments Impact Collaborative Learning with Wiki? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [630]
Lin, S. College students' perceptions, motivations and uses of Wikipedia Proceedings of the American Society for Information Science and Technology 2008 [631]
Lin, Chun-Yi Integrating wikis to support collaborative learning in higher education: A design-based approach to developing the instructional theory World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [632]
Lin, Chun-Yi & Lee, Hyunkyung Adult Learners' Motivations in the Use of Wikis: Wikipedia, Higher Education, and Corporate Settings World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [633]
Lin, Chun-Yi; Lee, Lena & Bonk, Curtis Teaching Innovations on Wikis: Practices and Perspectives of Early Childhood and Elementary School Teachers World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [634]
Lin, Meng-Fen Grace; Sajjapanroj, Suthiporn & Bonk, Curtis Wikibooks and Wikibookians: Loosely-Coupled Community or the Future of the Textbook Industry? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [635]
Lindroth, Tomas & Lundin, Johan Students with laptops – the laptop as portfolio Society for Information Technology \& Teacher Education International Conference 2010 [636]
Linser, Roni; Ip, Albert; Rosser, Elizabeth & Leigh, Elyssebeth On-line Games, Simulations \& Role-plays as Learning Environments: Boundary and Role Characteristics World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [637]
Lisk, Randy & Brown, Victoria Digital Paper: The Possibilities Society for Information Technology \& Teacher Education International Conference 2009 [638]
Liu, Leping & Maddux, Cleborne Online Publishing: A New Online Journal on “Social Media in Education” Society for Information Technology \& Teacher Education International Conference 2009 [639]
Liu, Min; Hamilton, Kurstin & Wivagg, Jennifer Facilitating Pre-Service Teachers’ Understanding of Technology Use With Instructional Activities Society for Information Technology \& Teacher Education International Conference 2010 [640]
Liu, Sandra Shu-Chao & Lin, Elaine Mei-Ying Using the Internet in Developing Taiwanese Students' English Writing Abilities World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [641]
Liu, Xiongyi; Li, Lan & Vonderwell, Selma Digital Ink-Based Engaged Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [642]
Liu, X.; Qin, J.; Chen, M. & Park, J.-H. Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia Proceedings of the American Society for Information Science and Technology 2008 [643]
Livingston, Michael; Strickland, Jane & Moulton, Shane Decolonizing Indigenous Web Sites World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [644]
Livne, Nava; Livne, Oren & Wight, Charles Automated Error Analysis through Parsing Mathematical Expressions in Adaptive Online Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [645]
Llorente, A.; Motta, E. & Ruger, S. Exploring the Semantics behind a Collection to Improve Automated Image Annotation Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [646] The goal of this research is to explore several semantic relatedness measures that help to refine annotations generated by a baseline non-parametric density estimation algorithm. Thus, we analyse the benefits of performing a statistical correlation using the training set or using the World Wide Web versus approaches based on a thesaurus like WordNet or Wikipedia (considered as a hyperlink structure). Experiments are carried out using the dataset provided by the 2009 edition of the ImageCLEF competition, a subset of the MIR-Flickr 25k collection. Best results correspond to approaches based on statistical correlation as they do not depend on a prior disambiguation phase like WordNet and Wikipedia. Further work needs to be done to assess whether proper disambiguation schemas might improve their performance.
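One simple member of the "statistical correlation" family of relatedness measures mentioned in this entry is pointwise mutual information over label co-occurrence counts; the sketch below uses a hypothetical set of training annotations and is not claimed to be the measure used in the paper.
<syntaxhighlight lang="python">
# Toy sketch: pointwise mutual information between annotation labels, estimated
# from co-occurrence counts in a (hypothetical) training set of annotated images.
import math
from collections import Counter
from itertools import combinations

def pmi_scores(annotation_sets):
    """annotation_sets: iterable of label sets, one per training image."""
    n = len(annotation_sets)
    single = Counter(l for labels in annotation_sets for l in set(labels))
    pair = Counter(frozenset(p) for labels in annotation_sets
                   for p in combinations(sorted(set(labels)), 2))
    scores = {}
    for p, c in pair.items():
        a, b = tuple(p)
        # PMI = log( P(a, b) / (P(a) * P(b)) ); positive values mean labels co-occur
        # more often than independence would predict.
        scores[(a, b)] = math.log((c / n) / ((single[a] / n) * (single[b] / n)))
    return scores

train = [{"sky", "clouds"}, {"sky", "sea"}, {"sky", "clouds", "sea"}, {"car"}]
print(pmi_scores(train))
</syntaxhighlight>
Such scores can then be used to promote candidate annotations that are strongly related to labels already assigned with high confidence.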
Lopes, António; Pires, Bruno; Cardoso, Márcio; Santos, Arnaldo; Peixinho, Filipe; Sequeira, Pedro & Morgado, Leonel System for Defining and Reproducing Handball Strategies in Second Life On-Demand for Handball Coaches’ Education World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [647]
Lopes, Rui & Carriço, Luis On the credibility of wikipedia: an accessibility perspective Proceeding of the 2nd ACM workshop on Information credibility on the web 2008 [648] User interfaces play a critical role on the credibility of authoritative information sources on the Web. Citation and referencing mechanisms often provide the required support for the independent verifiability of facts and, consequently, influence the credibility of the conveyed information. Since the quality level of these references has to be verifiable by users without any barriers, user interfaces cannot pose problems on accessing information. This paper presents a study about the influence of accessibility of user interfaces on the credibility of Wikipedia articles. We have analysed the accessibility quality level of the articles and the external Web pages used as authoritative references. This study has shown that there is a discrepancy on the accessibility of referenced Web pages, which can compromise the overall credibility of Wikipedia. Based on these results, we have analysed the article referencing lifecycle (technologies and policies) and propose a set of improvements that can help increasing the accessibility of references within Wikipedia articles.
Lopes, Rui & Carriço, Luís The impact of accessibility assessment in macro scale universal usability studies of the web Proceedings of the 2008 international cross-disciplinary conference on Web accessibility (W4A) 2008 [649] This paper presents a modelling framework, Web Interaction Environments, to express the synergies and differences of audiences, in order to study universal usability of the Web. Based on this framework, we have expressed the implicit model of WCAG and developed an experimental study to assess the Web accessibility quality of Wikipedia at a macro scale. This resulted in the finding that template mechanisms such as those provided by Wikipedia lower the burden of producing accessible content, but provide no guarantee that hyperlinking to external websites maintains accessibility quality. We discuss the black-boxed nature of guidelines such as WCAG and how formalising audiences helps in leveraging universal usability studies of the Web at macro scales.
Lopez, Patrice & Romary, Laurent HUMB: Automatic key term extraction from scientific articles in GROBID Proceedings of the 5th International Workshop on Semantic Evaluation 2010 [650] The SemEval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited for producing a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally a post-ranking was realized based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
Lops, P.; Basile, P.; de Gemmis, M. & Semeraro, G. Language Is the Skin of My Thought: Integrating Wikipedia and AI to Support a Guillotine Player AI*IA 2009: Emergent Perspectives in Artificial Intelligence. XIth International Conference of the Italian Association for Artificial Intelligence, 9-12 Dec. 2009 Berlin, Germany 2009 [651] This paper describes OTTHO (On the Tip of my THOught), a system designed for solving a language game, called Guillotine, which demands knowledge covering a broad range of topics, such as movies, politics, literature, history, proverbs, and popular culture. The rule of the game is simple: the player observes five words, generally unrelated to each other, and in one minute she has to provide a sixth word, semantically connected to the others. The system exploits several knowledge sources, such as a dictionary, a set of proverbs, and Wikipedia to realize a knowledge infusion process. The paper describes the process of modeling these sources and the reasoning mechanism to find the solution of the game. The main motivation for designing an artificial player for Guillotine is the challenge of providing the machine with the cultural and linguistic background knowledge which makes it similar to a human being, with the ability of interpreting natural language documents and reasoning on their content. Experiments carried out showed promising results. Our feeling is that the presented approach has a great potential for other more practical applications besides solving a language game.
Lotzmann, U. Enhancing agents with normative capabilities 24th European Conference on Modelling and Simulation, ECMS 2010, 1-4 June 2010 Nottingham, UK 2010 This paper describes the derivation of a software architecture (and its implementation, called EMIL-S) from a logical normative agent architecture (called EMIL-A). After a short introduction to the theoretical background of agent-based normative social simulation, the paper focuses on intra-agent structures and processes. The pivotal element in this regard is a rule-based agent design with a corresponding "generalised intra-agent process" that involves decision making and learning capabilities. The resulting simulation dynamics are illustrated afterwards by means of an application sample where agents contribute to a Wikipedia community by writing, editing and discussing articles. Findings and material presented in the paper are part of the results achieved in the FP6 project EMIL (EMergence In the Loop: Simulating the two-way dynamics of norm innovation).
Louis, Ellyn St; McCauley, Pete; Breuch, Tyler; Hatten, Jim & Louis, Ellyn St Artscura: Experiencing Art Through Art World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [652]
Lowerison, Gretchen & Schmid, Richard F Pedagogical Implications of Using Learner-Controlled, Web-based Tools for Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [653]
Lu, Jianguo; Wang, Yan; Liang, Jie; Chen, Jessica & Liu, Jiming An Approach to Deep Web Crawling by Sampling Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 2008 [654] Crawling the deep web is the process of collecting data from search interfaces by issuing queries. With the wide availability of programmable interfaces encoded in web services, deep web crawling has received a large variety of applications. One of the major challenges in crawling the deep web is the selection of the queries so that most of the data can be retrieved at a low cost. We propose a general method in this regard. In order to minimize the duplicates retrieved, we reduce the problem of selecting an optimal set of queries from a sample of the data source into the well-known set-covering problem and adopt a classical algorithm to resolve it. To verify that the queries selected from a sample also produce a good result for the entire data source, we carried out a set of experiments on large corpora including Wikipedia and Reuters. We show that our sampling-based method is effective by empirically proving that 1) the queries selected from samples can harvest most of the data in the original database; 2) the queries with a low overlapping rate in samples will also result in a low overlapping rate in the original database; and 3) the size of the sample and the size of the terms from which to select the queries do not need to be very large.
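The reduction to set covering described in this entry can be illustrated with the standard greedy heuristic; the query-to-document map below is a hypothetical sample, not data from the paper's experiments.
<syntaxhighlight lang="python">
# Sketch of greedy set-cover query selection: each candidate query maps to the set of
# sampled documents it retrieves; queries are picked greedily until the sample is covered.

def greedy_query_selection(query_results, universe):
    """query_results: dict query -> set of document ids retrieved from the sample."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(query_results, key=lambda q: len(query_results[q] & uncovered))
        gain = query_results[best] & uncovered
        if not gain:          # remaining documents cannot be covered by any query
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

queries = {"wiki": {1, 2, 3}, "news": {3, 4}, "sports": {5}}
print(greedy_query_selection(queries, universe={1, 2, 3, 4, 5}))
</syntaxhighlight>
Picking the query with the largest marginal gain at each step keeps the number of issued queries, and hence the duplicates retrieved, small when the selection is later replayed against the full data source.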
Lu, Laura Digital Divide: Does the Internet Speak Your Language? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [655]
Lucassen, Teun & Schraagen, Jan Maarten Trust in wikipedia: how users trust information from an unknown source Proceedings of the 4th workshop on Information credibility 2010 [656] The use of Wikipedia as an information source is becoming increasingly popular. Several studies have shown that its information quality is high. Normally, when considering information trust, the source of information is an important factor. However, because of the open-source nature of Wikipedia articles, their sources remain mostly unknown. This means that other features need to be used to assess the trustworthiness of the articles. We describe article features - such as images and references - which lay Wikipedia readers use to estimate trustworthiness. The quality and the topics of the articles are manipulated in an experiment to reproduce the varying quality on Wikipedia and the familiarity of the readers with the topics. We show that the three most important features are textual features, references and images.
Lund, Andreas & Rasmussen, Ingvill Tasks 2.0: Education Meets Social Computing and Mass Collaboration Society for Information Technology \& Teacher Education International Conference 2010 [657]
Luther, Kurt Supporting and transforming leadership in online creative collaboration Proceedings of the ACM 2009 international conference on Supporting group work 2009 [658] Behind every successful online creative collaboration, from Wikipedia to Linux, is at least one effective project leader. Yet, we know little about what such leaders do and how technology supports or inhibits their work. My thesis investigates leadership in online creative collaboration, focusing on the novel context of animated movie-making. I first conducted an empirical study of existing leadership practices in this context. I am now designing a Web-based collaborative system, Sandbox, to understand the impact of technological support for centralized versus decentralized leadership in this context. My expected contributions include a comparative investigation of the effects of different types of leadership on online creative collaboration, and a set of empirically validated design principles for supporting leadership in online creative collaboration.
Luyt, B.; Kwek, Wee Tin; Sim, Ju Wei & York, Peng Evaluating the comprehensiveness of wikipedia: the case of biochemistry Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. 10th International Conference on Asian Digital Libraries, ICADL 2007, 10-13 Dec. 2007 Berlin, Germany 2007 In recent years, the world of encyclopedia publishing has been challenged as new collaborative models of online information gathering and sharing have developed. Most notable of these is Wikipedia. Although Wikipedia has a core group of devotees, it has also attracted critical comment and concern, most notably in regard to its quality. In this article we compare the scope of Wikipedia and Encyclopedia Britannica in the subject of biochemistry using a popular first year undergraduate textbook as a benchmark for concepts that should appear in both works, if they are to be considered comprehensive in scope.
Lykourentzou, Ioanna; Vergados, Dimitrios J. & Loumos, Vassili Collective intelligence system engineering Proceedings of the International Conference on Management of Emergent Digital EcoSystems 2009 [659] Collective intelligence (CI) is an emerging research field which aims at combining human and machine intelligence, to improve community processes usually performed by large groups. CI systems may be collaborative, like Wikipedia, or competitive, like a number of recently established problem-solving companies that attempt to find solutions to difficult R\&D or marketing problems drawing on the competition among web users. The benefits that CI systems bring to user communities, combined with the fact that they share a number of basic common characteristics, open up the prospect for the design of a general methodology that will allow the efficient development and evaluation of CI. In the present work, an attempt is made to establish the analytical foundations and main challenges for the design and construction of a generic collective intelligence system. First, collective intelligence systems are categorized into active and passive and specific examples of each category are provided. Then, the basic modeling framework of CI systems is described. This includes concepts such as the set of possible user actions, the CI system state and the individual and community objectives. Additional functions, which estimate the expected user actions, the future state of the system, as well as the level of objective fulfillment, are also established. In addition, certain key issues that need to be considered prior to system launch are also described. The proposed framework is expected to promote efficient CI design, so that the benefit gained by the community and the individuals through the use of CI systems will be maximized.
Mach, Nada Gaming, Learning 2.0, and the Digital Divide World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [660]
Mach, Nada Reorganizing Schools to Engage Learners through Using Learning 2.0 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [661]
Mach, Nada & Bhattacharya, Madhumita Social Learning Versus Individualized Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [662]
MacKenzie, Kathleen Distance Education Policy: A Study of the SREB Faculty Support Policy Construct at Four Virtual College and University Consortia. World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [663]
Maddux, Cleborne; Johnson, Lamont & Ewing-Taylor, Jacque An Annotated Bibliography of Outstanding Educational Technology Sites on the Web: A Study of Usefulness and Design Quality Society for Information Technology \& Teacher Education International Conference 2006 [664]
Mader, Elke; Budka, Philipp; Anderl, Elisabeth; Stockinger, Johann & Halbmayer, Ernst Blended Learning Strategies for Methodology Education in an Austrian Social Science Setting World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [665]
Malik, Manish Work In Progress: Use of Social Software for Final Year Project Supervision at a Campus Based University World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [666]
Malyn-Smith, Joyce; Coulter, Bob; Denner, Jill; Lee, Irene; Stiles, Joel & Werner, Linda Computational Thinking in K-12: Defining the Space Society for Information Technology \& Teacher Education International Conference 2010 [667]
Manfra, Meghan; Friedman, Adam; Hammond, Thomas & Lee, John Peering behind the curtain: Digital history, historiography, and secondary social studies methods Society for Information Technology \& Teacher Education International Conference 2009 [668]
Marenzi, Ivana; Demidova, Elena & Nejdl, Wolfgang LearnWeb 2.0 - Integrating Social Software for Lifelong Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [669]
Margaryan, Anoush; Nicol, David; Littlejohn, Allison & Trinder, Kathryn Students’ use of technologies to support formal and informal learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [670]
Martin, Philippe; Eboueya, Michel; Blumenstein, Michael & Deer, Peter A Network of Semantically Structured Wikipedia to Bind Information World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [671]
Martin, Sylvia S. & Crawford, Caroline M. Special Education Methods Coursework: Information Literacy for Teachers through the Implementation of Graphic Novels Society for Information Technology \& Teacher Education International Conference 2007 [672]
Martinez-Cruz, C. & Angeletou, S. Folksonomy expansion process using soft techniques 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2010), 10-12 Aug. 2010 Piscataway, NJ, USA 2010 [673] The use of folksonomies involves several problems due to the lack of semantics associated with them. The nature of these structures makes it difficult to enrich them semantically by associating meaningful terms from the Semantic Web. This task implies a phase of disambiguation and another of expansion of the initial tagset, returning an enlarged, contextualised set that includes synonyms, hyperonyms, gloss terms, etc. In this novel proposal a technique based on confidence and similarity degrees is applied to weight this extended tagset in order to allow the user to obtain a customised resulting tagset. Moreover, a comparison between the two main thesauri, WordNet and Wikipedia, is presented due to their great influence on the disambiguation and expansion process.
Martland, David The Development of Web/Learning Communities: Is Technology the Way Forward? Society for Information Technology & Teacher Education International Conference 2004 [674]
Martland, David E-learning: What communication tools does it require? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2003 [675]
Mass, Y. IBM HRL at INEX 06 Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007 In previous INEX years we presented an XML component ranking algorithm based on separating nested XML elements into different indices. This worked well for the IEEE collection, which has a small number of potential component types that can be returned as query results. However, such an assumption does not scale to this year's Wikipedia collection, where there is a large set of potential component types that can be returned. We present a new version of the component ranking algorithm that does not assume any knowledge of the set of component types. We then describe preliminary work on exploiting the connectivity of the Wikipedia collection to improve ranking.
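A type-agnostic flavour of the component ranking summarised above might look roughly as follows. This is only a sketch under stated assumptions: the tf-idf-style element score, the 'inlinks' field, and the logarithmic connectivity boost are choices made for the example, not the algorithm actually used at IBM HRL.

```python
# Sketch: rank XML elements without assuming a fixed set of component
# types, with a connectivity boost (illustrative assumptions only).
import math

def rank_components(components, query_terms, alpha=0.2):
    """components: list of dicts with 'path', 'text' and 'inlinks' keys,
    one per candidate element of any tag name."""
    n = len(components)
    # document frequency of each query term over the candidate elements
    df = {t: sum(1 for c in components if t in c["text"].lower().split())
          for t in query_terms}
    ranked = []
    for c in components:
        tokens = c["text"].lower().split()
        score = 0.0
        for t in query_terms:
            tf = tokens.count(t)
            if tf and df[t]:
                # tf-idf computed over elements rather than whole documents
                score += (1 + math.log(tf)) * math.log(n / df[t])
        # connectivity boost: elements from highly linked pages rank higher
        score *= 1 + alpha * math.log(1 + c.get("inlinks", 0))
        ranked.append((score, c["path"]))
    return sorted(ranked, reverse=True)

print(rank_components(
    [{"path": "/article[1]/section[2]",
      "text": "wiki link structure ranking", "inlinks": 40},
     {"path": "/article[7]/p[1]",
      "text": "ranking of xml elements", "inlinks": 2}],
    ["ranking", "xml"]))
```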
Matsuno, Ryoji; Tsutsumi, Yutaka; Matsuo, Kanako & Gilbert, Richard MiWIT: Integrated ESL/EFL Text Analysis Tools for Content Creation in MSWord World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [676]
Matthew, Kathryn & Callaway, Rebecca Wiki as a Collaborative Learning Tool World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [677]
Matthew, Kathryn; Callaway, Rebecca; Matthew, Christie & Matthew, Josh Online Solitude: A Lack of Student Interaction World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [678]
Matthew, Kathryn; Felvegi, Emese & Callaway, Rebecca Collaborative Learning Using a Wiki Society for Information Technology & Teacher Education International Conference 2009 [679]
Maurer, Hermann & Kulathuramaiyer, Narayanan Coping With the Copy-Paste-Syndrome World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [680]
Maurer, Hermann & Safran, Christian Beyond Wikipedia World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [681]
Maurer, Hermann & Schinagl, Wolfgang E-Quiz - A Simple Tool to Enhance Intra-Organisational Knowledge Management eLearning and Edutainment Training World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [682]
Maurer, Hermann & Schinagl, Wolfgang Wikis and other E-communities are Changing the Web World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [683]
Maurer, Hermann & Zaka, Bilal Plagiarism - A Problem And How To Fight It World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [684]
McCulloch, Allison & Smith, Ryan The Nature of Students’ Collaboration in the Creation of a Wiki Society for Information Technology & Teacher Education International Conference 2009 [685]
McCulloch, Allison; Smith, Ryan; Wilson, P. Holt; McCammon, Lodge; Stein, Catherine & Arias, Cecilia Creating Asynchronous Learning Communities in Mathematics Teacher Education, Part 2 Society for Information Technology & Teacher Education International Conference 2009 [686]
McDonald, Roger Using the Secure Wiki for Teaching Scientific Collaborative World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [687]
McGee, Patricia; Carmean, Colleen; Rauch, Ulrich; Noakes, Nick & Lomas, Cyprien Learning in a Virtual World, Part 2 World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [688]
McKay, Sean Wiki as CMS Society for Information Technology & Teacher Education International Conference 2005 [689]
McKay, Sean & Headley, Scot Best Practices for the Use of Wikis in Teacher Education Programs Society for Information Technology & Teacher Education International Conference 2007 [690]
McLoughlin, Catherine & Lee, Mark J.W. Listen and learn: A systematic review of the evidence that podcasting supports learning in higher education World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [691]
McNeil, Sara; White, Cameron; Angela, Miller & Behling, Debbie Emerging Web 2.0 Technologies to Enhance Teaching and Learning in American History Classrooms Society for Information Technology & Teacher Education International Conference 2009 [692]
Mehdad, Yashar; Moschitti, Alessandro & Zanzotto, Fabio Massimo Syntactic/semantic structures for textual entailment recognition HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics 2010 [693] In this paper, we describe an approach based on off-the-shelf parsers and semantic resources for the Recognizing Textual Entailment (RTE) challenge that can be generally applied to any domain. Syntax is exploited by means of tree kernels, whereas lexical semantics is derived from heterogeneous resources, e.g. WordNet or distributional semantics through Wikipedia. The joint syntactic/semantic model is realized by means of tree kernels, which can exploit lexical relatedness to match syntactically similar structures, i.e. structures whose lexical constituents are related. Comparative experiments across different RTE challenges and against traditional systems show that our approach consistently achieves high accuracy without requiring any adaptation or tuning.
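The idea of letting syntactic matching back off to lexical relatedness can be illustrated with a small sketch. It is not the authors' tree-kernel model: the toy production-overlap score, the nltk parse trees, the WordNet wup_similarity back-off, and the 0.6 relatedness threshold are assumptions chosen for the example.

```python
# Sketch: syntactic overlap whose terminals may match through lexical
# relatedness, in the spirit of the entry above (assumptions noted).
from nltk import Tree
from nltk.corpus import wordnet as wn

def related(w1, w2, threshold=0.6):
    """Treat two words as matching if identical or related in WordNet."""
    if w1 == w2:
        return True
    scores = [s1.wup_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0) >= threshold

def entailment_score(text_tree, hyp_tree):
    """Fraction of hypothesis productions covered by the text, where
    lexical productions may match via relatedness of their terminals."""
    text_prods = [str(p) for p in text_tree.productions()]
    covered = 0
    for prod in hyp_tree.productions():
        if str(prod) in text_prods:
            covered += 1
        elif prod.is_lexical() and any(
                related(str(prod.rhs()[0]), w) for w in text_tree.leaves()):
            covered += 1
    return covered / max(len(hyp_tree.productions()), 1)

t = Tree.fromstring("(S (NP (N cat)) (VP (V sleeps)))")
h = Tree.fromstring("(S (NP (N feline)) (VP (V sleeps)))")
print(entailment_score(t, h))  # 'feline' matches 'cat' via relatedness
```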
Meijer, Erik