Data mining

Data mining is included as a keyword or extra keyword in 0 datasets, 2 tools and 18 publications.

Datasets

There are no datasets for this keyword.

Tools

Tool | Operating System(s) | Language(s) | Programming language(s) | License | Description
Wikipedia Miner | | | | |
Wikokit | Cross-platform | Multilingual | Java | EPLv1.0, LGPLv2.1, GPLv2, ALv2.0, New BSD License | wikokit (wiki tool kit): several projects related to wiki. wiwordik: machine-readable Wiktionary, a visual interface to the parsed English Wiktionary and Russian Wiktionary databases. Java WebStart application + JavaFX, English interface. 742 languages extracted from the English Wiktionary; 423 languages extracted from the Russian Wiktionary.


Publications

Title Author(s) Published in Language Date Abstract R C
Reverts Revisited: Accurate Revert Detection in Wikipedia Fabian Flöck
Denny Vrandečić
Elena Simperl
Hypertext and Social Media 2012 English June 2012 Wikipedia is commonly used as a proving ground for research in collaborative systems. This is likely due to its popularity and scale, but also to the fact that large amounts of data about its formation and evolution are freely available to inform and validate theories and models of online collaboration. As part of the development of such approaches, revert detection is often performed as an important pre-processing step in tasks as diverse as the extraction of implicit networks of editors, the analysis of edit or editor features and the removal of noise when analyzing the emergence of the content of an article. The current state of the art in revert detection is based on a rather naïve approach, which identifies revision duplicates based on MD5 hash values. This is an efficient, but not very precise technique that forms the basis for the majority of research based on revert relations in Wikipedia. In this paper we prove that this method has a number of important drawbacks - it only detects a limited number of reverts, while simultaneously misclassifying too many edits as reverts, and not distinguishing between complete and partial reverts. This is very likely to hamper the accurate interpretation of the findings of revert-related research. We introduce an improved algorithm for the detection of reverts, based on word tokens added or deleted, that addresses these drawbacks. We report on the results of a user study and other tests demonstrating the considerable gains in accuracy and coverage by our method, and argue for a positive trade-off, in certain research scenarios, between these improvements and our algorithm’s increased runtime. 13 0
Calculating Wikipedia Article Similarity Using Machine Translation Evaluation Metrics Maike Erdmann
Andrew Finch
Kotaro Nakayama
Eiichiro Sumita
Takahiro Hara
Shojiro Nishio
WAINA English 2011 0 0
Mining Fuzzy Domain Ontology Based on Concept Vector from Wikipedia Category Network Cheng-Yu Lu
Shou-Wei Ho
Jen-Ming Chung
Fu-Yuan Hsu
Hahn-Ming Lee
Jan-Ming Ho
WI-IAT English 2011 0 0
Voting Behavior Analysis in the Election of Wikipedia Admins Gerard Cabunducan
Ralph Castillo
John Boaz Lee
ASONAM English 2011 0 0
Wikipedia Sets: Context-Oriented Related Entity Acquisition from Multiple Words Masumi Shirakawa
Kotaro Nakayama
Takahiro Hara
Shojiro Nishio
WI-IAT English 2011 0 0
Analysis of implicit relations on wikipedia: measuring strength through mining elucidatory objects Xinpeng Zhang
Yasuhito Asano
Masatoshi Yoshikawa
DASFAA English 2010 0 0
Computational Methods for Historical Research on Wikipedia's Archives Jonathan Cohen E-Research: A Journal of Undergraduate Work English 2010 This paper presents a novel study of geographic information implicit in the English Wikipedia archive. This project demonstrates a method to extract data from the archive with data mining, map the global distribution of Wikipedia editors through geocoding in GIS, and proceed with a spatial analysis of Wikipedia use in metropolitan cities. 0 0
Enishi: searching knowledge about relations by complementarily utilizing wikipedia and the web Xinpeng Zhang
Yasuhito Asano
Masatoshi Yoshikawa
WISE English 2010 0 0
Mining and explaining relationships in wikipedia Xinpeng Zhang
Yasuhito Asano
Masatoshi Yoshikawa
DEXA English 2010 0 0
Completing wikipedia's hyperlink structure through dimensionality reduction Robert West
Doina Precup
Joelle Pineau
CIKM English 2009 0 0
Improving the extraction of bilingual terminology from Wikipedia Maike Erdmann
Kotaro Nakayama
Takahiro Hara
Shojiro Nishio
ACM Trans. Multimedia Comput. Commun. Appl. English 2009 0 0
Measuring Wikipedia: a hands-on tutorial Luca de Alfaro
Felipe Ortega
WikiSym English 2009 0 0
Mining meaning from Wikipedia Olena Medelyan
David N. Milne
Catherine Legg
Ian H. Witten
Int. J. Hum.-Comput. Stud. English 2009 0 4
Quality Evaluation of Search Results by Typicality and Speciality of Terms Extracted from Wikipedia Makoto Nakatani
Adam Jatowt
Hiroaki Ohshima
Katsumi Tanaka
DASFAA English 2009 0 0
Mining Wikipedia Resources for Discovering Answers to List Questions in Web Snippets Alejandro Figueroa SKG English 2008 0 0
Mining Wikipedia for Discovering Multilingual Definitions on the Web Alejandro Figueroa SKG English 2008 0 0
Wikipedia Mining for Huge Scale Japanese Association Thesaurus Construction Kotaro Nakayama
Masahiro Ito
Takahiro Hara
Shojiro Nishio
AINAW English 2008 0 0
Wikipedia Mining: Wikipedia as a Corpus for Knowledge Extraction Kotaro Nakayama
Minghua Pei
Maike Erdmann
Masahiro Ito
Masumi Shirakawa
Takahiro Hara
Shojiro Nishio
Wikimania English 2008 Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts of various fields such as Arts, Geography, History, Science, Sports and Games. As a corpus for knowledge extraction, Wikipedia's impressive characteristics are not limited to the scale, but also include the dense link structure, word sense disambiguation based on URL and brief anchor texts. Because of these characteristics, Wikipedia has become a promising corpus and a big frontier for researchers. A considerable number of researches on Wikipedia Mining such as semantic relatedness measurement, bilingual dictionary construction, and ontology construction have been conducted. In this paper, we take a comprehensive, panoramic view of Wikipedia as a Web corpus since almost all previous researches are just exploiting parts of the Wikipedia characteristics. The contribution of this paper is threefold. First, we unveil the characteristics of Wikipedia as a corpus for knowledge extraction in detail. In particular, we describe the importance of anchor texts with special emphasis since it is helpful information for both disambiguation and synonym extraction. Second, we introduce some of our Wikipedia mining researches as well as researches conducted by other researchers in order to prove the worth of Wikipedia. Finally, we discuss possible directions of Wikipedia research. 0 0
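
The abstract of "Reverts Revisited" in the table above describes a baseline that identifies reverts by finding revisions with duplicate MD5 hash values. A minimal sketch of that baseline idea is given below, assuming revisions arrive as a chronological list of full revision texts. The function names `md5_of` and `detect_identity_reverts` are hypothetical and are not taken from the paper or from any tool listed on this page. The sketch illustrates only the hash-duplicate baseline, not the improved word-token algorithm the authors propose.

```python
import hashlib


def md5_of(text: str) -> str:
    """Return the hex MD5 digest of a revision's text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def detect_identity_reverts(revisions):
    """Find revisions whose text exactly duplicates an earlier revision.

    revisions: list of revision texts in chronological order.
    Returns a list of (reverting_index, restored_index) pairs.
    """
    last_seen = {}  # hash -> index of the most recent revision with that hash
    reverts = []
    for i, text in enumerate(revisions):
        digest = md5_of(text)
        j = last_seen.get(digest)
        # Require at least one intervening revision, so that a repeated
        # save of identical text is not counted as a revert.
        if j is not None and j < i - 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts
```

With the hypothetical history `["a", "a b", "a"]`, the third revision restores the first and is reported as a revert. A history such as `["a", "a b", "a c"]` yields nothing, because only exact duplicates match; this is consistent with the abstract's remark that the hash-based method does not capture partial reverts.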