Data mining
From WikiPapers
| Keyword | data mining |
|---|---|
| Related keyword(s) | opinion mining |
data mining is included as keyword or extra keyword in 0 datasets, 2 tools and 18 publications.
Datasets
There are no datasets for this keyword.
Tools
| Tool | Operating System(s) | Language(s) | Programming language(s) | License | Description | Image |
|---|---|---|---|---|---|---|
| Wikipedia Miner | | | | | | |
| Wikokit | Cross-platform | Multilingual | Java | EPLv1.0, LGPLv2.1, GPLv2, ALv2.0, New BSD License | wikokit (wiki tool kit): several projects related to wikis. wiwordik: a machine-readable Wiktionary, with a visual interface to the parsed English Wiktionary and Russian Wiktionary databases. | |
Publications
| Title | Author(s) | Published in | Language | Date | Abstract | R | C |
|---|---|---|---|---|---|---|---|
| Reverts Revisited: Accurate Revert Detection in Wikipedia | Fabian Flöck, Denny Vrandečić, Elena Simperl | Hypertext and Social Media 2012 | English | June 2012 | Wikipedia is commonly used as a proving ground for research in collaborative systems. This is likely due to its popularity and scale, but also to the fact that large amounts of data about its formation and evolution are freely available to inform and validate theories and models of online collaboration. As part of the development of such approaches, revert detection is often performed as an important pre-processing step in tasks as diverse as the extraction of implicit networks of editors, the analysis of edit or editor features and the removal of noise when analyzing the emergence of the content of an article. The current state of the art in revert detection is based on a rather naïve approach, which identifies revision duplicates based on MD5 hash values. This is an efficient but not very precise technique that forms the basis for the majority of research based on revert relations in Wikipedia. In this paper we prove that this method has a number of important drawbacks: it detects only a limited number of reverts, misclassifies too many edits as reverts, and does not distinguish between complete and partial reverts. This is very likely to hamper the accurate interpretation of the findings of revert-related research. We introduce an improved algorithm for the detection of reverts, based on word tokens added or deleted, to address these drawbacks. We report on the results of a user study and other tests demonstrating the considerable gains in accuracy and coverage by our method, and argue for a positive trade-off, in certain research scenarios, between these improvements and our algorithm’s increased runtime. | 13 | 0 |
| Calculating Wikipedia Article Similarity Using Machine Translation Evaluation Metrics | Maike Erdmann, Andrew Finch, Kotaro Nakayama, Eiichiro Sumita, Takahiro Hara, Shojiro Nishio | WAINA | English | 2011 | | 0 | 0 |
| Mining Fuzzy Domain Ontology Based on Concept Vector from Wikipedia Category Network | Cheng-Yu Lu, Shou-Wei Ho, Jen-Ming Chung, Fu-Yuan Hsu, Hahn-Ming Lee, Jan-Ming Ho | WI-IAT | English | 2011 | | 0 | 0 |
| Voting Behavior Analysis in the Election of Wikipedia Admins | Gerard Cabunducan, Ralph Castillo, John Boaz Lee | ASONAM | English | 2011 | | 0 | 0 |
| Wikipedia Sets: Context-Oriented Related Entity Acquisition from Multiple Words | Masumi Shirakawa, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio | WI-IAT | English | 2011 | | 0 | 0 |
| Analysis of implicit relations on Wikipedia: measuring strength through mining elucidatory objects | Xinpeng Zhang, Yasuhito Asano, Masatoshi Yoshikawa | DASFAA | English | 2010 | | 0 | 0 |
| Computational Methods for Historical Research on Wikipedia's Archives | Jonathan Cohen | E-Research: A Journal of Undergraduate Work | English | 2010 | This paper presents a novel study of geographic information implicit in the English Wikipedia archive. This project demonstrates a method to extract data from the archive with data mining, map the global distribution of Wikipedia editors through geocoding in GIS, and proceed with a spatial analysis of Wikipedia use in metropolitan cities. | 0 | 0 |
| Enishi: searching knowledge about relations by complementarily utilizing Wikipedia and the web | Xinpeng Zhang, Yasuhito Asano, Masatoshi Yoshikawa | WISE | English | 2010 | | 0 | 0 |
| Mining and explaining relationships in Wikipedia | Xinpeng Zhang, Yasuhito Asano, Masatoshi Yoshikawa | DEXA | English | 2010 | | 0 | 0 |
| Completing Wikipedia's hyperlink structure through dimensionality reduction | Robert West, Doina Precup, Joelle Pineau | CIKM | English | 2009 | | 0 | 0 |
| Improving the extraction of bilingual terminology from Wikipedia | Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio | ACM Trans. Multimedia Comput. Commun. Appl. | English | 2009 | | 0 | 0 |
| Measuring Wikipedia: a hands-on tutorial | Luca de Alfaro, Felipe Ortega | WikiSym | English | 2009 | | 0 | 0 |
| Mining meaning from Wikipedia | Olena Medelyan, David N. Milne, Catherine Legg, Ian H. Witten | Int. J. Hum.-Comput. Stud. | English | 2009 | | 0 | 4 |
| Quality Evaluation of Search Results by Typicality and Speciality of Terms Extracted from Wikipedia | Makoto Nakatani, Adam Jatowt, Hiroaki Ohshima, Katsumi Tanaka | DASFAA | English | 2009 | | 0 | 0 |
| Mining Wikipedia Resources for Discovering Answers to List Questions in Web Snippets | Alejandro Figueroa | SKG | English | 2008 | | 0 | 0 |
| Mining Wikipedia for Discovering Multilingual Definitions on the Web | Alejandro Figueroa | SKG | English | 2008 | | 0 | 0 |
| Wikipedia Mining for Huge Scale Japanese Association Thesaurus Construction | Kotaro Nakayama, Masahiro Ito, Takahiro Hara, Shojiro Nishio | AINAW | English | 2008 | | 0 | 0 |
| Wikipedia Mining: Wikipedia as a Corpus for Knowledge Extraction | Kotaro Nakayama, Minghua Pei, Maike Erdmann, Masahiro Ito, Masumi Shirakawa, Takahiro Hara, Shojiro Nishio | Wikimania | English | 2008 | Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts in various fields such as Arts, Geography, History, Science, Sports and Games. As a corpus for knowledge extraction, Wikipedia's impressive characteristics are not limited to its scale but also include its dense link structure, word sense disambiguation based on URLs, and brief anchor texts. Because of these characteristics, Wikipedia has become a promising corpus and a big frontier for researchers. A considerable amount of research on Wikipedia mining, such as semantic relatedness measurement, bilingual dictionary construction, and ontology construction, has been conducted. In this paper, we take a comprehensive, panoramic view of Wikipedia as a Web corpus, since almost all previous research exploits only parts of Wikipedia's characteristics. The contributions of this paper are threefold. First, we describe in detail the characteristics of Wikipedia as a corpus for knowledge extraction. In particular, we place special emphasis on the importance of anchor texts, since they are helpful for both disambiguation and synonym extraction. Second, we introduce some of our own Wikipedia mining research, as well as research conducted by other researchers, in order to demonstrate the value of Wikipedia. Finally, we discuss possible directions for Wikipedia research. | 0 | 0 |
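
The *Reverts Revisited* abstract describes the common baseline for revert detection: flag a revision as a revert when its MD5 hash matches that of an earlier revision. What follows is a minimal Python sketch of that baseline only, not the authors' token-based algorithm. The function name `detect_identity_reverts`, the input format (a chronological list of full revision texts), and the choice to match against the most recent identical revision are illustrative assumptions.

```python
import hashlib


def detect_identity_reverts(revisions):
    """Hypothetical sketch of MD5-based ("identity") revert detection.

    `revisions` is assumed to be a chronological list of full article texts.
    Returns (reverting, restored) index pairs: revision `reverting` has the
    same MD5 digest as earlier revision `restored`, so every revision
    strictly between the two is treated as reverted.
    """
    last_seen = {}  # MD5 hex digest -> index of most recent revision with that text
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # A match with the immediately preceding revision is a null edit,
        # not a revert, so at least one revision must lie in between.
        if j is not None and j < i - 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


history = ["A", "A B", "A", "A C", "A"]
print(detect_identity_reverts(history))  # [(2, 0), (4, 2)]

# A partial revert (removing only "B") yields no exact duplicate,
# so this hash-based baseline does not detect it.
print(detect_identity_reverts(["A", "A B", "A B C", "A C"]))  # []
```

Because a digest matches only byte-identical texts, the last example shows why this baseline cannot see partial reverts, which is one of the limitations the abstract raises.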
