Wikimedia dumps
| Wikimedia dumps | |
| Keyword(s) | mediawiki, xml dumps |
| Size | From a few MB to several GB |
| Language(s) | Multilingual |
| Author(s) | wikimedians |
| License(s) | Unknown |
| Website | http://dumps.wikimedia.org/ |
| Related material | |
| Related dataset(s) | Wikimedia Foundation image dump, WikiTeam dumps, Wikia dumps, Domas visits logs, Picture of the Year archives |
| Related tool(s) | Unknown |
Wikimedia dumps are complete copies of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML. A number of raw database tables in SQL form are also available.
- Main site for XML & SQL dumps: http://dumps.wikimedia.org/
- Archive with some historical dumps: Wikimedia Downloads historical Archives
- Timeline of dump procedures: http://wikitech.wikimedia.org/view/Dumps/History
- Dumps at Internet Archive: http://www.archive.org/details/wikimediadownloads
Example
You can see an example of the XML schema for MediaWiki pages by exporting a page with the Special:Export tool.
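As a rough illustration of what reading such an export involves, the following minimal Python sketch streams pages out of a small hand-written sample. The sample document, the namespace version (`export-0.10`), and the helper names `local_name` and `iter_pages` are assumptions made for this example; real dumps carry more fields, and their schema version may differ.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a MediaWiki XML export.
# The namespace version ("export-0.10") is an assumption; it varies between dumps.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" version="0.10" xml:lang="en">
  <page>
    <title>Example page</title>
    <ns>0</ns>
    <id>1</id>
    <revision>
      <id>10</id>
      <timestamp>2010-01-01T00:00:00Z</timestamp>
      <text xml:space="preserve">'''Example''' wikitext.</text>
    </revision>
  </page>
  <page>
    <title>Talk:Example page</title>
    <ns>1</ns>
    <id>2</id>
    <revision>
      <id>11</id>
      <timestamp>2010-01-02T00:00:00Z</timestamp>
      <text xml:space="preserve">Discussion text.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield one dict per <page>, streaming so large dumps need not fit in memory."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page = {"title": None, "ns": None, "revisions": []}
        for child in elem:
            name = local_name(child.tag)
            if name in ("title", "ns"):
                page[name] = child.text
            elif name == "revision":
                fields = {local_name(c.tag): c.text for c in child}
                page["revisions"].append({
                    "id": fields.get("id"),
                    "timestamp": fields.get("timestamp"),
                    "text": fields.get("text") or "",
                })
        elem.clear()  # release the parsed subtree before moving on
        yield page


for page in iter_pages(io.BytesIO(SAMPLE)):
    latest = page["revisions"][-1]
    print(page["title"], latest["timestamp"], len(latest["text"]))
```

Stripping the namespace prefix rather than hard-coding it keeps the sketch usable across schema versions. For a downloaded dump, the same generator should accept any binary file object, for example one opened with `bz2.open(path, "rb")` for a compressed file.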
Publications
| Title | Author(s) | Keyword(s) | Published in | Language | Date | Abstract | R | C |
|---|---|---|---|---|---|---|---|---|
| Extraction of RDF Dataset from Wikipedia Infobox Data | Jimmy K. Chiu, Thomas Y. Lee, Sau Dan Lee, Hailey H. Zhu, David W. Cheung | | | English | 2010 | This paper outlines the cleansing and extraction process of infobox data from Wikipedia data dump into Resource Description Framework (RDF) triplets. The numbers of the extracted triplets, resources, and predicates are substantially large enough for many research purposes such as semantic web search. Our software tool will be open-sourced for researchers to produce up-to-date RDF datasets from routine Wikipedia data dumps. | 0 | 0 |
| Wikipedia: A Quantitative Analysis | Felipe Ortega | | Universidad Rey Juan Carlos, Spain | English | 2009 | | 0 | 6 |
| Measuring Wikipedia | Jakob Voss | Wikipedia, Wiki, Statistics, Informetrics, Webometrics, Cybermetrics | International Conference of the International Society for Scientometrics and Informetrics | English | 2005 | Wikipedia, an international project that uses Wiki software to collaboratively create an encyclopaedia, is becoming more and more popular. Everyone can directly edit articles and every edit is recorded. The version history of all articles is freely available and allows a multitude of examinations. This paper gives an overview on Wikipedia research. Wikipedia's fundamental components, i.e. articles, authors, edits, and links, as well as content and quality are analysed. Possibilities of research are explored including examples and first results. Several characteristics that are found in Wikipedia, such as exponential growth and scale-free networks are already known in other context. However the Wiki architecture also possesses some intrinsic specialties. General trends are measured that are typical for all Wikipedias but vary between languages in detail. | 12 | 11 |
