Wikipedia search data
Keyword(s): wikipedia, wikimedia, search, logs
Wikipedia search data are logs of search queries submitted by visitors.
Log takedown
Message from the Wikimedia blog:
- (Update 9/20 17:40 PDT) It appeared that a small percentage of queries contained information unintentionally inserted by users. For example, some users may have pasted unintended information from their clipboards into the search box, causing the information to be displayed in the datasets. This prompted us to withdraw the files.
- We are looking into the feasibility of publishing search logs at an aggregated level, but, until further notice, we do not plan on publishing this data in the near future.
- Diederik van Liere, Product Manager Analytics
Each line in the log files is tab-separated and contains the following fields:
- Server hostname
- Timestamp (UTC)
- Wikimedia project
- URL encoded search query
- Total number of results
- Lucene score of best match
- Interwiki result
- Namespace (coded as integer)
- Namespace (human-readable)
- Title of best matching article
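The field list above can be turned into a small parser. The following is only a minimal sketch: since the files were withdrawn, it assumes the fields appear in the order listed, that there is no header row, and that the query is percent-encoded. The names `SearchLogRecord` and `parse_line` are hypothetical, and every value is kept as a string because the exact timestamp and score formats are not documented here.

```python
from dataclasses import dataclass
from urllib.parse import unquote

# Number of tab-separated fields described in the list above.
FIELD_COUNT = 10


@dataclass
class SearchLogRecord:
    """One log line; field order is assumed to match the documented list."""
    server_hostname: str
    timestamp_utc: str      # kept as text: the exact format is not documented
    project: str
    query: str              # URL-decoded search query
    total_results: str
    best_match_score: str   # Lucene score of the best match
    interwiki_result: str
    namespace_id: str       # namespace coded as integer
    namespace_name: str     # human-readable namespace
    best_match_title: str


def parse_line(line: str) -> SearchLogRecord:
    """Split one tab-separated log line and decode the query field."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    # Assumption: plain percent-encoding. If the logs encode spaces as "+",
    # urllib.parse.unquote_plus would be the appropriate call instead.
    fields[3] = unquote(fields[3])
    return SearchLogRecord(*fields)
```

Lines with an unexpected number of fields raise `ValueError` rather than being silently padded, which seems the safer choice for logs known to contain unintended pasted text.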