Wikipedia search data

From WikiPapers

Wikipedia search data are logs of the search queries submitted by visitors.

Log takedown

Message from Wikimedia blog:

(Update 9/20 17:40 PDT) It appeared that a small percentage of queries contained information unintentionally inserted by users. For example, some users may have pasted unintended information from their clipboards into the search box, causing the information to be displayed in the datasets. This prompted us to withdraw the files.
We are looking into the feasibility of publishing search logs at an aggregated level, but, until further notice, we do not plan on publishing this data in the near future.
Diederik van Liere, Product Manager Analytics

Log structure

Each line in the log files is tab-separated and contains the following fields:

  1. Server hostname
  2. Timestamp (UTC)
  3. Wikimedia project
  4. URL encoded search query
  5. Total number of results
  6. Lucene score of best match
  7. Interwiki result
  8. Namespace (coded as integer)
  9. Namespace (human-readable)
  10. Title of best matching article
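
The field layout above can be read with a few lines of Python. The following is only a minimal sketch: the names `SearchLogRecord` and `parse_line` are hypothetical, every field is kept as a string because the exact timestamp and number formats are not documented here, and the query is decoded on the assumption that it uses standard percent-encoding (with `+` standing for a space).

```python
from typing import NamedTuple
from urllib.parse import unquote_plus

FIELD_COUNT = 10  # number of tab-separated fields listed above


class SearchLogRecord(NamedTuple):
    """One log line; field names are illustrative, not official."""
    server_hostname: str
    timestamp_utc: str
    project: str
    query: str              # stored URL-decoded
    total_results: str
    best_lucene_score: str
    interwiki_result: str
    namespace_id: str
    namespace_name: str
    best_title: str


def parse_line(line: str) -> SearchLogRecord:
    """Split one tab-separated log line into its ten fields."""
    # Strip only the line terminator so empty trailing fields survive.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    # Field 4 is URL-encoded; decoding assumes standard percent-encoding.
    fields[3] = unquote_plus(fields[3])
    return SearchLogRecord(*fields)
```

Numeric fields such as the result count and the Lucene score are left as strings on purpose; convert them only after checking how they appear in the actual files.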

External links