Fork of Big Data – Week 1 – Formative w. time by Hendrik.schriefer
This query is by Hendrik.schriefer.
• Pick a Wikipedia language edition.
• For each article, collect the number of distinct editors who have edited it (U), the number of revisions (E), the length of the article in bytes (L), and the length of the associated talk page (T).
• Produce the histogram and probability density function for all four variables on linear and logarithmic scales, and discuss them.
• Try to remove “redirect” pages from your data. How does that change the result?
SQL
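# Earlier per-user query, kept commented out:
#USE simplewiki_p;
#SELECT COUNT(rev_id) AS editcounts, rev_user, user_name, user_registration,
#       MAX(rev_timestamp) AS maxtime, MIN(rev_timestamp) AS mintime
#FROM revision
#JOIN user ON user_id = rev_user
#JOIN page ON page_id = rev_page
#WHERE user_name NOT LIKE "%bot%" AND user_name NOT LIKE "%Bot%" AND page_namespace = "0"
#GROUP BY rev_user
#LIMIT 10000;

# Per-article query: L = article length in bytes, U = distinct editors, E = revisions.
USE simplewiki_p;
SELECT page_id,
       page_title,
       page_namespace,
       page_len AS L,
       COUNT(DISTINCT rev_user) AS U,
       COUNT(rev_id) AS E
FROM revision
JOIN page ON rev_page = page_id
# rev_user is a numeric ID, so bot accounts are filtered on user_name instead.
# LEFT JOIN keeps anonymous edits, which have no matching row in user.
LEFT JOIN user ON user_id = rev_user
WHERE (user_name IS NULL
       OR (user_name NOT LIKE "%bot%" AND user_name NOT LIKE "%Bot%"))
  # The original condition (namespace 0 AND namespace 1) could never be true;
  # namespace 0 restricts the sample to articles.
  AND page_namespace = 0
GROUP BY page_id
ORDER BY RAND()
LIMIT 1000;

#SHOW TABLES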
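The query above does not yet collect T or drop redirects. One possible way to do both is sketched below, assuming the standard MediaWiki page table, where page_is_redirect flags redirect pages and a talk page shares its title with the article but sits in namespace 1. The aliases a and t are illustrative names, not part of the original query.

```sql
# Sketch only: article length (L) and talk-page length (T), redirects excluded.
USE simplewiki_p;
SELECT a.page_id,
       a.page_title,
       a.page_len AS L,
       t.page_len AS T            # NULL when the article has no talk page
FROM page AS a
LEFT JOIN page AS t
       ON t.page_namespace = 1    # talk namespace
      AND t.page_title = a.page_title
WHERE a.page_namespace = 0        # articles only
  AND a.page_is_redirect = 0      # drop redirect pages
ORDER BY RAND()
LIMIT 1000;
```

Running the same query with and without the page_is_redirect condition gives two samples of L to compare, which addresses the last point of the task.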
All SQL code is licensed under CC0 License.