Toggle navigation
Home
New Query
Recent Queries
Discuss
Database tables
Database names
MediaWiki
Wikibase
Replicas browser and optimizer
Login
History
Fork
This query is marked as a draft
This query has been published
by
Jdcc-berkman
.
Toggle Highlighting
SQL
# http://jan.kneschke.de/projects/mysql/order-by-rand/ # https://stackoverflow.com/questions/249301/simple-random-samples-from-a-sql-database # https://mariadb.com/kb/en/library/data-sampling-techniques-for-efficiently-finding-a-random-row/#case-consecutive-auto_increment-without-gaps-10-rows ### All non-deleted revisions in the article namespace # We lose 30-40% of rows as either duplicates or to the WHERE clause, so we need to bump up the limit by about 60% # Downloaded on run 384406 /* SELECT @min := MIN(rev_id), @max := MAX(rev_id) FROM revision; SELECT DISTINCT revision.rev_id, revision.rev_comment_id, revision.rev_timestamp, revision.rev_minor_edit, revision.rev_parent_id, revision.rev_len, page.* FROM revision JOIN page on revision.rev_page = page.page_id JOIN (SELECT FLOOR(@min + (@max - @min + 1) * RAND(1)) AS rev_id FROM revision LIMIT 16000) AS random USING (rev_id) WHERE rev_deleted = 0 AND page_namespace = 0 LIMIT 10000; */ ### All non-deleted revisions in the article-talk namespace # 29, 33, 30, 37, 26, 31 of 500 = 6.2%, so maybe 200,000 for 10,000 # Downloaded on run 384578 /* SELECT @min := MIN(rev_id), @max := MAX(rev_id) FROM revision; SELECT DISTINCT revision.rev_id, revision.rev_comment_id, revision.rev_timestamp, revision.rev_minor_edit, revision.rev_parent_id, revision.rev_len, page.* FROM revision JOIN page on revision.rev_page = page.page_id JOIN (SELECT FLOOR(@min + (@max - @min + 1) * RAND(5)) AS rev_id FROM revision LIMIT 200000) AS random USING (rev_id) WHERE rev_deleted = 0 AND page_namespace = 1 LIMIT 10000; */ ### All non-deleted revisions in the user namespace # 20, 22, 22, 15, 12 = 3.64%, so maybe 300,000 for 10,000 # Downloaded run 384592 /* SELECT @min := MIN(rev_id), @max := MAX(rev_id) FROM revision; SELECT DISTINCT revision.rev_id, revision.rev_comment_id, revision.rev_timestamp, revision.rev_minor_edit, revision.rev_parent_id, revision.rev_len, page.* FROM revision JOIN page on revision.rev_page = page.page_id JOIN (SELECT FLOOR(@min + (@max - @min + 1) * RAND(1)) AS rev_id FROM revision LIMIT 300000) AS random USING (rev_id) WHERE rev_deleted = 0 AND page_namespace = 2 LIMIT 10000; */ ### All non-deleted revisions in the user-talk namespace # 40, 38, 35, 35, 40 = 7.52% so 200,000 for 10,000 should be OK # Downloaded run 384595 /* SELECT @min := MIN(rev_id), @max := MAX(rev_id) FROM revision; SELECT DISTINCT revision.rev_id, revision.rev_comment_id, revision.rev_timestamp, revision.rev_minor_edit, revision.rev_parent_id, revision.rev_len, page.* FROM revision JOIN page on revision.rev_page = page.page_id JOIN (SELECT FLOOR(@min + (@max - @min + 1) * RAND(1)) AS rev_id FROM revision LIMIT 200000) AS random USING (rev_id) WHERE rev_deleted = 0 AND page_namespace = 3 LIMIT 10000; */
By running queries you agree to the
Cloud Services Terms of Use
and you irrevocably agree to release your SQL under
CC0 License
.
Submit Query
Stop Query
All SQL code is licensed under
CC0 License
.
Checking query status...