Toggle navigation
Home
New Query
Recent Queries
Discuss
Database tables
Database names
MediaWiki
Wikibase
Replicas browser and optimizer
Login
History
Fork
Fork of
WikiProject to page category Xref, 50% scan, limit 2M rows, prefilter
by
The Anome
This query is marked as a draft
This query has been published
by
The Anome
.
Part of a plan to assign articles to WikiProjects using Naive Bayes. Still being worked on. Limited to just a few pages, for testing purposes. Sorting by all three result columns to make output more compressible later. Now uses page_random to filter articles, hopefully a better filter if indexed: Now doing 50% scan, limited to first 2M rows... Executed in 5455.36 seconds as of Wed, 11 Oct 2023 08:53:12 UTC. Resultset (2000000 rows) Now doing 100% scan! Executed in 10568.74 seconds as of Wed, 11 Oct 2023 13:56:47 UTC. Resultset (2000000 rows) Just under 3 hours. Not bad at all.
Toggle Highlighting
SQL
SELECT pagecats.cl_to AS pagecat_name, talkcats.cl_to AS talkcat_name, -- REGEXP_EXTRACT(talkcats.cl_to, '(?WikiProject_|.*_importance_|.*_priority_|Unassessed)(.*)'), -- does not work COUNT(*) AS my_count FROM page AS article INNER JOIN page AS talk ON talk.page_title = article.page_title INNER JOIN categorylinks AS pagecats ON pagecats.cl_from = article.page_id INNER JOIN categorylinks AS talkcats ON talkcats.cl_from = talk.page_id WHERE article.page_random <= 0.01 -- page_random is between 0 and 1, so this sets the probe size AND article.page_namespace = 0 AND talk.page_namespace = 1 AND article.page_is_redirect = 0 AND talk.page_is_redirect = 0 -- filter out soft redirect and disambig pages entirely AND NOT EXISTS (SELECT 1 from page WHERE pagecats.cl_to LIKE "%disambig%" OR pagecats.cl_to LIKE "%Disambig%" OR pagecats.cl_to LIKE "%redirect%" OR pagecats.cl_to LIKE "%Redirect%" ) AND ( talkcats.cl_to LIKE "%WikiProject_%" OR talkcats.cl_to LIKE "%-Class_%_articles" OR talkcats.cl_to LIKE "%-importance_%_articles" OR talkcats.cl_to LIKE "%-priority_%_articles" OR talkcats.cl_to LIKE "Unassessed_%_articles" ) AND NOT ( talkcats.cl_to LIKE "%vital%" OR talkcats.cl_to LIKE "%Version%" ) AND NOT (pagecats.cl_to LIKE "%Disambig%" OR pagecats.cl_to LIKE "%disambig%" OR pagecats.cl_to LIKE "%set_index%" OR pagecats.cl_to LIKE "Set_index%") AND NOT (talkcats.cl_to LIKE "%Disambig%" OR talkcats.cl_to LIKE "%disambig%") AND NOT (pagecats.cl_to LIKE "Short_description%" OR pagecats.cl_to LIKE "%_errors%" OR pagecats.cl_to LIKE "CS1_%" OR pagecats.cl_to LIKE "%short_description%" OR pagecats.cl_to LIKE "%articles%" OR pagecats.cl_to LIKE "Articles%" OR pagecats.cl_to LIKE "%pages%" OR pagecats.cl_to LIKE "%disputes%" OR pagecats.cl_to LIKE "Pages_%" OR pagecats.cl_to LIKE "Use_dmy_date%" OR pagecats.cl_to LIKE "Use_mdy_date%" OR pagecats.cl_to LIKE "Use_%" OR pagecats.cl_to LIKE "EngvarB_%" OR pagecats.cl_to LIKE "%Wikipedia%" OR pagecats.cl_to LIKE "%articles%" OR pagecats.cl_to LIKE "%Articles%" OR pagecats.cl_to LIKE "%Wikidata%" OR pagecats.cl_to = "Source_attribution" OR pagecats.cl_to LIKE "Webarchive%") GROUP BY pagecats.cl_to, talkcats.cl_to ORDER BY my_count DESC, pagecats.cl_to, talkcats.cl_to LIMIT 2000000
By running queries you agree to the
Cloud Services Terms of Use
and you irrevocably agree to release your SQL under
CC0 License
.
Submit Query
Stop Query
All SQL code is licensed under
CC0 License
.
Checking query status...