Toggle navigation
Home
New Query
Recent Queries
Discuss
Database tables
Database names
MediaWiki
Wikibase
Replicas browser and optimizer
Login
History
Fork
Fork of
WikiProject to page category Xref, 50% scan, limit 2M rows, prefilter
by
The Anome
This query is marked as a draft
This query has been published
by
The Anome
.
Part of a plan to assign articles to WikiProjects using Naive Bayes. This just counts the page categories.
Toggle Highlighting
SQL
SELECT pagecats.cl_to, -- talkcats.cl_to, -- REGEXP_EXTRACT(talkcats.cl_to, '(?WikiProject_|.*_importance_|.*_priority_|Unassessed)(.*)'), -- does not work COUNT(*) AS my_count FROM page AS article INNER JOIN page AS talk ON talk.page_title = article.page_title INNER JOIN categorylinks AS pagecats ON pagecats.cl_from = article.page_id INNER JOIN categorylinks AS talkcats ON talkcats.cl_from = talk.page_id WHERE article.page_random < 0.5 -- page_random is between 0 and 1, so this sets the probe size AND article.page_namespace = 0 AND talk.page_namespace = 1 AND article.page_is_redirect = 0 AND talk.page_is_redirect = 0 -- filter out soft redirect and disambig pages entirely AND NOT EXISTS (SELECT 1 from page WHERE pagecats.cl_to LIKE "%disambig%" OR pagecats.cl_to LIKE "%Disambig%" OR pagecats.cl_to LIKE "%redirect%" OR pagecats.cl_to LIKE "%Redirect%" ) AND ( talkcats.cl_to LIKE "%WikiProject_%" OR talkcats.cl_to LIKE "%-Class_%_articles" OR talkcats.cl_to LIKE "%-importance_%_articles" OR talkcats.cl_to LIKE "%-priority_%_articles" OR talkcats.cl_to LIKE "Unassessed_%_articles" ) AND NOT ( talkcats.cl_to LIKE "%vital%" OR talkcats.cl_to LIKE "%Version%" ) AND NOT (pagecats.cl_to LIKE "%Disambig%" OR pagecats.cl_to LIKE "%disambig%" OR pagecats.cl_to LIKE "%set_index%" OR pagecats.cl_to LIKE "Set_index%") AND NOT (talkcats.cl_to LIKE "%Disambig%" OR talkcats.cl_to LIKE "%disambig%") AND NOT (pagecats.cl_to LIKE "Short_description%" OR pagecats.cl_to LIKE "%_errors%" OR pagecats.cl_to LIKE "CS1_%" OR pagecats.cl_to LIKE "%short_description%" OR pagecats.cl_to LIKE "%articles%" OR pagecats.cl_to LIKE "Articles%" OR pagecats.cl_to LIKE "%pages%" OR pagecats.cl_to LIKE "%disputes%" OR pagecats.cl_to LIKE "Pages_%" OR pagecats.cl_to LIKE "Use_dmy_date%" OR pagecats.cl_to LIKE "Use_mdy_date%" OR pagecats.cl_to LIKE "Use_%" OR pagecats.cl_to LIKE "EngvarB_%" OR pagecats.cl_to LIKE "%Wikipedia%" OR pagecats.cl_to LIKE "%articles%" OR pagecats.cl_to LIKE "%Articles%" OR pagecats.cl_to LIKE "%Wikidata%" OR pagecats.cl_to = "Source_attribution" OR pagecats.cl_to LIKE "Webarchive%") GROUP BY pagecats.cl_to ORDER BY my_count DESC, pagecats.cl_to LIMIT 2000000
By running queries you agree to the
Cloud Services Terms of Use
and you irrevocably agree to release your SQL under
CC0 License
.
Submit Query
Stop Query
All SQL code is licensed under
CC0 License
.
Checking query status...