Toggle navigation
Home
New Query
Recent Queries
Discuss
Database tables
Database names
MediaWiki
Wikibase
Replicas browser and optimizer
Login
History
Fork
Fork of
WikiProject to page category Xref, 10% scan, limit 2M rows, prefilter
by
The Anome
This query is marked as a draft
This query has been published
by
The Anome
.
Part of a plan to assign articles to WikiProjects using Naive Bayes. Still being worked on. Limited to just a few pages, for testing purposes. Sorting by all three result columns to make output more compressible later. Now uses page_random to filter articles, hopefully a better filter if indexed: Now going for 5%: Executed in 8710.72 seconds as of Fri, 06 Oct 2023 16:58:41 UTC. Resultset (9308466 rows) -- locks up the browser with too much result content Now limiting this to output first 1M rows: 5% scan: Executed in 1376.97 seconds as of Tue, 10 Oct 2023 09:43:13 UTC. Resultset (1000000 rows) Now doing 25% scan, limited to first 2M rows... Executed in 3512.16 seconds as of Wed, 11 Oct 2023 00:24:53 UTC. Resultset (2000000 rows)
Toggle Highlighting
SQL
SELECT pagecats.cl_to, talkcats.cl_to, -- REGEXP_EXTRACT(talkcats.cl_to, '(?WikiProject_|.*_importance_|.*_priority_|Unassessed)(.*)'), -- does not work COUNT(*) AS my_count FROM page AS article INNER JOIN page AS talk ON talk.page_title = article.page_title INNER JOIN categorylinks AS pagecats ON pagecats.cl_from = article.page_id INNER JOIN categorylinks AS talkcats ON talkcats.cl_from = talk.page_id WHERE article.page_random < 0.25 -- page_random is between 0 and 1, so this sets the probe size article.page_namespace = 0 AND talk.page_namespace = 1 AND article.page_is_redirect = 0 AND talk.page_is_redirect = 0 -- filter out redirect and disambig pages entirely AND NOT EXISTS (SELECT 1 from page WHERE pagecats.cl_to LIKE "%disambig%" OR pagecats.cl_to LIKE "%Disambig%" OR pagecats.cl_to LIKE "%redirect%" OR pagecats.cl_to LIKE "%Redirect%") AND ( talkcats.cl_to LIKE "%WikiProject_%" OR talkcats.cl_to LIKE "%-Class_%_articles" OR talkcats.cl_to LIKE "%-importance_%_articles" OR talkcats.cl_to LIKE "%-priority_%_articles" OR talkcats.cl_to LIKE "Unassessed_%_articles" ) AND NOT ( talkcats.cl_to LIKE "%vital%" OR talkcats.cl_to LIKE "%Version%" ) AND NOT (pagecats.cl_to LIKE "%Disambig%" OR pagecats.cl_to LIKE "%disambig%" OR pagecats.cl_to LIKE "%set_index%" OR pagecats.cl_to LIKE "Set_index%") AND NOT (talkcats.cl_to LIKE "%Disambig%" OR talkcats.cl_to LIKE "%disambig%") AND NOT (pagecats.cl_to LIKE "Short_description%" OR pagecats.cl_to LIKE "%_errors%" OR pagecats.cl_to LIKE "CS1_%" OR pagecats.cl_to LIKE "%short_description%" OR pagecats.cl_to LIKE "%articles%" OR pagecats.cl_to LIKE "Articles%" OR pagecats.cl_to LIKE "%pages%" OR pagecats.cl_to LIKE "%disputes%" OR pagecats.cl_to LIKE "Pages%" OR pagecats.cl_to LIKE "Use_dmy_date%" OR pagecats.cl_to LIKE "Use_mdy_date%" OR pagecats.cl_to LIKE "%Wikipedia%" OR pagecats.cl_to LIKE "%articles%" OR pagecats.cl_to LIKE "%Articles%" OR pagecats.cl_to LIKE "%Wikidata%" OR pagecats.cl_to = "Source_attribution" OR pagecats.cl_to LIKE "Webarchive%") GROUP BY pagecats.cl_to, talkcats.cl_to ORDER BY my_count DESC, pagecats.cl_to, talkcats.cl_to LIMIT 2000000
By running queries you agree to the
Cloud Services Terms of Use
and you irrevocably agree to release your SQL under
CC0 License
.
Submit Query
Stop Query
All SQL code is licensed under
CC0 License
.
Checking query status...