Toggle navigation
Home
New Query
Recent Queries
Discuss
Database tables
Database names
MediaWiki
Wikibase
Replicas browser and optimizer
Login
History
Fork
This query is marked as a draft
This query has been published
by
Cryptic
.
Counts the number of non-hidden categories in a sample of non-redirect mainspace pages, split by whether they're likely to be biographies. The sole criterion for "biographical" is whether the page is in a category whose title contains "birth", case-sensitive. (One of the implications is that completely uncategorized pages are assumed not to be biographies.) The sampling is based on page.page_random, a value between 0 and 1 assigned at page creation. See the caveat at [[mw:Page table#page_random]], but I don't think it's significant. Resultset 1 is a summary and basic stats; resultset 2 the raw data. For [[WP:RAQ#What is the distribution of the number of categories for biographical and non-biographical articles?]] circa 27 October 2019.
Toggle Highlighting
SQL
-- Resultset 1: summary SELECT is_biographical, COUNT(*) AS number_of_articles, AVG(number_of_categories), STDDEV(number_of_categories), MIN(number_of_categories), MAX(number_of_categories) FROM ( SELECT artpage.page_title AS article, (EXISTS (SELECT 1 FROM categorylinks isbio WHERE isbio.cl_to LIKE '%birth%' AND isbio.cl_from=artpage.page_id)) AS is_biographical, artpage.page_len, (SELECT COUNT(*) FROM categorylinks countcats JOIN page catpage ON catpage.page_namespace=14 AND catpage.page_title=countcats.cl_to LEFT JOIN page_props ON pp_page=catpage.page_id AND pp_propname='hiddencat' WHERE countcats.cl_from=artpage.page_id AND pp_propname IS NULL) AS number_of_categories FROM page artpage WHERE artpage.page_id IN (SELECT page_id FROM page WHERE page_namespace=0 AND page_is_redirect=0 AND page_random BETWEEN 0.5 AND 0.51) GROUP BY article, is_biographical, artpage.page_len ) sq GROUP BY is_biographical; -- Resultset 2: data - same as the large subquery above SELECT artpage.page_title AS article, (EXISTS (SELECT 1 FROM categorylinks isbio WHERE isbio.cl_to LIKE '%birth%' AND isbio.cl_from=artpage.page_id)) AS is_biographical, artpage.page_len, (SELECT COUNT(*) FROM categorylinks countcats JOIN page catpage ON catpage.page_namespace=14 AND catpage.page_title=countcats.cl_to LEFT JOIN page_props ON pp_page=catpage.page_id AND pp_propname='hiddencat' WHERE countcats.cl_from=artpage.page_id AND pp_propname IS NULL) AS number_of_categories FROM page artpage WHERE artpage.page_id IN (SELECT page_id FROM page WHERE page_namespace=0 AND page_is_redirect=0 AND page_random BETWEEN 0.5 AND 0.51) GROUP BY article, is_biographical, artpage.page_len;
By running queries you agree to the
Cloud Services Terms of Use
and you irrevocably agree to release your SQL under
CC0 License
.
Submit Query
Stop Query
All SQL code is licensed under
CC0 License
.
Checking query status...