Toggle navigation
Home
New Query
Recent Queries
Discuss
Database tables
Database names
MediaWiki
Wikibase
Replicas browser and optimizer
Login
History
Fork
This query is marked as a draft
This query has been published
by
Cryptic
.
Counts of the first edit by users who registered in 2023, by namespace, whether that edit created a page, and whether the page is currently deleted. Edge cases where this is inaccurate: * Namespace is of where the edit is *now*, so a first edit made to a draft that was later moved to mainspace will show up as namespace 0. That's fixable for page creations (by crossreferencing the page creation log), and for edits made within the last 30 days (by crossreferencing recentchanges), but not for older edits to already-existing pages. Since at least half the data would still be bad, I've left it all equally bad. * Revdeleted and oversighted edits. I believe these don't show up at all with the revision_userindex and archive_userindex views (they certainly won't if it's the user's username that's been oversighted), so this is an examination of users' first edits that haven't been revdeleted/oversighted. Normally-deleted edits *are* visible. * Edits that have been imported. I find the first live edit by sorting on rev_id; that doesn't have the same ordering as rev_timestamp for imported edits. This case is vanishingly rare, though; and picking out the edit with the earliest timestamp, like I do for deleted edits, is significantly slower. Probably would've been ok, though. That anomalous-looking first edit to the MediaWiki namespace was by a WMF account; the page is [[MediaWiki:Campaigns-event-discovery-survey-question]]. For [[WP:RAQ#First edit]] circa 29 June 2024.
Toggle Highlighting
SQL
WITH newusers AS ( -- parsing the user creation log in 2023 for user_id, which is buried in -- log_params, takes almost exactly as long as the full user table scan; -- it makes sense to do that for shorter time periods, though SELECT user.*, actor_id AS user_actor FROM user JOIN actor_user ON actor_user = user_id WHERE user_registration LIKE '2023%' AND user_editcount >= 1 ), first_live_edit_id AS ( -- rev_id isn't always in the same order as rev_timestamp (particularly for imported edits), but the difference is negligible SELECT user_id AS flei_user, MIN(rev_id) AS flei_rev FROM revision_userindex JOIN newusers ON rev_actor = user_actor GROUP BY user_id ), first_live_edit AS ( SELECT flei_user AS fle_user, rev_timestamp AS fle_timestamp, page_namespace AS fle_namespace, !rev_parent_id AS fle_new FROM revision JOIN first_live_edit_id ON rev_id = flei_rev JOIN page ON page_id = rev_page ), first_deleted_edit AS ( -- ar_id, on the other hand, is often in a very different ordering from -- ar_timestamp - the ids are assigned when the page is deleted. So, the -- slow dependent subquery. -- If the user managed to make two edits with the same timestamp and both have -- been deleted, it's impossible to figure out which was earlier, so we just -- pick one. (Even if both those edits are page creations, there's no -- guarantee that the order in the page creation log is the correct one.) SELECT user_id AS fde_user, ar_timestamp AS fde_timestamp, ar_namespace AS fde_namespace, !ar_parent_id AS fde_new FROM archive_userindex JOIN newusers ON ar_actor = user_actor WHERE ar_timestamp = (SELECT MIN(ar_timestamp) FROM archive_userindex AS a WHERE a.ar_actor = user_actor) GROUP BY user_id ), first_edit AS ( SELECT user_id, CASE WHEN fde_timestamp < fle_timestamp THEN fde_namespace ELSE fle_namespace END AS fe_namespace, CASE WHEN fde_timestamp < fle_timestamp THEN fde_new ELSE fle_new END AS fe_new -- mariadb doesn't have full outer join, so rather than futzing around with -- the direct workarounds, just join newusers too FROM newusers LEFT JOIN first_live_edit ON fle_user = user_id LEFT JOIN first_deleted_edit ON fde_user = user_id ) SELECT fe_namespace, fe_new, COUNT(*) FROM first_edit GROUP BY user_id;
By running queries you agree to the
Cloud Services Terms of Use
and you irrevocably agree to release your SQL under
CC0 License
.
Submit Query
Stop Query
All SQL code is licensed under
CC0 License
.
Checking query status...