This query has been published by HaeB.
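Goal: find extraneous external links, as in "[http://example.com/page foo] on [http://example.com/ example site]", where an article links to a specific page and also to the root of the same site. This works from the replica tables without parsing dumps, and aims for reasonably few false positives.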
SQL
USE barwiki_p;
SET @language = 'bar';

SELECT
  CONCAT('https://', @language, '.wikipedia.org/wiki/', page_title) AS page_url,
  rooturl,
  SUM(IF(url = rooturl, 1, 0)) AS rootlinks,  -- links that point at the bare site root
  SUM(1) AS alllinks                          -- all links from the page to that site
FROM (
  -- One row per external link in an article, plus its site root (scheme + host + '/')
  SELECT
    page_title,
    el_to AS url,
    CONCAT(SUBSTRING_INDEX(el_to, '/', 3), '/') AS rooturl
  FROM externallinks
  JOIN page ON el_from = page_id
  WHERE page_namespace = 0
) AS pagelinks
GROUP BY page_title, rooturl
HAVING rootlinks > 0
  AND rootlinks < alllinks
  -- Skip sites whose root URL contains the page title (likely the subject's own website)
  AND LOCATE(REPLACE(LOWER(CONVERT(page_title USING utf8)), '_', ''), rooturl) = 0
LIMIT 200;
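The heuristic in the query above can be tried out without database access. The following is a minimal Python sketch, not part of the original query: the function names `root_url` and `find_extraneous_root_links` and the sample rows are hypothetical, and string matching here is plain case-sensitive containment, which only approximates how `LOCATE` behaves under the database's collation.

```python
from collections import defaultdict


def root_url(url):
    """Mimic CONCAT(SUBSTRING_INDEX(url, '/', 3), '/'): scheme and host plus '/'."""
    return "/".join(url.split("/")[:3]) + "/"


def find_extraneous_root_links(rows):
    """rows: iterable of (page_title, url) pairs for article-namespace pages.

    Returns (page_title, rooturl, rootlinks, alllinks) tuples for sites that a
    page links both at the root and at a deeper URL, skipping sites whose root
    URL contains the normalized page title.
    """
    # (page_title, rooturl) -> [rootlinks, alllinks], like the GROUP BY in the query
    counts = defaultdict(lambda: [0, 0])
    for page_title, url in rows:
        root = root_url(url)
        entry = counts[(page_title, root)]
        entry[0] += int(url == root)  # SUM(IF(url = rooturl, 1, 0))
        entry[1] += 1                 # SUM(1)

    results = []
    for (page_title, root), (rootlinks, alllinks) in counts.items():
        normalized_title = page_title.lower().replace("_", "")
        # HAVING rootlinks > 0 AND rootlinks < alllinks AND LOCATE(...) = 0
        if 0 < rootlinks < alllinks and normalized_title not in root:
            results.append((page_title, root, rootlinks, alllinks))
    return results


if __name__ == "__main__":
    # Hypothetical sample data, standing in for the externallinks/page join
    sample_rows = [
        ("Some_Article", "http://example.com/page"),
        ("Some_Article", "http://example.com/"),
        ("Example", "http://example.com/"),
        ("Example", "http://example.com/about"),
        ("Other", "http://example.org/only-deep-link"),
    ]
    for row in find_extraneous_root_links(sample_rows):
        print(row)
```

In this sample, only "Some_Article" is reported: "Example" is skipped because its title appears in the root URL, and "Other" has no link to the site root.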
All SQL code is licensed under CC0 License.