This query is marked as a draft
This query has been published by HaeB.
Goal: find extraneous external links, such as the second link in "[http://example.com/page foo] on [http://example.com/ example site]", without parsing dumps but with reasonably few false positives.
SQL
USE enwiki_p;
SET @language = 'en';

SELECT
  CONCAT('https://', @language, '.wikipedia.org/wiki/', page_title,
         '?action=edit&veswitched=1#:~:text=', rooturl) AS page_edit_link,  # try to highlight homepage URL in source wikitext
  rooturl,
  SUM(IF(url = rooturl, 1, 0)) AS rootlinks,
  SUM(1) AS alllinks
FROM (
  SELECT
    page_title,
    el_to AS url,
    CONCAT(SUBSTRING_INDEX(el_to, '/', 3), '/') AS rooturl
  FROM externallinks, page
  WHERE el_from = page_id
    AND page_namespace = 0
    # restrict query to a subset of articles for performance reasons:
    AND LEFT(UPPER(CONVERT(page_title USING utf8)), 1) < '2'
) AS pagelinks
GROUP BY page_title, rooturl
HAVING
  # page has links to both homepage and other pages on the same site:
  rootlinks > 0 AND rootlinks < alllinks
  # not interested in e.g. https://www.wikipedia.org/ on [[Wikipedia]]:
  AND LOCATE(REPLACE(LOWER(CONVERT(page_title USING utf8)), '_', ''), rooturl) = 0
  AND rooturl != 'https://translate.google.com/'  # frequently from {{expand (language)}} template
LIMIT 20000;
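To make the grouping and filtering logic easier to follow, here is a minimal Python sketch that applies the same rules to a few hand-made rows instead of the `externallinks` and `page` tables. The names `root_url`, `find_candidates`, and `SAMPLE_ROWS` are hypothetical and not part of the query, and the title-prefix restriction and `LIMIT` are omitted. It is only an illustration of the idea, not a replacement for the SQL.

```python
from collections import defaultdict

# Hypothetical sample data: (page_title, external link URL) pairs.
SAMPLE_ROWS = [
    ("Example_article", "http://example.com/page"),
    ("Example_article", "http://example.com/"),
    ("Other_article", "http://example.org/a"),
    ("Other_article", "http://example.org/b"),
    ("Wikipedia", "https://www.wikipedia.org/"),
    ("Wikipedia", "https://www.wikipedia.org/about"),
]


def root_url(url):
    """Mimic CONCAT(SUBSTRING_INDEX(el_to, '/', 3), '/'):
    keep everything before the third '/', then append '/'."""
    return "/".join(url.split("/")[:3]) + "/"


def find_candidates(rows, skip_roots=("https://translate.google.com/",)):
    """Return (title, rooturl, rootlinks, alllinks) for each page that links
    both to a site's homepage and to other pages on the same site."""
    # (title, rooturl) -> [rootlinks, alllinks], like GROUP BY page_title, rooturl
    counts = defaultdict(lambda: [0, 0])
    for title, url in rows:
        root = root_url(url)
        counts[(title, root)][0] += int(url == root)
        counts[(title, root)][1] += 1

    result = []
    for (title, root), (rootlinks, alllinks) in counts.items():
        # Same idea as the LOCATE(...) = 0 test: skip sites whose root URL
        # contains the lowercased title with underscores removed.
        title_key = title.lower().replace("_", "")
        if (0 < rootlinks < alllinks
                and title_key not in root
                and root not in skip_roots):
            result.append((title, root, rootlinks, alllinks))
    return result


if __name__ == "__main__":
    for row in find_candidates(SAMPLE_ROWS):
        print(row)
```

In this sample only `Example_article` qualifies: `Other_article` has no homepage link, and the `Wikipedia` rows are dropped because the title appears in the root URL.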
All SQL code is licensed under the CC0 License.