Potentially extraneous homepage links (dewiki <= M)
by HaeB
This query is marked as a draft
This query has been published by HaeB.
Goal: find extraneous external links, as in "[http://example.com/page foo] on [http://example.com/ example site]", without parsing dumps but with reasonably few false positives.
SQL
USE dewiki_p;
SET @language = 'de';

# https://quarry.wmcloud.org/query/77235 :
WITH externallinksforhumans AS (
  SELECT
    el_from,
    REGEXP_REPLACE(
      el_to_domain_index,
      '^(.*?://)(?:([^.]+)\\.)([^.]+\\.)?([^.]+\\.)?([^.]+\\.)?([^.]+\\.)?([^.]+\\.)?([^.]+\\.)?([^.]+\\.)$',
      '\\1\\9\\8\\7\\6\\5\\4\\3\\2'
    ) AS rooturl,
    el_to_path
  FROM externallinks
)
SELECT
  CONCAT('https://', @language, '.wikipedia.org/wiki/', page_title,
         '?action=edit&veswitched=1#:~:text=', rooturl) AS page_edit_link, # try to highlight homepage URL in source wikitext
  rooturl,
  url,
  SUM(IF(url = rooturl, 1, 0)) AS rootlinks,
  SUM(1) AS alllinks
FROM (
  SELECT
    page_title,
    CONCAT(rooturl, el_to_path) AS url,
    rooturl
  FROM externallinksforhumans, page
  WHERE el_from = page_id
    AND page_namespace = 0
    # restrict query to a subset of articles for performance reasons:
    AND LEFT(page_title, 1) = 'N'
) AS pagelinks
GROUP BY page_title, rooturl
HAVING
  # page has links to both homepage and other pages on the same site:
  rootlinks > 0
  AND rootlinks < alllinks
LIMIT 200
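The two core steps of the query (un-reversing el_to_domain_index into a readable root URL, then keeping only page/site pairs that link to both the homepage and some other page) can be checked offline with a small Python sketch. This is only an illustration, not part of the Quarry query: the function names root_url and pages_with_mixed_links and the sample rows are hypothetical, and the sketch mirrors the query's own comparison, which counts a link as a homepage link when the concatenated URL equals the root URL (that is, when the path component is empty). Real el_to_path values on the replicas may differ from this assumption.

```python
import re
from collections import defaultdict

# Same pattern as in the SQL: scheme, then 2 to 9 dot-terminated labels
# stored in reversed order (e.g. "https://org.wikipedia.de.").
DOMAIN_INDEX_RE = re.compile(
    r'^(.*?://)(?:([^.]+)\.)([^.]+\.)?([^.]+\.)?([^.]+\.)?'
    r'([^.]+\.)?([^.]+\.)?([^.]+\.)?([^.]+\.)$'
)


def root_url(domain_index):
    """Reverse the label order so the host reads normally.

    Unmatched optional groups are substituted as empty strings
    (Python 3.5+), matching the behaviour the SQL relies on.
    """
    return DOMAIN_INDEX_RE.sub(r'\1\9\8\7\6\5\4\3\2', domain_index)


def pages_with_mixed_links(rows):
    """rows: iterable of (page_title, el_to_domain_index, el_to_path).

    Returns sorted (page_title, rooturl) pairs that have at least one
    homepage link and at least one other link to the same site,
    i.e. the HAVING clause: rootlinks > 0 AND rootlinks < alllinks.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [rootlinks, alllinks]
    for page_title, domain_index, path in rows:
        root = root_url(domain_index)
        url = root + path
        entry = counts[(page_title, root)]
        if url == root:  # hypothetical homepage test, as in the SQL
            entry[0] += 1
        entry[1] += 1
    return sorted(
        key for key, (rootlinks, alllinks) in counts.items()
        if 0 < rootlinks < alllinks
    )


if __name__ == "__main__":
    sample = [  # made-up rows for illustration only
        ("A", "http://com.example.", ""),
        ("A", "http://com.example.", "/page"),
        ("B", "http://com.example.", "/page"),
    ]
    print(root_url("https://org.wikipedia.de."))
    print(pages_with_mixed_links(sample))
```

In the sample, only page "A" links to both the site root and a subpage, so it is the only pair that survives the filter; page "B" has no homepage link and is dropped.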
All SQL code is licensed under CC0 License.