Toggle navigation
Home
New Query
Recent Queries
Discuss
Database tables
Database names
MediaWiki
Wikibase
Replicas browser and optimizer
Login
History
Fork
Fork of
Top sites used in external links (bswiki, main namespace) v1
by
MarioGom
This query is marked as a draft
This query has been published
by
Srđan
.
Toggle Highlighting
SQL
USE bswiki_p; SELECT main_domain, COUNT(*) AS total_count FROM ( SELECT CASE WHEN domain LIKE '%.rs.ba' THEN SUBSTRING_INDEX(domain, '.', -3) WHEN domain LIKE '%.gouv.fr' THEN SUBSTRING_INDEX(domain, '.', -3) WHEN domain LIKE '%.govt.nz' THEN SUBSTRING_INDEX(domain, '.', -3) WHEN domain REGEXP '^.*[-\\w]{4,}\\.[[:alpha:]]{2,}$' THEN SUBSTRING_INDEX(domain, '.', -2) WHEN domain REGEXP '^.*\\.(ac|edu|org|or|com|co|net|gov|gob|go)\\.[[:alpha:]]{2,}$' THEN SUBSTRING_INDEX(domain, '.', -3) WHEN domain REGEXP '.*\\.[[:alpha:]]{2}\\.(us|ca|za|it)$' THEN SUBSTRING_INDEX(domain, '.', -3) ELSE SUBSTRING_INDEX(domain, '.', -2) END AS main_domain FROM ( SELECT REGEXP_REPLACE(LOWER(CONVERT(SUBSTRING(el_to FROM 1 FOR 255) USING utf8)), '^(?:[^:/]+:)?//((?:[-_0-9a-z]+\\.)+[a-z]+)(?:[/:?].*)?$', '\\1') AS domain FROM page, externallinks WHERE el_from = page_id -- Only links from main namespace articles. AND page_namespace = 0 -- Get only fairly well-formed URLs and exclude IPv4. Only http, https, implicit https (//) and ftp. AND SUBSTRING(el_to FROM 1 FOR 40) REGEXP '(?i)^(?:https?:|ftp:)?//(?:[-_0-9a-z]+\\.)+[a-z]+(?:[/:?].*)?$' ) AS domains ) AS main_domains GROUP BY main_domain HAVING total_count >= 10 ORDER BY total_count DESC LIMIT 1000;
By running queries you agree to the
Cloud Services Terms of Use
and you irrevocably agree to release your SQL under
CC0 License
.
Submit Query
Stop Query
All SQL code is licensed under
CC0 License
.
Checking query status...