Fork of
dewiki: Find not archived URLs where the same URL is archived on same or an other article.
by ⵓ
This query is marked as a draft
This query has been published
by ⵓ.
SQL
x
USE dewiki_p;
SELECT distinct farchivepage, archivefound, el_to
FROM externallinks as eur
INNER JOIN page as pr on eur.el_from=pr.page_id
INNER JOIN (SELECT pf.page_id, substring(euf.el_to,29+15) as f, max(euf.el_to) as archivefound, CONCAT('[[',GROUP_CONCAT(pf.page_title SEPARATOR ']], [['),']]') as farchivepage, max(euf.el_id) as m_id
FROM externallinks as euf INNER JOIN page as pf on euf.el_from=pf.page_id
WHERE el_index LIKE "https://org.archive.web./web/______________/%"
and pf.page_namespace=0
GROUP BY substring(euf.el_to,29+15),pf.page_id ) as filter on filter.f=el_to and filter.page_id=pr.page_id
WHERE pr.page_namespace=0
GROUP BY el_to
ORDER BY farchivepage DESC
LIMIT 15000
By running queries you agree to the Cloud Services Terms of Use and you irrevocably agree to release your SQL under CC0 License.
All SQL code is licensed under CC0 License.