Toggle navigation
Home
New Query
Recent Queries
Discuss
Database tables
Database names
MediaWiki
Wikibase
Replicas browser and optimizer
Login
History
Fork
This query is marked as a draft
This query has been published
by
Robertsky
.
Toggle Highlighting
SQL
-- Identify potential cut-and-paste move candidates (limited to 10 for testing) WITH ArticlePairs AS ( SELECT r1.rev_page AS source_page, -- Page ID for source article r2.rev_page AS destination_page, -- Page ID for destination article r1.rev_id AS source_rev_id, -- Source revision ID r2.rev_id AS destination_rev_id, -- Destination revision ID r1.rev_timestamp AS source_timestamp, -- Timestamp of the source revision r2.rev_timestamp AS destination_timestamp, -- Timestamp of the destination revision t1.old_text AS source_content, -- Content of the source revision from text table t2.old_text AS destination_content, -- Content of the destination revision SIMILARITY(t1.old_text, t2.old_text) AS diff_score -- Similarity between source and destination content FROM revision r1 JOIN revision r2 ON r1.rev_page != r2.rev_page -- Compare revisions from different pages JOIN text t1 ON r1.rev_text_id = t1.old_id -- Get the text content for the source revision JOIN text t2 ON r2.rev_text_id = t2.old_id -- Get the text content for the destination revision WHERE SIMILARITY(t1.old_text, t2.old_text) > 0.85 -- Set a threshold for similarity AND TIMESTAMPDIFF(SECOND, STR_TO_DATE(r1.rev_timestamp, '%Y%m%d%H%i%s'), STR_TO_DATE(r2.rev_timestamp, '%Y%m%d%H%i%s')) < 3600 -- Within 1 hour ) SELECT ROW_NUMBER() OVER () AS idx, -- Index for the report p1.page_title AS source, -- Source article name from the page table ap.source_rev_id AS PreID, -- Source revision ID DATE_FORMAT(STR_TO_DATE(ap.source_timestamp, '%Y%m%d%H%i%s'), '%Y/%m/%d %H:%i:%s') AS Predate, -- Source timestamp formatted p2.page_title AS destination, -- Destination article name from the page table ap.destination_rev_id AS PostID, -- Destination revision ID DATE_FORMAT(STR_TO_DATE(ap.destination_timestamp, '%Y%m%d%H%i%s'), '%Y/%m/%d %H:%i:%s') AS Postdate, -- Destination timestamp formatted ap.diff_score AS DiffScore -- Difference score between revisions FROM ArticlePairs ap JOIN page p1 ON ap.source_page = p1.page_id -- Get the page name for the source article JOIN page p2 ON ap.destination_page = p2.page_id -- Get the page name for the destination article ORDER BY ap.source_timestamp DESC LIMIT 10; -- Limit the results to 10 pairs for testing
By running queries you agree to the
Cloud Services Terms of Use
and you irrevocably agree to release your SQL under
CC0 License
.
Submit Query
Stop Query
All SQL code is licensed under
CC0 License
.
Checking query status...