This query is marked as a draft
This query has been published by Jura1.
runtime > 5 min
SQL
# Find missing given names
#
# Uses items linking to P21, but not to P735
# The "given name" is the first part of the label. This can be a given name, but not necessarily.
#
# For project, see https://www.wikidata.org/wiki/Wikidata:WikiProject_Names
#
# general database scheme https://upload.wikimedia.org/wikipedia/commons/f/f7/MediaWiki_1.24.1_database_schema.svg
# doesn't include Wikidata tables :(
#
use wikidatawiki_p;

SELECT CURRENT_DATE AS updated;

SELECT
    SUBSTRING_INDEX(CONCAT(term_text, ' '), ' ', 1) AS label,
    COUNT(*) AS freq,
    ROUND(COUNT(*)*2.5,-1) AS freqEst,
    term_language AS lang,
    CONCAT(
        '#[[Special:Search/', SUBSTRING_INDEX(CONCAT(term_text, ' '), ' ', 1),
        '|', SUBSTRING_INDEX(CONCAT(term_text, ' '), ' ', 1),
        ']]: ', COUNT(*)
    ) AS simplelist
FROM pagelinks AS pl1, wb_terms, wb_entity_per_page
LEFT JOIN (
    SELECT pl_from
    FROM pagelinks
    WHERE pl_title = 'P735'
    AND pl_namespace = 120
) AS allitemswithP735
    ON epp_page_id = allitemswithP735.pl_from
WHERE allitemswithP735.pl_from IS NULL
AND pl1.pl_title = 'P21'
AND pl1.pl_namespace = 120 # 0 for items, 120 for properties
AND pl1.pl_from = epp_page_id
AND epp_entity_type = 'item'
AND term_entity_id = epp_entity_id
AND term_type = 'label'
AND term_entity_type = 'item'
AND term_language = 'en'
# Big wikis: 'en','cs','da','de','es','et','fi','fr','it','nl','pl','pt','sk','sv','tr'

# Filter some false positives:
# Two-letter names/abbr.: DJ, Li, Wu, Yu, El, Oh, Ma
AND term_text NOT RLIKE '^.. '
# Asian family names, etc
AND term_text NOT RLIKE '^(Kim|Lee|Ahn|Chan|Chen|Chung|Cho|Choi|Dee|Han|Hang|Huang|Hong|Hwang|Jang|Jeong|Jin|Jung|Kang|Len|Lim|Lin|Liu|Mao|Moon|Park|Rui|Seo|Shin|Song|Sun|Wang|Yang|Yū|Yoo|Yoon|Zhang|Zhao|Zhou) '
# Japanese names, etc
AND term_text NOT RLIKE '^(Kōji|Koji|Yūki|Yuki|Minoru|Mai|Masaki|Jun|Yoshiaki|Yōko) '
# prefix/titles
AND term_text NOT LIKE 'Category:%'
AND term_text NOT LIKE 'The %'
AND term_text NOT LIKE 'Big %'
AND term_text NOT LIKE 'Sir %'
AND term_text NOT LIKE 'Mr. %'
AND term_text NOT LIKE 'Miss %'
AND term_text NOT LIKE 'Madame %'
AND term_text NOT LIKE 'Mademoiselle %'
AND term_text NOT LIKE 'Ibn %'
AND term_text NOT LIKE 'Abu %'
AND term_text NOT LIKE 'Duke %'
AND term_text NOT LIKE 'General %'
AND term_text NOT LIKE 'Baron %'
AND term_text NOT LIKE 'Master %'
AND term_text NOT LIKE 'Maestro %'
AND term_text NOT LIKE 'Maître %'
AND term_text NOT LIKE 'Meister %'
AND term_text NOT LIKE 'Prince %'
AND term_text NOT LIKE 'Princess %'
AND term_text NOT LIKE 'Junior %'
AND term_text NOT LIKE 'King %'
AND term_text NOT LIKE 'Lady %'
AND term_text NOT LIKE 'Lord %'
AND term_text NOT LIKE 'Little %'
AND term_text NOT LIKE 'Said %'
AND term_text NOT LIKE 'Saint %'
AND term_text NOT LIKE 'St. %'
AND term_text NOT LIKE 'Sultan %'
AND term_text NOT LIKE 'Emperador %' # es
AND term_text NOT LIKE 'Emperor %'
AND term_text NOT LIKE 'Empress %'
AND term_text NOT LIKE 'Patriarch %'
AND term_text NOT LIKE 'Chief %'
AND term_text NOT LIKE 'Dr. %'
GROUP BY SUBSTRING_INDEX(CONCAT(term_text, ' '), ' ', 1)
# filter less frequent ones:
HAVING COUNT(*)>24
All SQL code is licensed under the CC0 License.