Tags: regex, database, unicode, sparql, wikidata

Wikidata query to get country synonyms but not flag symbols


I am using the following query to get a list of ISO countries with their details:

SELECT ?country ?countryLabel ?code ?wikipedia ?countryAltLabel
WHERE
{
  ?country wdt:P297 ?code .
  OPTIONAL {
    ?wikipedia schema:about ?country .
    ?wikipedia schema:isPartOf <https://en.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

The results I am getting look like this: [screenshot of the query results]

Is there any way to remove the flag symbols from the ?countryAltLabel values?


Solution

  • Three points:

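    1. Those flags are pairs of regional indicator symbols from the range U+1F1E6 to U+1F1FF, as suggested by Tom Morris.
    2. The label service should be used in manual mode, so that ?countryAltLabel is bound explicitly inside the SERVICE block and can be used in BIND.
    3. A particular query execution order should be forced (via hint:Query hint:optimizer "None"), so that the label service runs before the BIND.
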
    SELECT ?country ?countryLabel ?code ?wikipedia ?countryAltLabel ?alt {
      # Disable the optimizer so the patterns run in the order written.
      hint:Query hint:optimizer "None" .
      ?country wdt:P297 ?code .
      OPTIONAL { ?wikipedia schema:about ?country ; schema:isPartOf <https://en.wikipedia.org/> }
      # Manual mode: bind the labels explicitly so they can be used in BIND below.
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
        ?country skos:altLabel ?countryAltLabel ; rdfs:label ?countryLabel
      }
      # Drop each flag together with its adjacent ", " separator.
      BIND ( REPLACE(?countryAltLabel, "[🇦-🇿]{2}, |, ?[🇦-🇿]{2}", "") AS ?alt )
    }
    

    https://w.wiki/Xcv

    Note that the 🇦 and 🇿 in the pattern are not regular capital letters but the regional indicator symbols U+1F1E6 and U+1F1FF. One could write [\\x{1f1e6}-\\x{1f1ff}] instead.
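
To check what the REPLACE pattern does without running the query, here is a minimal Python sketch of the same regex. The helper names flag_for and strip_flags and the sample labels are made up for illustration and are not part of the original answer. Python's re module has no \x{...} escape, so the range is spelled with \U escapes instead.

```python
import re

# Regional indicator symbols run from U+1F1E6 (for "A") to U+1F1FF (for "Z").
REGIONAL_A = 0x1F1E6

# A flag is a pair of regional indicator symbols.
FLAG = "[\U0001F1E6-\U0001F1FF]{2}"

# Same two alternatives as the REPLACE pattern in the query:
# a flag followed by ", ", or a flag preceded by "," and an optional space.
FLAG_IN_LIST = re.compile(FLAG + ", |, ?" + FLAG)


def flag_for(code: str) -> str:
    """Build the flag emoji for a two-letter country code (hypothetical helper)."""
    return "".join(chr(REGIONAL_A + ord(c) - ord("A")) for c in code.upper())


def strip_flags(alt_labels: str) -> str:
    """Remove flag emoji from a comma-separated alt-label string (hypothetical helper)."""
    return FLAG_IN_LIST.sub("", alt_labels)


if __name__ == "__main__":
    de = flag_for("DE")
    # Illustrative labels only, not actual Wikidata output.
    print(strip_flags(de + ", Deutschland, BRD"))
    print(strip_flags("BRD, " + de + ", Deutschland"))
```

One thing this makes visible: a value that consists of a flag and nothing else has no adjacent comma, so neither alternative matches and the flag is left in place. If that case matters, the pattern would need a third alternative for a bare flag.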