Search code examples
sparqluniongroup-concatwikidata

SPARQL: UNION drops values, OPTIONAL needs separate variables


How can I get all the labels and aliases for a given country, in both English and the official language(s) of this country, as a single column, from wikidata?

I initially tried the following but UNION seems to drop the labels in the official language:

Attempt #1:

SELECT
  ?country_iso
  (GROUP_CONCAT(DISTINCT(?label) ; separator = ", ") AS ?labels)
WHERE
{
  VALUES ?country { wd:Q878 } # FOR TESTING PURPOSES, TO REMOVE ONCE ISSUE RESOLVED.
                              # country  sovereign state  unincorporated territory
  VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078      wd:Q783733 }
  ?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
  ?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
           wdt:P37 ?official_lang .
  ?official_lang wdt:P424 ?wm_lang_code .
  { ?country rdfs:label ?label . FILTER(LANG(?label) = "en") }
  UNION
  { ?country rdfs:label ?label . FILTER(LANG(?label) = ?wm_lang_code) }
  UNION
  { ?country skos:altLabel ?label . FILTER(LANG(?label) = "en") }
  UNION
  { ?country skos:altLabel ?label . FILTER(LANG(?label) = ?wm_lang_code) }
}
GROUP BY ?country_iso
ORDER BY ?country_iso
| country_iso | labels                                                                                                           |
|-------------|------------------------------------------------------------------------------------------------------------------|
| AE          | United Arab Emirates, ae, 🇦🇪, Emirates, the Emirates, the U.A.E., the UAE, the United Arab Emirates, U.A.E., UAE |

If I use two separate variables ?label and ?altLabel and OPTIONAL, then I seem to get all the labels and aliases, but:

  • a. I'm not convinced || is right there, although it seems to produce the right results,
  • b. two columns (?labels and ?altLabels) isn't ideal in my case. (And if I only use one, then only rdfs:label gets returned.)

Attempt #2

SELECT
  ?country_iso
  (GROUP_CONCAT(DISTINCT(?label) ; separator = ", ") AS ?labels)
  (GROUP_CONCAT(DISTINCT(?altLabel) ; separator = ", ") AS ?altLabels)
WHERE
{
  VALUES ?country { wd:Q878 }
                              # country  sovereign state  unincorporated territory
  VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078      wd:Q783733 }
  ?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
  ?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
           wdt:P37 ?official_lang .
  ?official_lang wdt:P424 ?wm_lang_code .
  OPTIONAL { ?country rdfs:label ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code) }
  OPTIONAL { ?country skos:altLabel ?altLabel . FILTER(LANG(?altLabel) = "en" || LANG(?altLabel) = ?wm_lang_code) }
}
GROUP BY ?country_iso
ORDER BY ?country_iso
| country_iso | labels                                         | altLabels                                                                                                                           |
|-------------|------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| AE          | United Arab Emirates, الإمارات العربية المتحدة | الإمارات, دولة الإمارات العربية المتحدة, ae, 🇦🇪, Emirates, the Emirates, the U.A.E., the UAE, the United Arab Emirates, U.A.E., UAE |

What am I missing? 🤔 Thank you very much in advance for your help! 🙏🏻😊

Disclaimer: relatively new to SPARQL (or rather, very rusty).


Solution

  • Thanks to @UninformedUser (see comments) I was able to come up with this query, yielding 206 results:

    SELECT
      ?country_iso
      (GROUP_CONCAT(DISTINCT(?label) ; separator = " | ") AS ?labels)
    WHERE
    {
                                  # country  sovereign state  unincorporated territory
      VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078      wd:Q783733 }
      ?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
      ?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
               wdt:P37 ?official_lang .
      ?official_lang wdt:P424 ?wm_lang_code .
      ?country rdfs:label|skos:altLabel ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code)
    }
    GROUP BY ?country_iso
    ORDER BY ?country_iso
    

    P.S.: An extension of this work, also fetching emoji flags is that query.