
How do I find item label or plaintext name from URI or QID in SPARQL?


I have a SPARQL query that finds a person's date of birth and date of death, and I also want to get the person's profession.

This is the query I was using to get the DOB and death date:

SELECT distinct ?item ?itemLabel (SAMPLE(?DOB) as ?DOB) (SAMPLE(?RIP) as ?RIP) WHERE {
  ?item wdt:P31 wd:Q5.
  ?item ?label "David Bowie"@en.  
  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  ?article schema:isPartOf <https://en.wikipedia.org/>.  
  OPTIONAL{?item wdt:P569 ?DOB .}            # P569 : Date of birth
  OPTIONAL{?item wdt:P570 ?RIP .}            # P570 : Date of death
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }    
}
GROUP BY ?item ?itemLabel

And to get the profession I made this modification:

SELECT distinct ?item ?itemLabel (SAMPLE(?DOB) as ?DOB) (SAMPLE(?RIP) as ?RIP) (SAMPLE(?profession) as ?profession)
WHERE {
  ?item wdt:P31 wd:Q5.
  ?item ?label "David Bowie"@en.  
  ?article schema:about ?item.
  ?article schema:inLanguage "en" .
  ?article schema:isPartOf <https://en.wikipedia.org/>.  
  OPTIONAL{?item wdt:P569 ?DOB .}            # P569 : Date of birth
  OPTIONAL{?item wdt:P570 ?RIP .}            # P570 : Date of death
  OPTIONAL{?item wdt:P106 ?profession .}     # P106 : profession
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }    
}
GROUP BY ?item ?itemLabel

This query returns the date of birth and date of death when they exist, but for the profession I get wd:Q33999 in the online query service, and in the Python script where I execute the query I get 'profession': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q33999'}.
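For context, the Python side receives each result row in the SPARQL 1.1 JSON results format (the list under data["results"]["bindings"]), where every variable maps to a dict with a "type" and a "value". The helper names below are hypothetical; this minimal sketch only unwraps those dicts and splits the QID off the URI. It does not look up the label, which needs a change to the query itself.

```python
# Hypothetical helpers for rows taken from data["results"]["bindings"] of a
# SPARQL 1.1 JSON response; each variable maps to {"type": ..., "value": ...}.

def binding_value(binding: dict) -> str:
    """Return the plain value of one binding, whatever its "type" is."""
    return binding["value"]


def flatten_row(row: dict) -> dict:
    """Map each variable name in a result row to its plain value."""
    return {name: binding_value(binding) for name, binding in row.items()}


def qid_from_uri(uri: str) -> str:
    """Return the last path segment of an entity URI (the QID on Wikidata)."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


# The row shape reported in the question.
row = {"profession": {"type": "uri", "value": "http://www.wikidata.org/entity/Q33999"}}
flat = flatten_row(row)
print(flat["profession"])                # http://www.wikidata.org/entity/Q33999
print(qid_from_uri(flat["profession"]))  # Q33999
```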

This wd:Q33999 links to actor, but I just want the text 'actor'. Can I select the label associated with wd:Q33999, or run some sort of subquery to get the profession as plain text?


Solution

  • You can extend this part:

    OPTIONAL{?item wdt:P106 ?profession .}
    

    to fetch more related data, like a label:

    OPTIONAL{
      ?item wdt:P106 ?professionId .
      ?professionId rdfs:label ?profession
    }
    

    On Wikidata, rdfs:label holds an item's labels in every language; if your data uses a different label predicate, use that instead.

    In your case you will probably also need a filter that keeps only the English label:

    OPTIONAL {
      ...
      FILTER (lang(?profession) = 'en')
    }
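Putting that OPTIONAL block into the query from the question gives something like the sketch below, wrapped in Python because the question runs the query from a Python script. Assumptions: it uses only the standard library (urllib) rather than whatever client the original script uses, the User-Agent string is a placeholder, and the aggregate aliases (?DOB, ?RIP, ?profession) are kept distinct from the variables bound in WHERE (?dob, ?rip, ?professionName), since standard SPARQL 1.1 does not let AS reuse a variable that is already in scope.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The question's query with the answer's OPTIONAL block for the English label.
QUERY = """
SELECT ?item ?itemLabel (SAMPLE(?dob) AS ?DOB) (SAMPLE(?rip) AS ?RIP)
       (SAMPLE(?professionName) AS ?profession)
WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item ?label "David Bowie"@en .
  ?article schema:about ?item ;
           schema:inLanguage "en" ;
           schema:isPartOf <https://en.wikipedia.org/> .
  OPTIONAL { ?item wdt:P569 ?dob . }          # P569: date of birth
  OPTIONAL { ?item wdt:P570 ?rip . }          # P570: date of death
  OPTIONAL {
    ?item wdt:P106 ?professionId .            # P106: occupation
    ?professionId rdfs:label ?professionName .
    FILTER (lang(?professionName) = "en")     # keep only the English label
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?item ?itemLabel
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder: replace with a string identifying your script.
            "User-Agent": "example-sparql-script/0.1 (your-contact-here)",
        },
    )


def run_query(query: str) -> list[dict]:
    """Send the query and return each row as {variable: plain value}."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        data = json.load(resp)
    return [
        {name: binding["value"] for name, binding in row.items()}
        for row in data["results"]["bindings"]
    ]
```

Calling run_query(QUERY) should then return rows whose "profession" value is an English label string rather than an entity URI. Because SAMPLE keeps only one value per group, a person with several P106 values still gets just one; GROUP_CONCAT(?professionName; separator=", ") in place of SAMPLE would return all of them. Since the query already calls the wikibase:label service, projecting ?professionLabel for an item variable ?profession should also work on the Wikidata Query Service.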