sparql wikidata sparqlwrapper

How to get property labels from Wikidata using SPARQL


I am using SPARQLWrapper to send SPARQL queries to Wikidata. At the moment I am trying to find all properties for an entity, e.g. with a simple triple pattern such as wd:Q11663 ?a ?b. This in itself works, but I am trying to find human-readable labels for the returned properties and entities.

Although SERVICE wikibase:label works in Wikidata's GUI, it does not seem to work with SPARQLWrapper, which insists on returning identical values for a variable and its 'label'.

Querying on the property rdfs:label works for the entity (?b), but this approach does not work with the property (?a).

It would appear the property is being returned as a full URI such as http://www.wikidata.org/prop/direct/P1536. Using the GUI I can successfully query wd:P1536 ?a ?b. This works with SPARQLWrapper if I send it as a second query, but not in the first query.

Here is my code:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://query.wikidata.org/sparql")

sparql.setQuery("""
  SELECT ?a ?aLabel ?propLabel ?b ?bLabel
  WHERE
  {
    wd:Q11663 ?a ?b.

    # Doesn't work with SPARQLWrapper
    #SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    #?prop wikibase:directClaim ?p

    # but this does (and is more portable)
    ?b rdfs:label ?bLabel. filter(lang(?bLabel) = "en").

    # doesn't work
    #?a rdfs:label ?aLabel. 

    # property code can be extracted successfully
    BIND(  strafter(str(?a), "prop/direct/") AS ?propLabel).
    #BIND( CONCAT("wd:", strafter(str(?a), "prop/direct/") ) AS ?propLabel).

    # No matches, even if I concat 'wd:' to ?propLabel
    ?propLabel rdfs:label ?aLabel
    # generic search for any properties also fails
    #?propLabel ?zz ?aLabel.
   }
 """)

# However, this returns a label for P1536 - which is one of wd:Q11663's properties
sparql.setQuery("""SELECT ?b WHERE
   {
      wd:P1536 rdfs:label ?b.
   }
""")

So how can I get the labels for the properties in one query (which should be more efficient)?

[aside: yes I'm a bit rough & ready with the EN filter - often dropping it if I'm not getting anything back]


Solution

  • I was having problems with two approaches - and the code above contains a mixture of both. Also, SPARQLWrapper isn't a problem here.

    The first approach using the wikibase Label service should be like this:

    SELECT ?a ?aLabel ?propLabel ?b ?bLabel
    WHERE
    {
      ?item rdfs:label "weather"@en.
      ?item ?a ?b.
    
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
      ?prop wikibase:directClaim ?a .
    }
    

    This code also includes a lookup from the label ('weather') to the query entity (?item).

    The SERVICE was working, but if there isn't an rdfs:label definition then it just returns the entity itself as the label. The GUI and SPARQLWrapper (via the SPARQL endpoint) were simply returning the results in a different order, so it looked like I was seeing lots of 'failed' output (i.e. entities and their labels being reported as the same value).

    This became clear when I started adding an OPTIONAL clause to the approach below.
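One way to tell real labels apart from the label service's fallback is to compare each label with the URI it belongs to. The sketch below assumes results in the standard SPARQL JSON results format (the shape you get when SPARQLWrapper is set to return JSON). The helper name `has_real_label` and the sample rows are made up for illustration, and the fallback behaviour it checks for is an assumption based on the observation above.

```python
def has_real_label(binding, var):
    """Return True if ?<var>Label looks like a real label, not a fallback.

    Assumption: when no label exists, the label service echoes either the
    full URI or its trailing identifier (e.g. "Q11663") as the label.
    """
    uri = binding[var]["value"]
    label = binding.get(var + "Label", {}).get("value")
    if label is None:
        return False
    local_id = uri.rstrip("/").rsplit("/", 1)[-1]
    return label not in (uri, local_id)


# Hypothetical rows, hand-written in the SPARQL JSON results shape.
rows = [
    {   # predicate URI with no label of its own: the "label" is just the URI
        "a": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P1536"},
        "aLabel": {"type": "literal", "value": "http://www.wikidata.org/prop/direct/P1536"},
    },
    {   # entity with a proper English label
        "a": {"type": "uri", "value": "http://www.wikidata.org/entity/Q11663"},
        "aLabel": {"type": "literal", "xml:lang": "en", "value": "weather"},
    },
]

flags = [has_real_label(row, "a") for row in rows]
print(flags)
```

Filtering on a check like this makes the rows that only look like failures easy to spot, regardless of the order in which the endpoint returns them.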

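    The ?prop wikibase:directClaim ?a . line turns out to be pretty simple. Wikibase uses directClaim to link a property entity (e.g. wd:P1536) to the predicate used for its direct claims (wdt:P1536, i.e. http://www.wikidata.org/prop/direct/P1536). Labels and other triples about the property are attached to the property entity, not to the predicate. Many other ontologies just use the same identifier for both.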

    My second (more generic) approach is the one you find in many books and online tutorials. The problem here is that the predicates Wikidata returns are full URIs in the prop/direct/ namespace, and I needed to map them to the corresponding property entity. I tried string manipulation, but this produces a string literal, not an entity. The solution is to use directClaim again:

    ?prop wikibase:directClaim ?a .
    ?prop rdfs:label ?propLabel.  filter(lang(?propLabel) = "en").
    

    Note that this only returns a result if rdfs:label is defined. Wrapping the label pattern in OPTIONAL returns results even if no label is defined.
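
Putting the second approach together, a complete query with OPTIONAL might look like the sketch below. It assumes the SPARQLWrapper package is installed and that the public endpoint is reachable. The names `QUERY`, `rows_from_results` and `run` are illustrative, not part of any library. The explicit PREFIX lines are not required on the Wikidata endpoint, which predefines them, but they keep the query portable.

```python
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?a ?propLabel ?b ?bLabel
WHERE
{
  wd:Q11663 ?a ?b .

  # Map the predicate URI (?a) back to its property entity (?prop).
  ?prop wikibase:directClaim ?a .

  # OPTIONAL keeps the row even when no English label exists.
  OPTIONAL { ?prop rdfs:label ?propLabel . FILTER(lang(?propLabel) = "en") }
  OPTIONAL { ?b rdfs:label ?bLabel . FILTER(lang(?bLabel) = "en") }
}
"""


def rows_from_results(results):
    """Flatten SPARQL JSON results into dicts; unbound variables become None."""
    variables = results["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value") for var in variables}
        for binding in results["results"]["bindings"]
    ]


def run(endpoint="https://query.wikidata.org/sparql"):
    """Send QUERY to the endpoint (needs network access and SPARQLWrapper)."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # imported lazily

    # The agent string is a placeholder; use one that identifies your tool.
    sparql = SPARQLWrapper(endpoint, agent="property-label-example/0.1")
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    return rows_from_results(sparql.query().convert())
```

Calling `run()` returns one dict per result row, with `propLabel` or `bLabel` set to `None` wherever no English label is defined. Because of the directClaim join, only triples whose predicate is a direct claim are returned.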