Wikidata SPARQL queries returning different results after filtering for English labels


My understanding of Wikidata SPARQL queries is that you can filter results for English labels in two ways.

  1. Adding SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } to invoke a label service; or
  2. Adding ?thing rdfs:label ?thingLabel FILTER (lang(?thingLabel) = "en") for every output label.

I am trying to get all properties of an entity, with their labels in English. Following a Stack Overflow post, I came up with two queries.

Query 1: Running this query returns 47 results.

SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {

    VALUES (?item) {(wd:Q24)}

    ?item ?property [?statement_property ?statement_property_obj] .
    ?prop wikibase:claim ?property.
    ?prop wikibase:statementProperty ?statement_property.

    # Call label service.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }

} ORDER BY ?propLabel

Query 2: Running this query returns 35 results.

SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {

    VALUES (?item) {(wd:Q24)}

    ?item ?property [?statement_property ?statement_property_obj] .
    ?prop wikibase:claim ?property.
    ?prop wikibase:statementProperty ?statement_property.

    # Match an English rdfs:label explicitly for each label (no label service).
    ?item rdfs:label ?itemLabel FILTER (lang(?itemLabel) = "en") .
    ?statement_property_obj rdfs:label ?statement_property_objLabel FILTER (lang(?statement_property_objLabel) = "en") .
    ?prop rdfs:label ?propLabel FILTER (lang(?propLabel) = "en") .

} ORDER BY ?propLabel

Why is the second query returning fewer rows? Thanks for any help.


Solution

  • The likely cause is that the wikibase:label service returns a label for every value of ?statement_property_obj, even when that value has no rdfs:label defined in the data. In that case it appears to fall back to the value of ?statement_property_obj itself.

    For example, in the very first result of query 1, ?statement_property_objLabel is bound to topic/Jack_Bauer. This is not the value of an actual rdfs:label property in the data, just a 'fallback' value that the label service provides. Query 2 explicitly requires an English rdfs:label for every value, so it drops this row and the others like it, which is why it returns fewer rows.
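
If the goal is to keep those rows while still matching rdfs:label explicitly, one option is to make the object label optional and supply a fallback by hand. The following is a minimal sketch, not a drop-in equivalent of the label service: the helper variable ?objLabel is introduced here for illustration, and STR() returns the full IRI or literal text, so the fallback values may not look identical to the ones the label service produces.

```sparql
SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {

    VALUES (?item) {(wd:Q24)}

    ?item ?property [?statement_property ?statement_property_obj] .
    ?prop wikibase:claim ?property.
    ?prop wikibase:statementProperty ?statement_property.

    # The item and the property are entities, so require their English labels.
    ?item rdfs:label ?itemLabel FILTER (lang(?itemLabel) = "en") .
    ?prop rdfs:label ?propLabel FILTER (lang(?propLabel) = "en") .

    # OPTIONAL keeps rows whose value has no English rdfs:label
    # (for example plain strings, dates, or external identifiers).
    OPTIONAL {
        ?statement_property_obj rdfs:label ?objLabel .
        FILTER (lang(?objLabel) = "en")
    }

    # Hand-written fallback: use the label if one was found, else the raw value.
    # ?objLabel is a hypothetical helper variable, not part of the original queries.
    BIND (COALESCE(?objLabel, STR(?statement_property_obj)) AS ?statement_property_objLabel)

} ORDER BY ?propLabel
```

With the label made optional, rows such as the topic/Jack_Bauer one are no longer filtered out, so the row count should be much closer to that of query 1.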