
Wikidata SPARQL returns too many results


The query below can be run at https://query.wikidata.org. I expect only 17 results, but it returns 289 (17 * 17 = 289). I want to get each property value together with its unit. I specify wdt:P2573 only to demonstrate the issue; in the real application the property is a variable, ?p.

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT * WHERE {
  wd:Q2 wdt:P2573 ?o.
  wd:Q2 rdfs:label ?entName.
  ?realAtt wikibase:directClaim wdt:P2573.
  ?realAtt rdfs:label ?attName.
  ?realAtt wikibase:propertyType ?wdDataType.

  OPTIONAL {
    ?realAtt wikibase:statementValue ?psv.
    ?realAtt wikibase:claim ?pp.
    wd:Q2 ?pp ?quantityNode.
    ?quantityNode ?psv ?wdv.
    ?wdv wikibase:quantityUnit ?unit.
    ?wdv wikibase:quantityAmount ?qAmount.
    BIND(?qAmount AS ?val)
  }
  BIND(COALESCE(?val, ?o) AS ?val)


  BIND(COALESCE(?unit, "") AS ?unit)
  FILTER(((LANG(?o)) = "en") || ((LANG(?o)) = "") || (!ISLITERAL(?o)))
  FILTER(((LANG(?attName)) = "en") || ((LANG(?attName)) = ""))
  FILTER(((LANG(?entName)) = "en") || ((LANG(?entName)) = ""))
}

Solution

  • Simple values of truthy statements are not automagically connected with value nodes (class diagram).

    Your MCVE should look like this:

    SELECT * WHERE {
      wd:Q2 wdt:P2573 ?o.
      OPTIONAL {
        wd:Q2 p:P2573/psv:P2573 ?wdv.
        ?wdv wikibase:quantityUnit ?unit.
        ?wdv wikibase:quantityAmount ?qAmount.
        # FILTER( ?unit != wd:Q199 )
      }
    }
    


    In the query above, the only thing that joins ?o and ?wdv is that both are related to wd:Q2.
    Hence, you obtain the Cartesian product of the ?o and ?wdv bindings (right, 17×17 = 289).
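
    The row count can be reproduced offline with a small model. This is only a sketch: the `statements` list and its values are hypothetical stand-ins (three statements instead of 17), not data fetched from Wikidata.

```python
from itertools import product

# Hypothetical stand-in data: three statements about one entity. Each has a
# simple (truthy) value and a full value node, as in the Wikibase RDF model.
statements = [
    {"simple": 1.0, "node": "value-node-1"},
    {"simple": 2.0, "node": "value-node-2"},
    {"simple": 3.0, "node": "value-node-3"},
]

# Pattern 1 (like `wd:Q2 wdt:P2573 ?o`): all simple values of the entity.
simple_values = [s["simple"] for s in statements]

# Pattern 2 (like `wd:Q2 p:P2573/psv:P2573 ?wdv`): all value nodes of the entity.
value_nodes = [s["node"] for s in statements]

# The two patterns share no variable, so every ?o is compatible with every
# ?wdv and the join degenerates into a Cartesian product.
unjoined = list(product(simple_values, value_nodes))

# Going through the statement keeps each value next to its own value node.
joined = [(s["simple"], s["node"]) for s in statements]

print(len(unjoined), len(joined))
```

    With n statements the first join yields n×n rows and the second yields n, which matches the 289 versus 17 in the question.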

    The correct query should look like this:

    SELECT * WHERE {
      wd:Q2 p:P2573/psv:P2573 ?wdv.
      OPTIONAL {
        ?wdv wikibase:quantityUnit ?unit.
        ?wdv wikibase:quantityAmount ?qAmount.
      }
    }
    


    Update

    The above query works for quantities only. Obviously, it doesn't work for times or globe coordinates. Moreover, sometimes statements don't have full values at all. For example, statements with string objects have simple values only. One should get simple values from statements and then try to get additional info from full values:

    SELECT * {
      VALUES (?wd) {(wd:P2067)(wd:P1332)(wd:P1814)}
      ?wd wikibase:claim ?p;
          wikibase:statementProperty ?ps;
          wikibase:statementValue ?psv.
      wd:Q2 ?p ?wds.
      ?wds ?ps ?sv.
      OPTIONAL {
        ?wds ?psv ?wdv.
        OPTIONAL {
          ?wdv wikibase:quantityUnit ?unit.
          ?wdv wikibase:quantityAmount ?amount.
        }
      }
    }
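
    The shape of the result can be sketched without the endpoint. The model below is an assumption-laden illustration: the three `statements` and their values are hypothetical, and `to_rows` is a made-up helper that imitates the two nested OPTIONAL blocks, not part of any Wikidata library.

```python
# Hypothetical statements covering the three cases described above:
# a quantity (full value with unit and amount), a globe coordinate (full
# value without quantity fields), and a string (no full value at all).
statements = [
    {"sv": "5.0", "full": {"id": "node-a", "unit": "kilogram", "amount": "5.0"}},
    {"sv": "Point(1 2)", "full": {"id": "node-b"}},
    {"sv": "some text", "full": None},
]


def to_rows(statements):
    """Return one row per statement, leaving optional columns unbound (None)."""
    rows = []
    for st in statements:
        row = {"sv": st["sv"], "wdv": None, "unit": None, "amount": None}
        full = st["full"]
        if full is not None:
            # Outer OPTIONAL matched: the statement has a full value node.
            row["wdv"] = full["id"]
            if "unit" in full and "amount" in full:
                # Inner OPTIONAL matched: both quantity triples are present.
                row["unit"] = full["unit"]
                row["amount"] = full["amount"]
        rows.append(row)
    return rows


rows = to_rows(statements)
for row in rows:
    print(row)
```

    Every statement keeps its simple value, and the extra columns are filled in only where a full value exists, so no statement is dropped and no row is multiplied.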
    


    Overall, the characteristics of statements can be quite diverse, and it is not very convenient to represent all of this data in a table format. That is one of the reasons why RDF exists.