SPARQL: Projecting on a non-existing variable


Try this query with and without the LIMIT clause at the end, on the Wikidata endpoint.

With the LIMIT clause here, and without the LIMIT clause here.

Now compare the results. I think the reason for the difference is the ?duration variable in the projection of the first subquery, which has no bindings and is not in the domain of the solution mappings. I think this is definitely a bug in Blazegraph. But the question remains: if we project on a variable that doesn't exist in the domain of the solution, and then use that variable for joining (as with ?duration in the example), what should the behaviour be? Should the variable be ignored, or treated as an unbound variable?

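SELECT ?film ?duration
WHERE
{
    {
      # ?duration is projected here, but nothing in this subquery binds it.
      SELECT ?film ?duration
      WHERE
         { ?film <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> . }
    }

    {
      SELECT ?film ?duration
      WHERE
         { ?film <http://www.wikidata.org/prop/direct/P2047> ?duration . }
    }
}
# Uncomment the next line to compare the results with LIMIT.
# LIMIT 1000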

Solution

  • Is there a workaround that makes LIMIT work?

    Yes. If you remove ?duration from the SELECT clause of the first subquery, then the query works with LIMIT.
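
    For reference, the workaround amounts to the following variant of the query from the question. The only changes are that ?duration is dropped from the first subquery's SELECT clause and LIMIT is uncommented:

    ```sparql
    SELECT ?film ?duration
    WHERE
    {
        {
          # ?duration removed from this projection (the workaround).
          SELECT ?film
          WHERE
             { ?film <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> . }
        }

        {
          SELECT ?film ?duration
          WHERE
             { ?film <http://www.wikidata.org/prop/direct/P2047> ?duration . }
        }
    }
    LIMIT 1000
    ```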

  • Is there a bug in Blazegraph?

    Yes. Removing ?duration should not change the result of the query, but it evidently does when LIMIT is present.

    We know that ?duration is unbound in all solutions of the first subquery, regardless of whether ?duration appears in its SELECT clause. So the only difference between the two queries is whether the variable is in scope or not. The definition of SPARQL's join operation does not refer to variable scope at all; it depends only on the variables that are actually bound in the solutions. So changing which variables are in scope is not supposed to change the result of the query.
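
    As a minimal sketch of this point, the query below joins a solution in which ?duration is unbound with one in which it is bound. The inline VALUES data and the example.org IRI are made up for illustration and are not taken from Wikidata. Under the SPARQL 1.1 join definition the two solutions are compatible, because they agree on ?film, the only variable bound in both, so a conforming engine should return a single row with ?film bound and ?duration = 90:

    ```sparql
    SELECT ?film ?duration
    WHERE
    {
        # Left side: ?duration is in scope but unbound (UNDEF).
        { VALUES (?film ?duration) { (<http://example.org/film1> UNDEF) } }

        # Right side: ?duration is bound.
        { VALUES (?film ?duration) { (<http://example.org/film1> 90) } }
    }
    ```

    Dropping ?duration from the left-hand VALUES block changes its scope but not its bindings, so the result should stay the same.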

  • If we project on a variable that doesn't exist in the domain of the solution, and then use the variable for joining (as with ?duration in the example), what should the behaviour be?

    The variable should be treated as always unbound, but in scope. This means:

    • bound(?var) should always be false
    • SELECT * should include a ?var column that is always empty
    • SELECT ... ("xxx" AS ?var) should result in a syntax error because ?var is already in scope
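
    The three points above can be checked with a sketch like the following. It reuses the first subquery from the question; the inner LIMIT 5 and the ?isBound variable are additions for this illustration only. On an engine that follows the behaviour described here, the result should have an empty ?duration column and ?isBound = false in every row:

    ```sparql
    SELECT *
    WHERE
    {
        {
          # ?duration is projected, so it is in scope, but nothing ever binds it.
          SELECT ?film ?duration
          WHERE
             { ?film <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> . }
          LIMIT 5   # added for this sketch only, to keep the result small
        }

        # Expected to be false in every row.
        BIND(BOUND(?duration) AS ?isBound)
    }

    # By contrast, a query of the following shape is expected to be rejected,
    # because ?duration is already in scope when it is assigned:
    #
    #   SELECT ?film ("xxx" AS ?duration)
    #   WHERE
    #   {
    #       { SELECT ?film ?duration
    #         WHERE { ?film <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> . } }
    #   }
    ```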