
Order of expressions in a SPARQL query


Is there any difference between the two queries below?

select distinct ?i 
where{
    ?i rdf:type <http://foo/bar#A>. 
    FILTER EXISTS {
        ?i <http://foo/bar#hasB> ?b.
        ?b rdf:type <http://foo/bar#B1>.
    }            
}


select distinct ?i 
    where{
        FILTER EXISTS {
            ?i <http://foo/bar#hasB> ?b.
            ?b rdf:type <http://foo/bar#B1>.
        }
        ?i rdf:type <http://foo/bar#A>.             
    }

Are there differences regarding performance or results?


Solution

  • First, you do not need FILTER EXISTS here. You can rewrite your query as a basic graph pattern (a set of regular triple patterns). But let's suppose you are using FILTER NOT EXISTS or something similar.
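
    For illustration, the basic graph pattern rewrite could look like the sketch below. It reuses the IRIs from the question; the PREFIX line is added here so the query is self-contained. Because only ?i is projected and DISTINCT is applied, the extra ?b bindings should not change the result:

    ```sparql
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    # Same ?i values as the FILTER EXISTS versions:
    # ?b is matched but then projected away and deduplicated.
    select distinct ?i
    where {
        ?i rdf:type <http://foo/bar#A> .
        ?i <http://foo/bar#hasB> ?b .
        ?b rdf:type <http://foo/bar#B1> .
    }
    ```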

    Results

    In general, order matters.

    However, top-down evaluation semantics plays a role mostly in the case of OPTIONAL, and that is not your case. Thus, the results should be the same.
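
    A small sketch of a case where order does matter, using OPTIONAL with the IRIs from the question (these OPTIONAL variants are illustrative and are not part of the original queries). Here the OPTIONAL comes after the type pattern, so every instance of A is returned, with or without a hasB value:

    ```sparql
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    # Left join of the type pattern with the optional part:
    # all instances of A survive.
    select distinct ?i
    where {
        ?i rdf:type <http://foo/bar#A> .
        OPTIONAL { ?i <http://foo/bar#hasB> ?b . }
    }
    ```

    With the OPTIONAL first, it is left-joined against the empty group, and the type pattern is joined in afterwards. Assuming the data contains at least one hasB triple, only instances of A that also have a hasB value remain:

    ```sparql
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    # The OPTIONAL binds ?i to every subject of hasB first;
    # the join with the type pattern then drops instances of A without hasB.
    select distinct ?i
    where {
        OPTIONAL { ?i <http://foo/bar#hasB> ?b . }
        ?i rdf:type <http://foo/bar#A> .
    }
    ```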

    Top-down evaluation semantics can be overridden by bottom-up evaluation semantics. Fortunately, bottom-up semantics does not require FILTER to be evaluated logically first, although that is possible for FILTER EXISTS and FILTER NOT EXISTS.

    SPARQL Algebra representation is the same for both queries:

    (prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
             (foobar: <http://foo/bar#>))
      (distinct
        (project (?i)
          (filter (exists
                     (bgp
                       (triple ?i foobar:hasB ?b)
                       (triple ?b rdf:type foobar:B1)
                     ))
            (bgp (triple ?i rdf:type foobar:A))))))
    

    Performance

    Naively following top-down semantics, an engine should evaluate ?i a foobar:A first.

    • You are lucky if there is only one binding for ?i.
    • You are not so lucky if there are millions of bindings for ?i while the subpattern is much more selective.

    Fortunately, optimizers try to reorder patterns depending on their selectivity. However, predictions can be erroneous.

    By the way, the rdf:type predicate is said to be a performance killer in Virtuoso.

    Results vs Performance

    Results can differ if an endpoint enforces a query execution time limit and returns partial results when the timeout is reached.