sparql, rdf, semantic-web

Bounding the length of an arbitrary property path in SPARQL 1.1


Is it possible to bound the length of a property path? For example, getting all the pairs joined by paths whose lengths are between m and n, or all those whose lengths fall outside this range? For instance, how could this be done with the following query?

select ?x ?y
where {?x :p* ?y}

Solution

    Some endpoints support this directly

    Some SPARQL engines support a method for doing this directly, with a regular-expression-like syntax. E.g.,

    ?s :p{n,m} ?o
    

    would be a path with a length between n and m. That syntax is described in SPARQL 1.1 Property Paths: W3C Working Draft 26 January 2010, which also covers exact lengths (:p{n}), minimum lengths (:p{n,}), and maximum lengths (:p{,m}). For better or for worse, that syntax didn't make it into the final SPARQL 1.1 standard. Some SPARQL endpoints still accept it, though, so it's worth trying.
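
    If an endpoint rejects the braces, small fixed bounds can still be spelled out by hand with the standard SPARQL 1.1 path operators (/, ?, *, + and |). The following is only a sketch: the : namespace is a placeholder, and the bounds 2 and 4 are picked purely for illustration.

```sparql
PREFIX : <http://example.org/>   # placeholder namespace, an assumption

# Pairs joined by a :p path of length 2, 3, or 4:
# two mandatory steps followed by two optional ones.
SELECT DISTINCT ?x ?y
WHERE { ?x :p/:p/:p?/:p? ?y }

# Other bounds follow the same idea:
#   exactly 3      ?x :p/:p/:p ?y
#   at least 2     ?x :p/:p+ ?y
#   at most 2      ?x :p?/:p? ?y
#   outside 2..4   ?x (:p? | :p/:p/:p/:p/:p+) ?y    (length 0, 1, or 5 and up)
```

    The pattern grows with the bounds, so this is only practical for small, known values. Note that the last pattern finds pairs that have some path outside the range; such a pair may also be joined by another path inside it.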

    A general workaround

    There is a workaround that uses only standard SPARQL 1.1. The idea is to split the candidate path into two parts at an intermediate node, ?mid. By counting how many ways the path can be split into two parts, you can find its length. That is, you do something like this to, for instance, find ?s and ?o where they are joined by a path of length ten:

    select ?s ?o {
      ?s :p* ?mid .
      ?mid :p* ?o .
    }
    group by ?s ?o
    having (count(?mid) = 10)
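
    The same grouping extends to a range by loosening the having clause. A sketch, assuming a placeholder : namespace and illustrative bounds of 3 and 6 on the number of ?mid bindings (how that number maps to a path length depends on the counting convention you settle on):

```sparql
PREFIX : <http://example.org/>   # placeholder namespace, an assumption

select ?s ?o {
  ?s :p* ?mid .
  ?mid :p* ?o .
}
group by ?s ?o
# Keep pairs whose number of split points lies in [3, 6].
# For pairs outside the range, use instead:
#   having (count(?mid) < 3 || count(?mid) > 6)
having (count(?mid) >= 3 && count(?mid) <= 6)
```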
    

    Be sure to check the actual counts if you use this approach. It's easy to get an off-by-one (or off-by-two) error depending on how you want to calculate length. There are a few options (whether to count the properties or the nodes, whether to count the endpoints or not, etc.), so a little bit of experimentation is worthwhile. Note too that the count covers every node lying on some :p path from ?s to ?o, so it corresponds to a length only when that path is unique.
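
    One way to check the counts is to project them instead of filtering on them, and to run the query over a tiny graph whose path lengths you already know. A sketch, assuming a hypothetical four-node chain as test data and a placeholder : namespace:

```sparql
PREFIX : <http://example.org/>   # placeholder namespace, an assumption

# Hypothetical test data, a single chain with three :p edges:
#   :a :p :b .   :b :p :c .   :c :p :d .

select ?s ?o (count(?mid) as ?splits) {
  ?s :p* ?mid .
  ?mid :p* ?o .
}
group by ?s ?o
order by ?s ?o
```

    On that chain, the row for :a and :d has ?splits = 4, because ?mid can bind to :a, :b, :c, or :d, while the path itself has three edges. With this pattern the count is therefore the number of edges plus one, and a node paired with itself gets a count of 1.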

    References and Examples

    For some more examples of how you can use this pattern, have a look at: