
SPARQL: FILTER out subjects which appear as objects in any other SELECTed triple


I need to retrieve triples for entities which have a transitive relationship, but I only want the entity at the end of the transitivity chain to appear as a subject.

For the following example:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<:a0> rdfs:label "a0" ;
    <:has_parent> <:a1> .

<:b0> rdfs:label "b0" ;
    <:has_parent> <:b1> .

<:a1> rdfs:label "a1" ;
    <:has_parent> <:a2> .

<:a2> rdfs:label "a2" ;
    <:has_parent> <:a3> .

<:a3> rdfs:label "a3" .

<:b1> rdfs:label "b1" ;
    <:has_parent> <:b2> .

<:b2> rdfs:label "b2" .

When I run the following SPARQL query (using rdflib-5.0.0):

SELECT ?ancestor ?descendant
WHERE 
{
   ?descendant <:has_parent>+ ?ancestor .
}
ORDER BY ?ancestor

I get:

:a1 is_ancestor_of  :a0
:a2 is_ancestor_of  :a0
:a2 is_ancestor_of  :a1
:a3 is_ancestor_of  :a2
:a3 is_ancestor_of  :a0
:a3 is_ancestor_of  :a1
:b1 is_ancestor_of  :b0
:b2 is_ancestor_of  :b1
:b2 is_ancestor_of  :b0

But what I'd like to get is:

:a3 is_ancestor_of  :a2
:a3 is_ancestor_of  :a0
:a3 is_ancestor_of  :a1
:b2 is_ancestor_of  :b1
:b2 is_ancestor_of  :b0

That is, only the "oldest ancestors" of a chain as subjects, and all descendants as objects. Put another way, I don't want any descendant as a subject.

I know I'm missing a FILTER, or a FILTER NOT EXISTS, or an additional SELECT-WHERE, but all my attempts so far have returned empty tables (i.e. I negate all selected triples).

The closest question I found was this one, though I was unable to properly implement the self-selected answer, or the comment on the question.

I appreciate your help. Thanks.


Solution

  • For me, it always helps to try and formulate the query in words, first. What you want is to say: "only give me those ancestors that themselves do not have any further ancestors".

    To formulate this in SPARQL, add a FILTER NOT EXISTS constraint inside the WHERE block of your query, something like this:

    FILTER NOT EXISTS { ?ancestor <:has_parent> [] }
    

    The [] here is a blank node, which in a query pattern acts as an anonymous variable. In effect you're saying: "if an ancestor has any parent, it should not be returned".