I need to retrieve triples for entities which have a transitive relationship, but I only want as subjects the entity at the end of the transitivity chain.
For the following example:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<:a0> rdfs:label "a0" ;
<:has_parent> <:a1> .
<:b0> rdfs:label "b0" ;
<:has_parent> <:b1> .
<:a1> rdfs:label "a1" ;
<:has_parent> <:a2> .
<:a2> rdfs:label "a2" ;
<:has_parent> <:a3> .
<:a3> rdfs:label "a3" .
<:b1> rdfs:label "b1" ;
<:has_parent> <:b2> .
<:b2> rdfs:label "b2" .
When I run the following SPARQL query (using rdflib-5.0.0):
SELECT ?ancestor ?descendant
WHERE
{
?descendant <:has_parent>+ ?ancestor .
}
ORDER BY ?ancestor
I get:
:a1 is_ancestor_of :a0
:a2 is_ancestor_of :a0
:a2 is_ancestor_of :a1
:a3 is_ancestor_of :a2
:a3 is_ancestor_of :a0
:a3 is_ancestor_of :a1
:b1 is_ancestor_of :b0
:b2 is_ancestor_of :b1
:b2 is_ancestor_of :b0
But what I'd like to get is:
:a3 is_ancestor_of :a2
:a3 is_ancestor_of :a0
:a3 is_ancestor_of :a1
:b2 is_ancestor_of :b1
:b2 is_ancestor_of :b0
That is, only the "oldest ancestors" of a chain as subjects, and all descendants as objects. Put another way, I don't want any descendant as a subject.
I know I'm missing a FILTER, or a FILTER NOT EXISTS, or an additional SELECT-WHERE, but all my attempts so far have returned empty tables (i.e. I negate all selected triples).
The closest question I found was this one, though I was unable to properly implement the self-selected answer, or the comment on the question.
I appreciate your help. Thanks.
For me, it always helps to try and formulate the query in words, first. What you want is to say: "only give me those ancestors that themselves do not have any further ancestors".
To formulate this in SPARQL, use a FILTER NOT EXISTS constraint, something like this:
FILTER NOT EXISTS { ?ancestor <:has_parent> [] }
The [] bit here is a blank node, which in a query pattern acts as an anonymous variable. Basically, you're saying "if an ancestor has any parent, it should not be returned".