Removing unwanted superclass answers in SPARQL


I have an OWL file that includes a taxonomic hierarchy, and I want to write a query whose answer includes each individual and its immediate taxonomic parent. Here's an example (the full query is rather messier).

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <urn:ex:> .

:fido rdf:type :Dog .
:Dog rdfs:subClassOf :Mammal .
:Mammal rdfs:subClassOf :Vertebrate .
:Vertebrate rdfs:subClassOf :Animal .
:fido :hasToy :bone .

:kitty rdf:type :Cat .
:Cat rdfs:subClassOf :Mammal .
:kitty :hasToy :catnipMouse .

And this query does what I want.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <urn:ex:>

SELECT ?individual ?type 
WHERE {
   ?individual :hasToy :bone .
   ?individual rdf:type ?type .
}

The problem is that I'd rather use a reasoned-over version of the OWL file, which unsurprisingly includes additional statements:

:fido rdf:type :Mammal .
:fido rdf:type :Vertebrate .
:fido rdf:type :Animal .
:kitty rdf:type :Mammal .
:kitty rdf:type :Vertebrate .
:kitty rdf:type :Animal .

And now the query returns additional answers about Fido being a Mammal, etc. I could just give up on using the reasoned version of the file, or, since the SPARQL queries are called from Java, I could run a bunch of additional queries to find the least inclusive type that appears. My question is whether there is a reasonable pure SPARQL solution that returns only the Dog solution.


Solution

  • A generic solution is to ask for the direct type only. A class C is a direct type of an instance x if:

    1. x is of type C
    2. there is no C' such that:
      • x is of type C'
      • C' is a subclass of C
      • C' is not equal to C

    (that last condition is necessary, by the way, because in RDF/OWL, the subclass-relation is reflexive: every class is a subclass of itself)

    In SPARQL, this becomes something like this:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX : <urn:ex:>
    
    SELECT ?individual ?type 
    WHERE {
       ?individual :hasToy :bone .
       ?individual a ?type .
       FILTER NOT EXISTS { ?individual a ?other .
                           ?other rdfs:subClassOf ?type .
                           FILTER(?other != ?type)
       }
    }
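
The same direct-type rule can also be sketched in plain Java, which may help when checking what the FILTER NOT EXISTS clause is expected to keep. This is only an illustration under stated assumptions: the class DirectTypes and its methods are hypothetical names, not part of any library, and the hierarchy is hard-coded from the example data rather than read from a triplestore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any library: a plain-Java version of the direct-type rule.
final class DirectTypes {

    // superClasses maps each class to all of its superclasses as a reasoner would
    // report them, including the class itself (subClassOf is reflexive).
    static Set<String> directTypes(Set<String> typesOfIndividual,
                                   Map<String, Set<String>> superClasses) {
        Set<String> result = new TreeSet<>();
        for (String candidate : typesOfIndividual) {
            boolean hasMoreSpecificType = false;
            for (String other : typesOfIndividual) {
                // Mirrors: ?other rdfs:subClassOf ?type . FILTER(?other != ?type)
                if (!other.equals(candidate)
                        && superClasses.getOrDefault(other, Set.of()).contains(candidate)) {
                    hasMoreSpecificType = true;
                    break;
                }
            }
            if (!hasMoreSpecificType) {
                result.add(candidate);
            }
        }
        return result;
    }

    // Reasoned-over hierarchy taken from the example data in the question.
    static Map<String, Set<String>> exampleHierarchy() {
        Map<String, Set<String>> m = new HashMap<>();
        m.put("Dog", Set.of("Dog", "Mammal", "Vertebrate", "Animal"));
        m.put("Cat", Set.of("Cat", "Mammal", "Vertebrate", "Animal"));
        m.put("Mammal", Set.of("Mammal", "Vertebrate", "Animal"));
        m.put("Vertebrate", Set.of("Vertebrate", "Animal"));
        m.put("Animal", Set.of("Animal"));
        return m;
    }

    public static void main(String[] args) {
        // All types the reasoned file asserts for :fido
        Set<String> fidoTypes = Set.of("Dog", "Mammal", "Vertebrate", "Animal");
        System.out.println(directTypes(fidoTypes, exampleHierarchy())); // prints [Dog]
    }
}
```

Like the query, this sketch drops a type whenever a different, more specific type of the same individual exists, so two classes that are subclasses of each other would both be filtered out.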
    

    Depending on which API/triplestore/library you use to execute these queries, there may also be other, tool-specific solutions. For example, the Sesame API (disclosure: I am on the Sesame dev team) has the option to disable reasoning for the purpose of a single query:

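    // Assumes conn is an open RepositoryConnection and QueryLanguage is org.openrdf.query.QueryLanguage
    TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL, "SELECT ...");
    query.setIncludeInferred(false); // evaluate against explicitly asserted statements only
    
    TupleQueryResult result = query.evaluate();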
    

    Sesame also offers an optional additional inferencer (called the 'direct type inferencer') which introduces additional 'virtual' properties you can query, such as sesame:directType, sesame:directSubClassOf, etc. Other tools will undoubtedly have similar options.