
Can a type chain be followed to test where it ends?


Let's say I have the following:

@prefix hr: <http://learningsparql.com/ns/humanResources#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

hr:Employee a rdfs:Class .
hr:BadThree rdfs:comment "some comment about missing" .
hr:BadTwo a hr:BadOne .
hr:YetAnother a hr:Another .
hr:YetAnotherName a hr:AnotherName .
hr:Another a hr:Employee .
hr:AnotherName a hr:name .
hr:BadOne a hr:Dangling .
hr:name a rdf:Property .

and I run the following SPARQL query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sch:  <http://schema.org/>

SELECT DISTINCT ?s
WHERE {
    {
        ?s ?p ?o .
        FILTER NOT EXISTS {
            ?s a ?c .
            FILTER(?c IN (rdfs:Class, rdf:Property))
        }
    }
}

The results returned will be:

----------------------------------------------------------------
| s                                                            |
================================================================
| <http://learningsparql.com/ns/humanResources#Another>        |
| <http://learningsparql.com/ns/humanResources#BadOne>         |
| <http://learningsparql.com/ns/humanResources#BadTwo>         |
| <http://learningsparql.com/ns/humanResources#YetAnother>     |
| <http://learningsparql.com/ns/humanResources#BadThree>       |
| <http://learningsparql.com/ns/humanResources#AnotherName>    |
| <http://learningsparql.com/ns/humanResources#YetAnotherName> |
----------------------------------------------------------------

The only three results I want returned are:

----------------------------------------------------------------
| s                                                            |
================================================================
| <http://learningsparql.com/ns/humanResources#BadOne>         |
| <http://learningsparql.com/ns/humanResources#BadTwo>         |
| <http://learningsparql.com/ns/humanResources#BadThree>       | 
----------------------------------------------------------------

What makes them bad? Following their rdf:type information, the type chain does not terminate in a resource that is an rdfs:Class or an rdf:Property.

If I look at hr:YetAnother, it has an rdf:type of hr:Another. hr:Another has an rdf:type of hr:Employee, which is an rdfs:Class. So, the type chains from hr:YetAnother and hr:Another terminate in an rdfs:Class, and they should not be returned by the query.

In my example the type chains are short, but in general a chain could have more links.

Is it possible to write such a query with SPARQL? If so, what would that query be?


Solution

  • The SPARQL feature required to solve this problem is called Property Paths.

    The following query:

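    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT DISTINCT ?s
    WHERE {
        {
            ?s ?p ?o .
            FILTER NOT EXISTS {
                ?s rdf:type* ?c .
                FILTER(?c IN (rdfs:Class, rdf:Property) && ?s NOT IN (rdfs:Class, rdf:Property) )
            }
        }
    }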
    

    will return the expected results:

    ----------------------------------------------------------
    | s                                                      |
    ==========================================================
    | <http://learningsparql.com/ns/humanResources#BadOne>   |
    | <http://learningsparql.com/ns/humanResources#BadTwo>   |
    | <http://learningsparql.com/ns/humanResources#BadThree> |
    ----------------------------------------------------------
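
To see what the zero-or-more path `rdf:type*` matches, and why the filter also requires `?s NOT IN (rdfs:Class, rdf:Property)`, here is a minimal Python sketch. It does not use a SPARQL engine. `TYPE_EDGES` and `reachable_via_type` are hypothetical names that model only the rdf:type triples from the question, with prefixed names kept as plain strings.

```python
# Hypothetical stand-in for the rdf:type triples in the sample data.
# Each key is a subject; each value lists that subject's rdf:type objects.
TYPE_EDGES = {
    "hr:Employee": ["rdfs:Class"],
    "hr:BadTwo": ["hr:BadOne"],
    "hr:YetAnother": ["hr:Another"],
    "hr:YetAnotherName": ["hr:AnotherName"],
    "hr:Another": ["hr:Employee"],
    "hr:AnotherName": ["hr:name"],
    "hr:BadOne": ["hr:Dangling"],
    "hr:name": ["rdf:Property"],
}


def reachable_via_type(start):
    """Return every node matched by `start rdf:type* ?c`.

    The `*` means zero or more steps, so `start` itself is always included.
    """
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for target in TYPE_EDGES.get(node, []):
            if target not in seen:  # also guards against cycles
                seen.add(target)
                stack.append(target)
    return seen


print(sorted(reachable_via_type("hr:YetAnother")))
print(sorted(reachable_via_type("hr:BadTwo")))
print(sorted(reachable_via_type("rdfs:Class")))
```

The chain from hr:YetAnother reaches rdfs:Class, while the chain from hr:BadTwo stops at hr:Dangling. The last call returns only rdfs:Class itself: a zero-length path matches any term against itself, so without the `?s NOT IN (...)` test rdfs:Class and rdf:Property would count as subjects whose chain ends well.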
    

    To better understand what is going on, break the query down. First, consider:

    (A)

    SELECT DISTINCT *
    WHERE {
        {
            ?s ?p ?o .       
        }
    }
    

    which will return the following results:

    -------------------------------------------------------------------------------------------------------------------------------------------
    | s                                                            | p            | o                                                         |
    ===========================================================================================================================================
    | <http://learningsparql.com/ns/humanResources#Another>        | rdf:type     | <http://learningsparql.com/ns/humanResources#Employee>    |
    | <http://learningsparql.com/ns/humanResources#BadOne>         | rdf:type     | <http://learningsparql.com/ns/humanResources#Dangling>    |
    | <http://learningsparql.com/ns/humanResources#BadTwo>         | rdf:type     | <http://learningsparql.com/ns/humanResources#BadOne>      |
    | <http://learningsparql.com/ns/humanResources#Employee>       | rdf:type     | rdfs:Class                                                |
    | <http://learningsparql.com/ns/humanResources#YetAnother>     | rdf:type     | <http://learningsparql.com/ns/humanResources#Another>     |
    | <http://learningsparql.com/ns/humanResources#BadThree>       | rdfs:comment | "some comment about missing"                              |
    | <http://learningsparql.com/ns/humanResources#AnotherName>    | rdf:type     | <http://learningsparql.com/ns/humanResources#name>        |
    | <http://learningsparql.com/ns/humanResources#name>           | rdf:type     | rdf:Property                                              |
    | <http://learningsparql.com/ns/humanResources#YetAnotherName> | rdf:type     | <http://learningsparql.com/ns/humanResources#AnotherName> |
    -------------------------------------------------------------------------------------------------------------------------------------------
    

    then, consider the following query:

    (B)

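    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT DISTINCT ?s
    WHERE {
        {
            ?s rdf:type* ?c .
            FILTER(?c IN (rdfs:Class, rdf:Property) && ?s NOT IN (rdfs:Class, rdf:Property) )
        }
    }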
    

    which returns the results:

    ----------------------------------------------------------------
    | s                                                            |
    ================================================================
    | <http://learningsparql.com/ns/humanResources#Employee>       |
    | <http://learningsparql.com/ns/humanResources#Another>        |
    | <http://learningsparql.com/ns/humanResources#YetAnother>     |
    | <http://learningsparql.com/ns/humanResources#name>           |
    | <http://learningsparql.com/ns/humanResources#AnotherName>    |
    | <http://learningsparql.com/ns/humanResources#YetAnotherName> |
    ----------------------------------------------------------------
    

    Placing the pattern from (B) inside FILTER NOT EXISTS removes the subjects it matches from those found by (A), leaving only the desired results.
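
The way the two parts combine can also be sketched outside SPARQL. The Python below is only an illustration under stated assumptions, not how a SPARQL engine evaluates the query: `TRIPLES`, `TERMINALS`, `ends_in_terminal`, and `bad_subjects` are hypothetical names, and the data is the sample graph from the question with prefixed names kept as plain strings. Set (A) is every subject, set (B) is every subject whose type chain reaches rdfs:Class or rdf:Property, and the result is (A) minus (B).

```python
# Sample graph from the question as (subject, predicate, object) tuples.
TRIPLES = [
    ("hr:Employee", "rdf:type", "rdfs:Class"),
    ("hr:BadThree", "rdfs:comment", "some comment about missing"),
    ("hr:BadTwo", "rdf:type", "hr:BadOne"),
    ("hr:YetAnother", "rdf:type", "hr:Another"),
    ("hr:YetAnotherName", "rdf:type", "hr:AnotherName"),
    ("hr:Another", "rdf:type", "hr:Employee"),
    ("hr:AnotherName", "rdf:type", "hr:name"),
    ("hr:BadOne", "rdf:type", "hr:Dangling"),
    ("hr:name", "rdf:type", "rdf:Property"),
]
TERMINALS = {"rdfs:Class", "rdf:Property"}


def ends_in_terminal(subject, triples):
    """True if following rdf:type zero or more times reaches a terminal."""
    seen = set()
    frontier = [subject]
    while frontier:
        node = frontier.pop()
        if node in TERMINALS:
            return True
        if node in seen:  # already expanded; avoids looping on cycles
            continue
        seen.add(node)
        frontier.extend(o for s, p, o in triples
                        if s == node and p == "rdf:type")
    return False


def bad_subjects(triples):
    """Subjects of (A) that are not matched by (B), sorted by name."""
    all_subjects = {s for s, _, _ in triples}  # (A): ?s ?p ?o
    # (B): chain ends in a terminal, and the subject is not itself a terminal
    good = {s for s in all_subjects
            if s not in TERMINALS and ends_in_terminal(s, triples)}
    return sorted(all_subjects - good)


print(bad_subjects(TRIPLES))
```

On the sample data this prints hr:BadOne, hr:BadThree, and hr:BadTwo, the same three subjects as the query. The traversal handles chains of any length, which is what the `*` in the property path provides.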