Let's say I have the following:
@prefix hr: <http://learningsparql.com/ns/humanResources#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
hr:Employee a rdfs:Class .
hr:BadThree rdfs:comment "some comment about missing" .
hr:BadTwo a hr:BadOne .
hr:YetAnother a hr:Another .
hr:YetAnotherName a hr:AnotherName .
hr:Another a hr:Employee .
hr:AnotherName a hr:name .
hr:BadOne a hr:Dangling .
hr:name a rdf:Property .
and I run the following SPARQL query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sch: <http://schema.org/>
SELECT DISTINCT ?s
WHERE {
{
?s ?p ?o .
FILTER NOT EXISTS {
?s a ?c .
FILTER(?c IN (rdfs:Class, rdf:Property))
}
}
}
The results returned will be:
----------------------------------------------------------------
| s |
================================================================
| <http://learningsparql.com/ns/humanResources#Another> |
| <http://learningsparql.com/ns/humanResources#BadOne> |
| <http://learningsparql.com/ns/humanResources#BadTwo> |
| <http://learningsparql.com/ns/humanResources#YetAnother> |
| <http://learningsparql.com/ns/humanResources#BadThree> |
| <http://learningsparql.com/ns/humanResources#AnotherName> |
| <http://learningsparql.com/ns/humanResources#YetAnotherName> |
----------------------------------------------------------------
The only two results I want returned are:
----------------------------------------------------------------
| s |
================================================================
| <http://learningsparql.com/ns/humanResources#BadOne> |
| <http://learningsparql.com/ns/humanResources#BadTwo> |
| <http://learningsparql.com/ns/humanResources#BadThree> |
----------------------------------------------------------------
What makes them bad? If I follow the rdf:type information, their type chain does not terminate in a type that is an rdfs:Class or an rdf:Property (hr:BadThree has no rdf:type at all).
If I look at hr:YetAnother, it has a rdf:type of hr:Another. hr:Another has a rdf:type of hr:Employee. So, the chain of types from hr:YetAnother and hr:Another terminates in a rdfs:Class and they should not be returned by a query.
In my example the type chains are short, but a chain could contain more links.
Is it possible to write such a query with SPARQL? If so, what would that query be?
The SPARQL 1.1 feature required to solve this problem is called property paths. The path rdf:type* matches a chain of zero or more rdf:type links.
The following query:
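PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s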
WHERE {
{
?s ?p ?o .
FILTER NOT EXISTS {
?s rdf:type* ?c .
FILTER(?c IN (rdfs:Class, rdf:Property) && ?s NOT IN (rdfs:Class, rdf:Property) )
}
}
}
will return the expected results:
----------------------------------------------------------
| s |
==========================================================
| <http://learningsparql.com/ns/humanResources#BadOne> |
| <http://learningsparql.com/ns/humanResources#BadTwo> |
| <http://learningsparql.com/ns/humanResources#BadThree> |
----------------------------------------------------------
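To make the rdf:type* step concrete, here is a minimal plain-Python sketch (no RDF library) of what the path evaluates to for a single subject. The names `TYPES` and `type_closure` are hypothetical, and prefixed names stand in for the full IRIs; the edges mirror the rdf:type triples in the question. Because `*` means zero or more steps, the start node is always part of its own result.

```python
# rdf:type edges from the question's data, written as subject -> list of types.
# Prefixed names stand in for full IRIs (an illustrative simplification).
TYPES = {
    "hr:Employee": ["rdfs:Class"],
    "hr:name": ["rdf:Property"],
    "hr:Another": ["hr:Employee"],
    "hr:YetAnother": ["hr:Another"],
    "hr:AnotherName": ["hr:name"],
    "hr:YetAnotherName": ["hr:AnotherName"],
    "hr:BadOne": ["hr:Dangling"],
    "hr:BadTwo": ["hr:BadOne"],
}


def type_closure(node):
    """Return every node reachable from `node` via zero or more rdf:type steps."""
    seen = {node}  # zero-length path: the node reaches itself
    stack = [node]
    while stack:
        current = stack.pop()
        for t in TYPES.get(current, []):
            if t not in seen:  # guards against cycles in the type graph
                seen.add(t)
                stack.append(t)
    return seen


print(sorted(type_closure("hr:YetAnother")))
print(sorted(type_closure("hr:BadTwo")))
```

The closure of hr:YetAnother contains rdfs:Class, so FILTER NOT EXISTS rejects it. The closure of hr:BadTwo stops at hr:Dangling, so it is kept.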
To better understand what is going on, break the query down into two parts. First, consider:
(A)
SELECT DISTINCT *
WHERE {
{
?s ?p ?o .
}
}
which will return the following results:
-------------------------------------------------------------------------------------------------------------------------------------------
| s | p | o |
===========================================================================================================================================
| <http://learningsparql.com/ns/humanResources#Another> | rdf:type | <http://learningsparql.com/ns/humanResources#Employee> |
| <http://learningsparql.com/ns/humanResources#BadOne> | rdf:type | <http://learningsparql.com/ns/humanResources#Dangling> |
| <http://learningsparql.com/ns/humanResources#BadTwo> | rdf:type | <http://learningsparql.com/ns/humanResources#BadOne> |
| <http://learningsparql.com/ns/humanResources#Employee> | rdf:type | rdfs:Class |
| <http://learningsparql.com/ns/humanResources#YetAnother> | rdf:type | <http://learningsparql.com/ns/humanResources#Another> |
| <http://learningsparql.com/ns/humanResources#BadThree> | rdfs:comment | "some comment about missing" |
| <http://learningsparql.com/ns/humanResources#AnotherName> | rdf:type | <http://learningsparql.com/ns/humanResources#name> |
| <http://learningsparql.com/ns/humanResources#name> | rdf:type | rdf:Property |
| <http://learningsparql.com/ns/humanResources#YetAnotherName> | rdf:type | <http://learningsparql.com/ns/humanResources#AnotherName> |
-------------------------------------------------------------------------------------------------------------------------------------------
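Then consider the following query:
(B)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s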
WHERE {
{
?s rdf:type* ?c .
FILTER(?c IN (rdfs:Class, rdf:Property) && ?s NOT IN (rdfs:Class, rdf:Property) )
}
}
which returns the results:
----------------------------------------------------------------
| s |
================================================================
| <http://learningsparql.com/ns/humanResources#Employee> |
| <http://learningsparql.com/ns/humanResources#Another> |
| <http://learningsparql.com/ns/humanResources#YetAnother> |
| <http://learningsparql.com/ns/humanResources#name> |
| <http://learningsparql.com/ns/humanResources#AnotherName> |
| <http://learningsparql.com/ns/humanResources#YetAnotherName> |
----------------------------------------------------------------
By placing (B) in FILTER NOT EXISTS, the subjects found by (B) are removed from the subjects found by (A), leaving only the desired results.
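The "(A) minus (B)" reading can be checked with a small plain-Python sketch over the question's triples. This is only an illustration of the set logic, not how a SPARQL engine evaluates the query; the variable names (`TRIPLES`, `subjects_a`, `subjects_b`, `bad`) are hypothetical and prefixed names stand in for the full IRIs.

```python
RDF_TYPE = "rdf:type"
TERMINALS = {"rdfs:Class", "rdf:Property"}

# The triples from the question, in the order they appear there.
TRIPLES = [
    ("hr:Employee", RDF_TYPE, "rdfs:Class"),
    ("hr:BadThree", "rdfs:comment", "some comment about missing"),
    ("hr:BadTwo", RDF_TYPE, "hr:BadOne"),
    ("hr:YetAnother", RDF_TYPE, "hr:Another"),
    ("hr:YetAnotherName", RDF_TYPE, "hr:AnotherName"),
    ("hr:Another", RDF_TYPE, "hr:Employee"),
    ("hr:AnotherName", RDF_TYPE, "hr:name"),
    ("hr:BadOne", RDF_TYPE, "hr:Dangling"),
    ("hr:name", RDF_TYPE, "rdf:Property"),
]

# (A): every subject that appears in any triple.
subjects_a = {s for s, _, _ in TRIPLES}

# (B): subjects whose type chain reaches a terminal. Start from the terminals
# and keep adding anything typed as a member of the set until nothing changes.
reaches_terminal = set(TERMINALS)
changed = True
while changed:
    changed = False
    for s, p, o in TRIPLES:
        if p == RDF_TYPE and o in reaches_terminal and s not in reaches_terminal:
            reaches_terminal.add(s)
            changed = True
subjects_b = reaches_terminal - TERMINALS  # mirrors ?s NOT IN (rdfs:Class, rdf:Property)

# For this data, FILTER NOT EXISTS acts like a set difference on ?s: A minus B.
bad = subjects_a - subjects_b
print(sorted(bad))
```

Here `subjects_b` holds the same six subjects as the result table for (B), and `bad` holds hr:BadOne, hr:BadTwo and hr:BadThree.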