I'm using SPARQL for RDF and am having trouble coming up with a query that will allow me to filter out all vertices that have edges to other vertices that are not in a list of specified values.
Here is a visual representation of the graph I'm working with, which contains nodes of two separate RDF types (:package and :platform). In this graph, packages (:Package_A, :Package_B, :Package_C, and :Package_D) have outgoing edges to each platform that they require, and the values of the platforms are :Platform_1 and :Platform_2.
Here is the data that creates this graph:
INSERT DATA {
:Package_A rdf:type :package .
:Package_B rdf:type :package .
:Package_C rdf:type :package .
:Package_D rdf:type :package .
:Platform_1 rdf:type :platform .
:Platform_2 rdf:type :platform .
:Package_A :platform :Platform_1 .
:Package_B :platform :Platform_1 .
:Package_C :platform :Platform_1 .
:Package_D :platform :Platform_1 .
:Package_D :platform :Platform_2 .
}
I would like to form a query that returns all vertices with rdf:type = :package, but I would like to exclude all packages that have any outgoing edge to a platform that is not in a list of specified platform values.
For example, in the case of this specified singleton list:
[:Platform_1]
Package_A, Package_B and Package_C should be returned since these packages only have edges that lead to Platform_1.
Package_D, however, should get filtered out since it has edges to both Platform_1 AND Platform_2 (and Platform_2 is not in the specified list).
So far, I have attempted this query using a FILTER, which doesn't work, as Package_D is still returned by it:
SELECT * {
?package a :package .
?package :platform ?platform .
FILTER(:platform NOT IN(:Platform_1))
}
Anyone have any ideas how I could form a query that would exclude any vertices that have edges to values that are not in the list (in this case, any edge that leads to Platform_2)?
The problem with your query is that the filter condition is evaluated separately for each matching platform, so it succeeds if any platform vertex matches the condition. You get back Package_D because one of the platforms is :Platform_2, which is not in the list (:Platform_1), so the condition succeeds.
What you need instead is a filter condition that checks that none of the matching platforms falls outside the list. For this purpose, SPARQL has the NOT EXISTS condition, which takes a graph pattern as an argument:
?package a :package.
FILTER NOT EXISTS { ?package :platform ?platform.
FILTER (?platform NOT IN (:Platform_1)) }
This literally says: "give me back packages for which no platform exists that is not in the supplied list".
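If it helps to see the difference between the two filters outside of SPARQL, here is a rough sketch in plain Python that mimics the semantics on the data from the question. It is only an illustration, not how a SPARQL engine actually evaluates queries: the names EDGES, PACKAGES, per_row_filter and not_exists_filter are made up for this example, and per_row_filter assumes the filter is applied to the ?platform variable of each matching row.

```python
# (package, platform) edges taken from the INSERT DATA block in the question.
EDGES = [
    ("Package_A", "Platform_1"),
    ("Package_B", "Platform_1"),
    ("Package_C", "Platform_1"),
    ("Package_D", "Platform_1"),
    ("Package_D", "Platform_2"),
]
PACKAGES = ["Package_A", "Package_B", "Package_C", "Package_D"]


def per_row_filter(allowed):
    """Mimics FILTER(?platform NOT IN (...)): every (package, platform)
    row is tested on its own, so one 'bad' edge is enough to keep a package."""
    return sorted({pkg for pkg, plat in EDGES if plat not in allowed})


def not_exists_filter(allowed):
    """Mimics FILTER NOT EXISTS { ... FILTER(?platform NOT IN (...)) }:
    keep a package only if it has no edge to a platform outside `allowed`."""
    return [
        pkg
        for pkg in PACKAGES
        if not any(p == pkg and plat not in allowed for p, plat in EDGES)
    ]


print(per_row_filter({"Platform_1"}))
print(not_exists_filter({"Platform_1"}))
```

With the list [:Platform_1], the per-row version keeps Package_D, while the NOT EXISTS version keeps Package_A, Package_B and Package_C. Note that, under this reading, a package with no :platform edges at all would also pass the NOT EXISTS check, since it has no edge that falls outside the list.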