Tags: graph, sparql, rdf, amazon-neptune

Filter out vertices that have outgoing edges to other vertices that are NOT in a specified list of values (SPARQL)


I'm using SPARQL for RDF and am having trouble coming up with a query that will allow me to filter out all vertices that have edges to other vertices that are not in a list of specified values.

Here is a visual representation of the graph I'm working with, which contains nodes of two separate RDF types (:package and :platform). In this graph, packages (:Package_A, :Package_B, :Package_C, and :Package_D) have outgoing edges to each platform that they require, and the platforms are :Platform_1 and :Platform_2. [image: graph diagram]

Here is the data that creates this graph:

INSERT DATA {
   :Package_A             rdf:type   :package .
   :Package_B             rdf:type   :package .
   :Package_C             rdf:type   :package .
   :Package_D             rdf:type   :package .

   :Platform_1            rdf:type   :platform .
   :Platform_2            rdf:type   :platform .

   :Package_A             :platform  :Platform_1 .
   :Package_B             :platform  :Platform_1 .
   :Package_C             :platform  :Platform_1 .
   :Package_D             :platform  :Platform_1 . 

   :Package_D             :platform  :Platform_2 .
}

I would like to form a query that returns all vertices with rdf:type = :package, but I would like to exclude every package that has any outgoing edge to a platform that is not in a list of specified platform values.

For example, in the case of this specified singleton list: [:Platform_1]

Package_A, Package_B and Package_C should be returned, since these packages only have edges that lead to Platform_1.

Package_D should be filtered out, since it has edges to both Platform_1 AND Platform_2 (and Platform_2 is not in the specified list).

So far, I have attempted this query using a FILTER, which doesn't work, as Package_D is still returned by it:

SELECT * {
    ?package a :package .
    ?package :platform ?platform .
    FILTER(:platform NOT IN(:Platform_1))
}

Anyone have any ideas how I could form a query that would exclude any vertices that have edges to values that are not in the list (in this case, any edge that leads to Platform_2)?


Solution

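  • The problem with your query is that a FILTER is evaluated against each solution row separately, so a package is returned as soon as any one of its platform vertices passes the condition. You get back Package_D because one of its platforms is :Platform_2, which is not in the list (:Platform_1), so the condition succeeds for that row. (Your filter also tests the constant :platform rather than the variable ?platform, which makes it true for every row.)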

    What you need instead is a filter condition that checks that none of the package's platforms fall outside the list. For this purpose, SPARQL has the NOT EXISTS condition, which takes a graph pattern as an argument:

    ?package a :package .
    FILTER NOT EXISTS { ?package :platform ?platform .
                        FILTER (?platform NOT IN (:Platform_1)) }

    Read literally: "give me back the packages for which no platform exists that is not in the supplied list".
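For reference, here is a minimal sketch of the same idea written out as a complete query with a longer allowed list. The namespace bound to the ":" prefix is an assumption (the question never shows what it expands to), so substitute your own.

```sparql
# Hypothetical namespace: replace with whatever ":" expands to in your data.
PREFIX : <http://example.org/>

SELECT ?package
WHERE {
    ?package a :package .
    # Reject the package if any of its platforms falls outside the allowed list.
    FILTER NOT EXISTS {
        ?package :platform ?platform .
        FILTER (?platform NOT IN (:Platform_1, :Platform_2))
    }
}
```

Against the sample data, this allowed list admits all four packages; shrinking it back to (:Platform_1) drops Package_D again. Note that a package with no :platform edge at all also passes, because nothing in it violates the list. If that is not what you want, adding a FILTER EXISTS { ?package :platform ?anyPlatform } clause (with ?anyPlatform as an illustrative variable name) should require at least one platform.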