Filtering out statements containing items of specific types


I have a graph G. I'd like to create G', a subset of G, by filtering out from G all items belonging to specific types, let's say {:Foo, :Bar}.

For example, if this is G:

:x a :Foo
:y a :Bar

:x :predicate_a :hh
:kk :predicate_b :y
:mm :predicate_b :kk

G' should be:

:mm :predicate_b :kk

My best current option is using DELETE on G. I need two queries per type:

(i) one for the subjects

delete where 
    {
        ?s ?p ?o .
        ?s a :Foo .
    } 

(ii) another one for the objects

delete where 
    {
        ?s ?p ?o .
        ?o a :Foo .
    } 

That way, I should get what I need. It doesn't seem like the best option to me, though. Are there better ways (i.e., more efficient or more compact)?


Solution

  • It can be done in a single query, using UNION and VALUES. This should work for both classes in one go:

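    PREFIX : <http://www.example.com/foo#>
    
    DELETE { ?s ?p ?o }
    WHERE
    {
      VALUES (?toDeleteClass) {
        (:Foo)
        (:Bar)
      }
    
      ?toDelete a ?toDeleteClass .
      # or, if you want transitivity (needs a PREFIX rdfs: declaration):
      # ?toDelete a/rdfs:subClassOf* ?toDeleteClass .
    
      # Match ?toDelete inside each branch. A BIND placed first in a nested
      # group is evaluated before the outer ?toDelete is joined in, so it
      # would leave ?s/?o unbound and the branch would match every triple.
      { ?toDelete ?p ?o . BIND( ?toDelete AS ?s ) }
      UNION
      { ?s ?p ?toDelete . BIND( ?toDelete AS ?o ) }
    }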
    

    Combining this with the comments under your question, you can also leave G untouched and build G' as a new graph (using INSERT and GRAPH), or extract and download G' with CONSTRUCT. In the CONSTRUCT case you might need to fetch the result in chunks via LIMIT/OFFSET, since many triple stores cap the size of the result a query can return.
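
    As a rough sketch of the INSERT and GRAPH variant, assuming G is stored as a named graph: the graph names :G and :Gprime below are placeholders, not taken from the question. Rather than deleting anything, it copies every triple whose subject and object are both not instances of the listed classes:

    PREFIX : <http://www.example.com/foo#>

    INSERT { GRAPH :Gprime { ?s ?p ?o } }
    WHERE
    {
      # :G and :Gprime are hypothetical graph names
      GRAPH :G
      {
        ?s ?p ?o .
        # Skip triples whose subject is an instance of :Foo or :Bar
        FILTER NOT EXISTS { VALUES ?c1 { :Foo :Bar } ?s a ?c1 }
        # Skip triples whose object is an instance of :Foo or :Bar
        FILTER NOT EXISTS { VALUES ?c2 { :Foo :Bar } ?o a ?c2 }
      }
    }

    On the example data from the question, only :mm :predicate_b :kk passes both filters. For the download case, the same WHERE clause should work with CONSTRUCT { ?s ?p ?o } in place of the INSERT clause.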

    An alternative to VALUES would be FILTER ( ?toDeleteClass IN ( :Foo, :Bar ) ). However, VALUES looks more natural for this task and might be faster as well.

    Beware of inference: if your triple store has some inference enabled by default, the pattern ?toDelete a ?toDeleteClass might also pick up indirect instances of Foo/Bar, i.e., instances of subclasses of Foo/Bar, not just the direct ones. If you don't want this, the best approach is to find out how to disable inference in your triple store (you could detect indirect instances via FILTER, but that is more complicated and slower).
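
    If disabling inference is not possible, the FILTER-based detection could look roughly like the fragment below, which would replace the plain type pattern in the WHERE clause. It is only a heuristic, assuming the store exposes rdfs:subClassOf triples to queries. It cannot tell asserted types from inferred ones, so it would also skip a resource that is typed both as :Foo and as one of :Foo's subclasses:

    # Needs PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    ?toDelete a ?toDeleteClass .
    FILTER NOT EXISTS
    {
      # ?toDelete also has a type that is a proper subclass of the target class
      ?toDelete a ?subClass .
      ?subClass rdfs:subClassOf ?toDeleteClass .
      FILTER ( ?subClass != ?toDeleteClass )
    }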