About UNION and FILTER NOT EXISTS in SPARQL (OpenRDF 2.8.0)


I learnt some semantic technologies, including RDF and SPARQL, a few years ago, but then I had no chance to work with them for some time. I've now started a new project that uses OpenRDF 2.8.0 as a semantic store, and I'm refreshing my knowledge, though there are some forgotten things I still need to recover.

In particular, over the past few days I've had some trouble correctly understanding the FILTER NOT EXISTS construct in SPARQL.

Problem: I have a semantic store imported from DbTune.org (music ontologies). A mo:MusicArtist, intended as the foaf:maker of a mo:Track, can appear in four forms (I'm only listing the relevant statements):

    <http://dbtune.org/musicbrainz/resource/artist/013c8e5b-d72a-4cd3-8dee-6c64d6125823> a mo:MusicArtist ;
        vocab:artist_type "1"^^xs:short ;
        rdfs:label "Edvard Grieg" .

    <http://dbtune.org/musicbrainz/resource/artist/032df978-9130-490e-8857-0c9ef231fae8> a mo:MusicArtist ;
        vocab:artist_type "2"^^xs:short ;
        rel:collaboratesWith <http://dbtune.org/musicbrainz/resource/artist/3db5dfb1-1b91-4038-8268-ae04d15b6a3e> , <http://dbtune.org/musicbrainz/resource/artist/d78afc01-f918-440c-89fc-9d546a3ba4ac> ;
        rdfs:label "Doris Day & Howard Keel".

    <http://dbtune.org/musicbrainz/resource/artist/1645f335-2367-427d-8e2d-ad206946a8eb> a mo:MusicArtist ;
        vocab:artist_type "2"^^xs:short ;
        rdfs:label "Pat Metheny & Anna Maria Jopek".

    <http://dbtune.org/musicbrainz/resource/artist/12822d4f-4607-4f1d-ab16-d6bacc27cafe> a mo:MusicArtist ;
        rdfs:label "René Marie".

From what I understand, vocab:artist_type is 1 for single artists (example #1) and 2 for groups or collaborations (examples #2 and #3). In the latter case, there might be a few rel:collaboratesWith statements that point to the descriptions of the individual members of the group or collaboration (example #2). In some cases, the vocab:artist_type statement is missing (example #4).

Now I want to extract all the artists as single entities, where possible. That is, I don't want to retrieve example #2, because I will get "Doris Day" and "Howard Keel" separately. I have to retrieve example #3, "Pat Metheny & Anna Maria Jopek", because I can't do anything else. Of course, I also want to retrieve "René Marie".

I've solved the problem in a satisfactory way with this SPARQL query:

    SELECT *
    WHERE  
      { 
        ?artist     a           mo:MusicArtist. 
        ?artist     rdfs:label  ?label. 

        MINUS 
          {
            ?artist     vocab:artist_type       "2"^^xs:short.
            ?artist     rel:collaboratesWith    ?any1 .
          }
      } 
    ORDER BY ?label

It makes sense and it reads well ("retrieve all mo:MusicArtist items minus those that are collaborations with individual members listed").

I didn't find the solution immediately. I first thought of putting together the three separate cases, with UNION:

    SELECT *
    WHERE  
      { 
        ?artist       a                 mo:MusicArtist. 
        ?artist       rdfs:label        ?label. 
    # Single artists
          {
            ?artist     vocab:artist_type       "1"^^xs:short.
          }
        UNION
    # Groups for which there is no defined collaboration with single persons
          {
            ?artist     vocab:artist_type       "2"^^xs:short.
            FILTER NOT EXISTS 
              {
                ?artist     rel:collaboratesWith    ?any1 
              }
          }
        UNION
    # Some artists don't have this attribute
          {
            FILTER NOT EXISTS 
              {
                ?artist     vocab:artist_type       ?any2
              }
          }
      } 
    ORDER BY ?label

I found that the third UNION branch, the one that should add mo:MusicArtist items without a vocab:artist_type, didn't work. That is, it didn't find items such as "René Marie".

While I'm satisfied with the shorter solution I found with MINUS, I'm not happy that I don't understand why the earlier solution didn't work. Clearly I'm missing something about FILTER NOT EXISTS that could be useful in other cases.

Any help is welcome.


Solution

  • When I run the following query, I get the results that you seem to be looking for:

    select distinct ?label where {
      ?artist a mo:MusicArtist ;
              rdfs:label ?label .
    
      #-- artists with type 1
      {
        ?artist vocab:artist_type "1"^^xs:short
      }
      #-- artists with no type
      union {
        filter not exists { 
          ?artist vocab:artist_type ?type
        }
      }
      #-- artists with type 2 that have no
      #-- collaborators
      union {
        ?artist vocab:artist_type "2"^^xs:short
        filter not exists {
          ?artist rel:collaboratesWith ?another
        }
      }
    }
    

    ------------------------------------
    | label                            |
    ====================================
    | "René Marie"                     |
    | "Pat Metheny & Anna Maria Jopek" |
    | "Edvard Grieg"                   |
    ------------------------------------
    

    I'm not sure I see where this essentially differs from yours, though. I do think that you could clean this query up a bit. You can use optional and values to specify that the type is optional, but if present must be 1 or 2. Then you can add a filter that requires that when the value is 2, there is no collaborator.

    select ?label where {
      #-- get an artist and their label
      ?artist a mo:MusicArtist ;
              rdfs:label ?label .
    
      #-- and optionally their type, if it is
      #-- "1"^^xs:short or "2"^^xs:short
      optional {
        values ?type { "1"^^xs:short "2"^^xs:short }
        ?artist vocab:artist_type ?type
      }
    
      #-- if ?type is "2"^^xs:short, then ?artist
      #-- must not collaborate with anyone.
      filter ( !sameTerm(?type,"2"^^xs:short)
            || not exists { ?artist rel:collaboratesWith ?anyone })
    }
    

    ------------------------------------
    | label                            |
    ====================================
    | "René Marie"                     |
    | "Pat Metheny & Anna Maria Jopek" |
    | "Edvard Grieg"                   |
    ------------------------------------
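
    Since the union version works for me, one way to narrow down what happens on your side is to run the "no type" branch by itself. This is only a diagnostic sketch, with prefix declarations omitted as in the queries above. On the four sample artists from your question I'd expect it to return only "René Marie". If it does, but the same pattern inside a union returns nothing, that would suggest the difference lies in how your store evaluates a union branch that contains nothing but a filter, rather than in your understanding of filter not exists.

    select ?label where {
      ?artist a mo:MusicArtist ;
              rdfs:label ?label .

      #-- keep only artists that have no
      #-- vocab:artist_type statement at all
      filter not exists {
        ?artist vocab:artist_type ?type
      }
    }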