I learnt some semantic technologies, including RDF and SPARQL, a few years ago, but then had no chance to work with them for some time. Now I've started a new project that uses OpenRDF 2.8.0 as a semantic store, and I'm refreshing my knowledge, although there are things I've forgotten and need to recover.
In particular, over the past few days I had some trouble correctly understanding the FILTER NOT EXISTS construct in SPARQL.
Problem: I have a semantic store imported from DbTune.org (music ontologies). A mo:MusicArtist, intended as foaf:maker of a mo:Track, can be present in four scenarios (I'm only listing relevant statements):
# Example #1: single artist
<http://dbtune.org/musicbrainz/resource/artist/013c8e5b-d72a-4cd3-8dee-6c64d6125823> a mo:MusicArtist ;
    vocab:artist_type "1"^^xs:short ;
    rdfs:label "Edvard Grieg" .

# Example #2: collaboration with its individual members listed
<http://dbtune.org/musicbrainz/resource/artist/032df978-9130-490e-8857-0c9ef231fae8> a mo:MusicArtist ;
    vocab:artist_type "2"^^xs:short ;
    rel:collaboratesWith <http://dbtune.org/musicbrainz/resource/artist/3db5dfb1-1b91-4038-8268-ae04d15b6a3e> ,
                         <http://dbtune.org/musicbrainz/resource/artist/d78afc01-f918-440c-89fc-9d546a3ba4ac> ;
    rdfs:label "Doris Day & Howard Keel" .

# Example #3: collaboration without listed members
<http://dbtune.org/musicbrainz/resource/artist/1645f335-2367-427d-8e2d-ad206946a8eb> a mo:MusicArtist ;
    vocab:artist_type "2"^^xs:short ;
    rdfs:label "Pat Metheny & Anna Maria Jopek" .

# Example #4: no vocab:artist_type statement
<http://dbtune.org/musicbrainz/resource/artist/12822d4f-4607-4f1d-ab16-d6bacc27cafe> a mo:MusicArtist ;
    rdfs:label "René Marie" .
From what I understand, vocab:artist_type is 1 for single artists (example #1) and 2 for groups or collaborations (examples #2 and #3). In the latter case, there might be a few rel:collaboratesWith statements that point to the descriptions of the individual members of the group or collaboration (example #2). In some cases, the vocab:artist_type statement is missing (example #4).
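To see which of the four scenarios each artist falls into, a diagnostic query along the following lines may help. It is only a sketch: it assumes the same prefix declarations as the data above, and the variable names ?type, ?member and ?members are illustrative.

```sparql
# Sketch: list every artist with its type (left unbound when the statement
# is missing) and the number of rel:collaboratesWith members it declares.
SELECT ?artist ?label ?type (COUNT(?member) AS ?members)
WHERE
{
  ?artist a mo:MusicArtist ;
          rdfs:label ?label .
  # Both properties are optional, so artists like example #4 still appear.
  OPTIONAL { ?artist vocab:artist_type ?type }
  OPTIONAL { ?artist rel:collaboratesWith ?member }
}
GROUP BY ?artist ?label ?type
ORDER BY ?label
```

Rows with type 2 and a non-zero ?members count correspond to example #2; rows with an unbound ?type correspond to example #4.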
Now I want to extract all the artists as single entities, where possible. That is, I don't want to retrieve example #2, because I will get "Doris Day" and "Howard Keel" separately. I do have to retrieve example #3, "Pat Metheny & Anna Maria Jopek", because I can't do anything else. Of course, I also want to retrieve "René Marie".
I've solved the problem in a satisfactory way with this SPARQL:
SELECT *
WHERE
{
  ?artist a mo:MusicArtist.
  ?artist rdfs:label ?label.
  MINUS
  {
    ?artist vocab:artist_type "2"^^xs:short.
    ?artist rel:collaboratesWith ?any1 .
  }
}
ORDER BY ?label
It makes sense and it reads well ("retrieve all mo:MusicArtist items minus those that are collaborations with individual members listed").
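For comparison, the same exclusion can be sketched with FILTER NOT EXISTS in place of MINUS. Because ?artist is bound in the same group as the filter and is shared with the inner pattern, the two forms should select the same artists for this data; that equivalence is an assumption that has not been checked against OpenRDF 2.8.0.

```sparql
# Sketch: keep every artist for which no "type 2 with listed members"
# pattern exists. Prefix declarations are assumed, as in the query above.
SELECT *
WHERE
{
  ?artist a mo:MusicArtist.
  ?artist rdfs:label ?label.
  FILTER NOT EXISTS
  {
    ?artist vocab:artist_type "2"^^xs:short.
    ?artist rel:collaboratesWith ?any1 .
  }
}
ORDER BY ?label
```

One visible difference is that SELECT * never exposes ?any1 in either form, since variables inside MINUS and NOT EXISTS are not projected.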
I didn't find this solution immediately. I first thought of combining the three separate cases with UNION:
SELECT *
WHERE
{
  ?artist a mo:MusicArtist.
  ?artist rdfs:label ?label.
  # Single artists
  {
    ?artist vocab:artist_type "1"^^xs:short.
  }
  UNION
  # Groups for which there is no defined collaboration with single persons
  {
    ?artist vocab:artist_type "2"^^xs:short.
    FILTER NOT EXISTS
    {
      ?artist rel:collaboratesWith ?any1
    }
  }
  UNION
  # Some artists don't have this attribute
  {
    FILTER NOT EXISTS
    {
      ?artist vocab:artist_type ?any2
    }
  }
}
ORDER BY ?label
I found that the third UNION branch, the one that should add mo:MusicArtist items without a vocab:artist_type, didn't work. That is, it didn't find items such as "René Marie".
While I'm satisfied with the shorter solution I found with MINUS, I'm not happy that I don't understand why the earlier solution didn't work. Clearly I'm missing something about FILTER NOT EXISTS that could be useful in other cases.
Any help is welcome.
When I run the following query, I get the results that it sounds like you're looking for:
select distinct ?label where {
  ?artist a mo:MusicArtist ;
          rdfs:label ?label .

  #-- artists with type 1
  {
    ?artist vocab:artist_type "1"^^xs:short
  }
  #-- artists with no type
  union {
    filter not exists {
      ?artist vocab:artist_type ?type
    }
  }
  #-- artists with type 2 that have no
  #-- collaborators
  union {
    ?artist vocab:artist_type "2"^^xs:short
    filter not exists {
      ?artist rel:collaboratesWith ?another
    }
  }
}
------------------------------------
| label |
====================================
| "René Marie" |
| "Pat Metheny & Anna Maria Jopek" |
| "Edvard Grieg" |
------------------------------------
I'm not sure I see where this essentially differs from yours, though. I do think that you could clean this query up a bit. You can use optional and values to specify that the type is optional but, if present, must be 1 or 2. Then you can add a filter that requires that when the value is 2, there is no collaborator.
select ?label where {
  #-- get an artist and their label
  ?artist a mo:MusicArtist ;
          rdfs:label ?label .

  #-- and optionally their type, if it is
  #-- "1"^^xs:short or "2"^^xs:short
  optional {
    values ?type { "1"^^xs:short "2"^^xs:short }
    ?artist vocab:artist_type ?type
  }

  #-- if ?type is "2"^^xs:short, then ?artist
  #-- must not collaborate with anyone.
  filter ( !sameTerm(?type,"2"^^xs:short)
           || not exists { ?artist rel:collaboratesWith ?anyone })
}
------------------------------------
| label |
====================================
| "René Marie" |
| "Pat Metheny & Anna Maria Jopek" |
| "Edvard Grieg" |
------------------------------------
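One possible explanation for different engines disagreeing on the UNION version, offered as an assumption rather than a confirmed diagnosis: in the SPARQL 1.1 algebra a filter applies to the group it appears in, and a branch that contains only filter not exists { ... } has no pattern of its own that binds ?artist. An engine that evaluates that branch in isolation before joining it with the outer patterns would test the inner pattern with ?artist unbound, find that some artist has a type, and discard the whole branch. A sketch that does not depend on how the engine handles this repeats the binding patterns inside each branch:

```sparql
#-- Sketch only: prefix declarations are assumed, as in the queries above.
select distinct ?label where {
  #-- artists with type 1
  {
    ?artist a mo:MusicArtist ;
            rdfs:label ?label ;
            vocab:artist_type "1"^^xs:short
  }
  #-- artists with no type; ?artist is bound inside this group,
  #-- so the filter is evaluated once per artist
  union {
    ?artist a mo:MusicArtist ;
            rdfs:label ?label
    filter not exists { ?artist vocab:artist_type ?type }
  }
  #-- artists with type 2 that have no collaborators
  union {
    ?artist a mo:MusicArtist ;
            rdfs:label ?label ;
            vocab:artist_type "2"^^xs:short
    filter not exists { ?artist rel:collaboratesWith ?another }
  }
}
```

The cost is some repetition, but each branch now reads the same way regardless of the order in which the engine evaluates the join and the filters.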