
Are these two SPARQL 1.0 queries equivalent?


I'm querying a Mulgara triple store with the following two queries, trying to return subjects that do not match certain values. I'm limited to whatever parts of SPARQL 1.0 Mulgara implements, and I'm curious whether these two queries are effectively the same, or whether there are behavioral differences that I'm not seeing in my results.

Thanks in advance for your time and help.

Query 1:

PREFIX fedora-model:        <info:fedora/fedora-system:def/model#>
PREFIX fedora-rels-ext: <info:fedora/fedora-system:def/relations-external#>

SELECT ?pids
FROM <#ri>
WHERE {
    { ?pids fedora-model:hasModel <info:fedora/islandora:sp_large_image_cmodel> }
    UNION
    { ?pids fedora-model:hasModel <info:fedora/islandora:bookCModel> }
    UNION
    { ?pids fedora-model:hasModel <info:fedora/islandora:collectionCModel> }
    UNION
    { ?pids fedora-model:hasModel <info:fedora/islandora:compoundCModel> }
    UNION
    { ?pids fedora-model:hasModel <info:fedora/islandora:sp-audioCModel> }
    UNION
    { ?pids fedora-model:hasModel <info:fedora/islandora:sp_videoCModel> }
    UNION
    { ?pids fedora-model:hasModel <info:fedora/islandora:sp_basic_image> }
    UNION
    { ?pids fedora-model:hasModel <info:fedora/islandora:sp_pdf> }
    UNION
    { ?pids fedora-model:hasModel <info:fedora/islandora:oralhistoriesCModel> }
}

and Query 2:

PREFIX fedora-model:     <info:fedora/fedora-system:def/model#>
PREFIX fedora-rels-ext:  <info:fedora/fedora-system:def/relations-external#>

SELECT ?pids
FROM <#ri>
WHERE {
  ?pids fedora-model:hasModel ?models .
  FILTER (!regex(str(?models), "pageCModel") &&
          !regex(str(?models), "FedoraObject-3.0") &&
          !regex(str(?models), "transformCModel") &&
          !regex(str(?models), "ContentModel-3.0")) .
}

Solution

  • In general, no, these are not equivalent. Some of the reasons why include:

    • The former has an explicit list of model values to include, while the latter attempts to exclude values. Depending on the data, the two queries may return very different results: any model value that appears in neither list is returned by the latter query but not by the former.
    • The latter query uses a REGEX on the string value of ?models, but does not:
      • verify that the value of ?models is an IRI (it could be a literal that satisfies the filter conditions, for example)
      • verify that the REGEX is matching at the end of the string (I presume this is the intent)

    As a comment above already mentions, the use of REGEX is also likely to have a significant impact on query performance.
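
    To illustrate the two REGEX points above, here is a minimal sketch of Query 2 with an IRI check and end-of-string anchors added. It uses only isIRI, str and regex, which are all part of SPARQL 1.0; whether Mulgara implements each of them is an assumption I have not verified. The dots in the version suffixes are escaped because an unescaped dot matches any character.

```sparql
PREFIX fedora-model: <info:fedora/fedora-system:def/model#>

SELECT ?pids
FROM <#ri>
WHERE {
  ?pids fedora-model:hasModel ?models .
  # Keep only IRI values, then exclude models whose IRI ends with
  # one of the unwanted names ("$" anchors the match at the end).
  FILTER (isIRI(?models) &&
          !regex(str(?models), "pageCModel$") &&
          !regex(str(?models), "FedoraObject-3\\.0$") &&
          !regex(str(?models), "transformCModel$") &&
          !regex(str(?models), "ContentModel-3\\.0$"))
}
```

    Even with these changes the query is still an exclusion list, so it remains different from Query 1. The filter is also evaluated per matching triple, so a subject that has both an excluded model and any other model is still returned through the other model.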