sparql rdf semantic-web owl ontology

SPARQL if an instance has a property, others must as well


I have a specific instance and a SPARQL query that retrieves other instances similar to that one. By similar, I mean that the other instances share at least one property and value with the specific instance, or share a class with it.

Now, I'd like to extend the query such that if the specific instance has a value for a "critical" property, then the only instances that are considered similar are those that also have that critical property (as opposed to just having at least one property and value in common).

For instance, here is some sample data in which instance1 has a value for predicate2 which is a subproperty of isCriticalPredicate.

@prefix : <http://example.org/rs#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:instance1  a   :class1.
:instance1  :predicate1 :instance3.
:instance1  :predicate2 :instance3.  # (@) critical property

:instance2  a   :class1.
:instance2  :predicate1 :instance3.

:instance4  :predicate2 :instance3.

:predicate1 :hasSimilarityValue 0.6.

:predicate2 rdfs:subPropertyOf   :isCriticalPredicate.
:predicate2 :hasSimilarityValue 0.6.

:class1 :hasSimilarityValue 0.4.

Here is a query in which ?x is the specific instance, instance1. The query retrieves just instance4, which is correct. However, if I remove the critical property from instance1 (line labeled with @), I get no results, but should get instance2, since it has a property in common with instance1. How can I fix this?

PREFIX : <http://example.org/rs#>

select ?item
       (SUM(?similarity * ?importance * ?levelImportance) as ?summedSimilarity)
       (group_concat(distinct ?becauseOf ; separator = " , ") as ?reason)
where {
  values ?x { :instance1 }
  bind (4/7 as ?levelImportance)
  {
    values ?instanceImportance { 1 }
    ?x ?p ?instance .
    ?item ?p ?instance .
    ?p :hasSimilarityValue ?similarity .
    bind (?p as ?becauseOf)
    bind (?instanceImportance as ?importance)
  }
  union
  {
    values ?classImportance { 1 }
    ?x a ?class .
    ?item a ?class .
    ?class :hasSimilarityValue ?similarity .
    bind (?class as ?becauseOf)
    bind (?classImportance as ?importance)
  }
  filter (?x != ?item)

  ?x :isCriticalPredicate ?y .
  ?item :isCriticalPredicate ?y .
}
group by ?item

Solution

  • I've said it before and I'll say it again: minimal data is very helpful. From your past questions, I know that your working project has similarity values on properties and the like, but none of that really matters for the problem at hand. Here's some data that just has a few instances, property values, and one property designated as critical:

    @prefix : <urn:ex:> .
    
    :p a :criticalProperty .
    
    :a :p :u ;   #-- :a has a critical property, so 
       :q :v .   #-- only :d can be similar to it.
    
    :c :q :v ;   #-- :c has no critical properties, so
       :r :w .   #-- both :a and :d can be similar to it.
    
    :d :p :u ;
       :q :v .
    

    The trick in a query like this is to filter out the results that have the problem, not to try to select the ones that don't. Logically, those mean the same thing, but in writing the query, it's easier to think about constraint violation, and to try to filter out the results that violate the constraint. In this case, you want to filter out any results where the instance has a critical property and value but the similar instance doesn't.

    prefix : <urn:ex:>
    
    select ?i (group_concat(distinct ?j) as ?js) where {
    
      #-- :a has a critical property, but
      #-- :c does not, so these are useful
      #-- starting points
      values ?i { :a :c }
    
      #-- get ?j instances that have a value
      #-- in common with ?i.
      ?i ?property ?value .
      ?j ?property ?value .
    
      #-- make sure that ?i and ?j aren't
      #-- the same instance
      filter (?i != ?j)
    
      #-- make sure that there is no critical
      #-- property value that ?i has that
      #-- ?j does not also have
      filter not exists {
        ?i ?criticalProperty ?criticalValue .
        ?criticalProperty a :criticalProperty .
        filter not exists {
          ?j ?criticalProperty ?criticalValue .
        }
      }
    }
    group by ?i
    
    ----------------------------
    | i  | js                  |
    ============================
    | :a | "urn:ex:d"          |
    | :c | "urn:ex:d urn:ex:a" |
    ----------------------------
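
The same nested filter can be carried back to the query in the question. The following is only a sketch under stated assumptions: it replaces the two trailing :isCriticalPredicate triple patterns with the nested filter not exists, and it finds critical properties by following rdfs:subPropertyOf with a property path instead of relying on an inference engine. The variable names ?critical and ?criticalValue are introduced here for illustration and are not part of the original query.

```sparql
PREFIX : <http://example.org/rs#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?item
       (SUM(?similarity * ?importance * ?levelImportance) as ?summedSimilarity)
       (group_concat(distinct ?becauseOf ; separator = " , ") as ?reason)
where {
  values ?x { :instance1 }
  bind (4/7 as ?levelImportance)
  {
    # similarity through a shared property and value
    values ?instanceImportance { 1 }
    ?x ?p ?instance .
    ?item ?p ?instance .
    ?p :hasSimilarityValue ?similarity .
    bind (?p as ?becauseOf)
    bind (?instanceImportance as ?importance)
  }
  union
  {
    # similarity through a shared class
    values ?classImportance { 1 }
    ?x a ?class .
    ?item a ?class .
    ?class :hasSimilarityValue ?similarity .
    bind (?class as ?becauseOf)
    bind (?classImportance as ?importance)
  }
  filter (?x != ?item)

  # Assumption: a property counts as critical when it is
  # :isCriticalPredicate itself or reaches it through
  # rdfs:subPropertyOf. Drop ?item if ?x has a critical
  # property value that ?item lacks.
  filter not exists {
    ?x ?critical ?criticalValue .
    ?critical rdfs:subPropertyOf* :isCriticalPredicate .
    filter not exists {
      ?item ?critical ?criticalValue .
    }
  }
}
group by ?item
```

With the sample data from the question, this should return only :instance4 while :instance1 has its :predicate2 value, and only :instance2 once that triple is removed, because the outer filter then has nothing to match and rejects no candidates.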
    

    Related

    There are some other questions that also touch on constraint satisfaction/violation that might be useful reading. While not all of these use nested filter not exists, most of them do have a pattern of filter not exists { … filter <negative condition> }.
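
As a rough illustration of that pattern, here is a sketch against the small urn:ex: data above. The choice of :u and :v as the allowed values is purely hypothetical; the point is only the shape: select candidates, then reject any candidate for which a violating triple exists.

```sparql
prefix : <urn:ex:>

#-- hypothetical example: select the subjects whose
#-- values are all either :u or :v
select distinct ?i where {
  ?i ?property ?value .

  #-- reject ?i if it has any value outside the
  #-- allowed set (the negative condition)
  filter not exists {
    ?i ?anyProperty ?anyValue .
    filter (?anyValue not in (:u, :v))
  }
}
```

On that data this should keep :a and :d and reject :c, which has the value :w. It also rejects :p, whose only value is its type, :criticalProperty.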