Search code examples
recursionconstraintsrdfsparql

Is it possible to express a recursive definition in SPARQL?


Imagine that you have a simple social network where people must have only one property rdfs:label with value "Person" and can have any number of foaf:knows whose values must also be people with the same structure. Some example data could be:

:peter foaf:knows :john; 
       foaf:knows :anna;
       rdfs:label "Person" .

:john  foaf:knows :anna;
       rdfs:label "Person" .

:anna  rdfs:label "Person" .

In logic terms, the definition could be something like:

∀x(Person(x) ≡ rdfs:label(x,"Person")∧∀y(rdfs:label(x,y)→y="Person")∧∀y(foaf:knows(x,y)→Person(y)))

Is it possible to express those recursive definitions in SPARQL?

I was able to express part of the query without the recursive reference of foaf:knows as:

PREFIX ex: <http://xmlns.com/foaf/0.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

select ?person {

    # Ensure there is only one rdfs:label
    { SELECT ?person {
      ?person rdfs:label ?o .
    } GROUP BY ?person HAVING (COUNT(*)=1)}

    # Ensure the rdfs:label value is "Person"
    { SELECT ?person {
      ?person rdfs:label ?o . 
      FILTER ((?o = "Person"))
    } GROUP BY ?person HAVING (COUNT(*)=1)}

    # Count the number of foaf:knows
    { SELECT ?person (COUNT(*) AS ?p_c0) { 
       ?person foaf:knows [] . 
      } GROUP BY ?person
    }

    # Count the number of foaf:knows with a value that has the structure of a person
    { SELECT ?person (COUNT(*) AS ?p_c1) { 
       ?person foaf:knows ?person1 . # How can I express that person1 has the structure of people?
      } GROUP BY ?person
    }
    FILTER (?p_c0 = ?p_c1)
}

Is it possible to express such recursive definitions in SPARQL?

Note: I edited the question changing the term "constraint" by "definition" following Joshua's suggestion


Solution

  • This is a definition, not a pair of constraints

    We often think of definitions in terms of necessary and sufficient conditions. Sufficient conditions are those that give us "enough" information to conclude that something is a element of a given set, and necessary conditions are those that tell us a bit more about the individuals. For instance, in OWL, we might have the axioms:

    Man ⊑ Person
    Person ⊑ ∃hasName

    The first is a sufficient condition for Person: knowing that something is a Man is sufficient to determine that it's also a Person. The second is a necessary condition for Persons: if something is a person, then it must have a name. (Dually, we can also note that the first axiom is necessary condition for Man: if something is a Man, then it must be a Person. The second axiom is a sufficient condition for ∃hasName; if something is a Person, then it must have a name.)

    Constraint checking is typically the task of finding individuals that meet the sufficient conditions for a class, but don't satisfy all the necessary conditions. That's not what you're trying to do here. Instead, you're looking for individuals that meet the necessary and sufficient conditions of personhood:

    1. have exactly the label "Person"
    2. know only other persons.

    In constraint validation, you'll write a query that that finds problematic individuals (e.g., things that are supposed to be persons, but aren't), but in your task, you'll find good individuals.

    Finding individuals that meet your specification

    In general you can't specify recursive definitions in SPARQL, but in this case, you can write a query that will select all people. The trick is to first use a pattern that identifies all nodes in the graph. Then, conceptually, we suppose that each one is a person, and then filter out those that don't meet the conditions. That's possible in this case, because the condition is simply that everything reachable by a chain of foaf:knows (including the zero length chain) should have the label "Person" and nothing else. Here's some sample data (including the examples from your answer), the query, and finally the results.

    @prefix : <http://stackoverflow.com/q/25256452/1281433/>.
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
    @prefix foaf: <http://xmlns.com/foaf/0.1/>.
    
    :peter foaf:knows :john; 
           foaf:knows :anna;
           rdfs:label "Person" .
    
    :john  foaf:knows :anna;
           rdfs:label "Person" .
    
    :anna  rdfs:label "Person" .
    
    :mary rdfs:label "Person" .
    
    :tom rdfs:label "Cat" .
    
    :pluto rdfs:label "Dog" ; foaf:knows :tom .
    
    :ben rdfs:label "Wolf"; rdfs:label "Person" .
    
    :mary rdfs:label "Person"; foaf:knows :ben .
    
    :sam rdfs:label "Person"; foaf:knows :mary .
    
    prefix : <http://stackoverflow.com/q/25256452/1281433/>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    prefix foaf: <http://xmlns.com/foaf/0.1/>
    
    select ?person where { 
      #-- each node in the graph
      ?person :? ?person .
    
      #-- except those that foaf:know* some ?x
      #-- (and since * includes the zero length
      #-- path, ?x is also bound to ?person)
      #-- that don't meet the labeling condition.
      filter not exists {
        ?person foaf:knows* ?x
        optional { ?x rdfs:label ?label }
        filter ( !bound(?label) || ?label != "Person" )
      }
    }
    
    ----------
    | person |
    ==========
    | :anna  |
    | :john  |
    | :peter |
    ----------
    

    Constraint Checking

    Now, suppose that conditions 1 and 2 above were actually necessary conditions for personhood. Then, depending on the sufficient conditions, we could write different queries to find violations. You probably want to have non-person nodes in the graph, so you might have a sufficient condition that a node has rdf:type :Person. Then you can use a query like this:

    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    prefix : <http://stackoverflow.com/q/25256452/1281433/>
    
    #-- There are two way something with type :Person
    #-- can be invalid.  (i) it can lack a label, or 
    #-- have a label other than "Person";  (ii) it 
    #-- can have a value of foaf:knows* that doesn't
    #-- have rdf:type :Person.
    
    select ?person where {
      #-- Examine each person in the graph.
      ?person a :Person .
    
      { #-- Check that ?person has a label, and that
        #-- that it has no label other than "Person"
        optional { ?person rdfs:label ?label } 
        filter ( !bound(?label) || ?label != "Person" )
      } UNION
      { #-- Check that every value of foaf:knows 
        #-- also has type :Person.  If some value
        #-- has type :Person, but violates the constraints,
        #-- we'll catch it separately.
        ?person foaf:knows ?x .
        filter not exists { ?x a :Person }
      }
    }