rdf sparql triples

How to get basic term associations from a list of terms in SPARQL


One of my coworkers needs to get two sets of RDF triples involving term associations. The terms come from a list, and the associations come from a set of triples that use those terms.

  • The first set is all triples with any item in the term list being either the triple's subject or object.

  • The second set is all triples that connect any two terms from the list that are one or two predicates apart, where the predicates need not point in the same direction. So, for s1 and s2 in the term list, the two triples s1 → s3 and s2 → s3 would be valid.

I think I have answers already, but I wanted to ask both to contribute to the SPARQL knowledge base and to check myself.


Solution

  • Given data like this:

    @prefix : <urn:ex:> .
    
    :a :p :b .
    :a :p :e .
    
    :b :p :c .
    :b :p :d .
    
    :c :p :a .
    :c :p :f .
    
    :d :p :a .
    :d :p :d .
    

    if we take (:b :c) as the set of interesting terms, the following query will find all the triples that you're interested in. Note that the condition for the first set, i.e., that either ?s or ?o in ?s ?p ?o is in the term list, also covers part of the second set, namely the case where two terms are directly connected, i.e., where both ?s and ?o are in the term list.

    prefix : <urn:ex:>
    
    select distinct ?s ?p ?between ?q ?o where { 
      # term list appearing twice in order to 
      # get all pairs of items
      values ?x { :b :c }
      values ?y { :b :c }
    
      # This handles the first set (where either the subject or
      # object is from the term list).  Note that the first set
      # includes part of the second set;  when two terms from 
      # the list are separated by just one predicate, then it's
      # a case where either the subject or object is from the
      # term list (since *both* are).
      { ?s ?p ?x bind(?x as ?o)} UNION { ?x ?p ?o bind(?x as ?s)}
    
      UNION 
    
      # The rest of the second set is when two terms from the
      # list are connected by a path of length two.  This is 
      # a straightforward pattern to write.
      { ?x ?p ?between . ?between ?q ?y .
        bind(?x as ?s)
        bind(?y as ?o) }
    }
    

    In the results, single triples are the rows in which just s, p, and o are bound. These cover your first set, as well as the "distance = 1" portion of your second set. The rest of the second set also binds between and q. In terms of the example in your question, between is s3.

    $ arq --data data.n3 --query query.sparql
    -------------------------------
    | s  | p  | between | q  | o  |
    ===============================
    | :a | :p |         |    | :b |
    | :b | :p |         |    | :d |
    | :b | :p |         |    | :c |
    | :c | :p |         |    | :f |
    | :c | :p |         |    | :a |
    | :c | :p | :a      | :p | :b |
    -------------------------------
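
    If you want to sanity-check that table without a SPARQL engine, the two branches of the query can be mirrored in plain Python. This is only a sketch under my own assumptions: the triples are modelled as tuples of short strings rather than IRIs, and the names first_set and directed_two_step_paths are made up for illustration; they are not part of ARQ or any library.

```python
# Sample data from above, as (subject, predicate, object) tuples.
# Short strings stand in for the :a, :b, ... IRIs (an assumption of this sketch).
TRIPLES = {
    ("a", "p", "b"), ("a", "p", "e"),
    ("b", "p", "c"), ("b", "p", "d"),
    ("c", "p", "a"), ("c", "p", "f"),
    ("d", "p", "a"), ("d", "p", "d"),
}
TERMS = {"b", "c"}


def first_set(triples, terms):
    """Triples whose subject or object is in the term list."""
    return {(s, p, o) for (s, p, o) in triples if s in terms or o in terms}


def directed_two_step_paths(triples, terms):
    """Paths x -p-> between -q-> y where both x and y are in the term list."""
    return {
        (x, p, between, q, y)
        for (x, p, between) in triples if x in terms
        for (mid, q, y) in triples if mid == between and y in terms
    }


print(sorted(first_set(TRIPLES, TERMS)))
print(sorted(directed_two_step_paths(TRIPLES, TERMS)))
```

    On the sample data, the first call yields the five single triples from the table and the second yields the one :c -> :a -> :b path.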
    

    Update based on Comment

    Given the example in the comment, I think that this query can be shortened dramatically to the following:

    prefix : <urn:ex:>
    
    select distinct ?x ?p ?between ?q ?y where { 
      values ?x { :b :c }
      values ?y { :b :c }
    
      { ?x ?p ?between } UNION { ?between ?p ?x }
      { ?between ?q ?y } UNION { ?y ?q ?between }
    }
    

    Once we bind ?x ?p ?between or ?between ?p ?x, we're just saying that there's an edge (in either direction) between ?x and ?between. ?y and ?q extend that path so we have:

    ?x --?p-- ?between --?q-- ?y
    

    where the actual directions of --?p-- and --?q-- could be left or right. This covers all the cases we need. It's probably not hard to see why paths of length two will match this pattern, but the case for triples in which just the subject or object is a special term merits elaboration. Given a triple

    <term> <prop> <non-term>
    

    we can traverse it out and back, with ?x and ?y bound to the same term, to get the path

    <term> --<prop>-- <non-term> --<prop>-- <term>
    

    and this also works when <term> is the object and <non-term> is the subject. It covers the case in which both the subject and object are terms, too. On the data above, the results are:

    $ arq --data data.n3 --query paths.sparql
    -------------------------------
    | x  | p  | between | q  | y  |
    ===============================
    | :b | :p | :d      | :p | :b |   
    | :b | :p | :c      | :p | :b |   
    | :b | :p | :a      | :p | :b |
    | :c | :p | :a      | :p | :b |
    | :b | :p | :a      | :p | :c |
    | :c | :p | :f      | :p | :c |
    | :c | :p | :a      | :p | :c |
    | :c | :p | :b      | :p | :c |
    -------------------------------
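
    The "edge in either direction" idea can be mirrored in a few lines of Python, which is handy for checking the eight rows above by hand. This is a sketch under my own assumptions (tuples of short strings instead of IRIs, and hypothetical helper names edges_either_way and undirected_paths), not anything ARQ provides.

```python
# Sample data from above, as (subject, predicate, object) tuples.
TRIPLES = [
    ("a", "p", "b"), ("a", "p", "e"),
    ("b", "p", "c"), ("b", "p", "d"),
    ("c", "p", "a"), ("c", "p", "f"),
    ("d", "p", "a"), ("d", "p", "d"),
]
TERMS = ("b", "c")


def edges_either_way(triples, node):
    """Yield (predicate, neighbour) for every edge touching node, ignoring direction."""
    for s, p, o in triples:
        if s == node:
            yield p, o
        if o == node:
            yield p, s


def undirected_paths(triples, terms):
    """Rows (x, p, between, q, y) for x --p-- between --q-- y with x and y in terms."""
    return {
        (x, p, between, q, y)
        for x in terms
        for p, between in edges_either_way(triples, x)
        for q, y in edges_either_way(triples, between)
        if y in terms
    }


for row in sorted(undirected_paths(TRIPLES, TERMS)):
    print(row)
```

    Using a set plays the role of select distinct here; on the sample data it produces the same eight rows as the query.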
    

    If we add some information about which way ?p and ?q were pointing, we can reconstruct the paths:

    prefix : <urn:ex:>
    
    select distinct ?x ?p ?pdir ?between ?q ?qdir ?y where { 
      values ?x { :b :c }
      values ?y { :b :c }
    
      { ?x ?p ?between bind("right" as ?pdir)} UNION { ?between ?p ?x bind("left" as ?pdir)}
      { ?between ?q ?y bind("right" as ?qdir)} UNION { ?y ?q ?between bind("left" as ?qdir)}
    }
    

    This gives output:

    $ arq --data data.n3 --query paths.sparql
    ---------------------------------------------------
    | x  | p  | pdir    | between | q  | qdir    | y  |
    ===================================================
    | :b | :p | "right" | :d      | :p | "left"  | :b |   # :b -> :d
    | :b | :p | "right" | :c      | :p | "left"  | :b |   # :b -> :c 
    | :b | :p | "left"  | :a      | :p | "right" | :b |   # :a -> :b 
    | :c | :p | "right" | :a      | :p | "right" | :b |   # :c -> :a -> :b
    | :b | :p | "left"  | :a      | :p | "left"  | :c |   # :c -> :a -> :b 
    | :c | :p | "right" | :f      | :p | "left"  | :c |   # :c -> :f 
    | :c | :p | "right" | :a      | :p | "left"  | :c |   # :c -> :a 
    | :c | :p | "left"  | :b      | :p | "right" | :c |   # :b -> :c 
    ---------------------------------------------------
    

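    The c -> a -> b path appears twice, once starting from :c and once starting from :b (and the single edge b -> c shows up twice for the same reason), but the duplicates could probably be filtered out.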
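
    One possible way to do that filtering, sketched in Python rather than SPARQL, is to reduce each row to the set of directed triples it actually uses, so that rows read from opposite ends collapse into one entry. This is an assumption about what "filtered out" should mean, and the helper names step and unique_paths are hypothetical.

```python
# Sample data from above, as (subject, predicate, object) tuples.
TRIPLES = [
    ("a", "p", "b"), ("a", "p", "e"),
    ("b", "p", "c"), ("b", "p", "d"),
    ("c", "p", "a"), ("c", "p", "f"),
    ("d", "p", "a"), ("d", "p", "d"),
]
TERMS = ("b", "c")


def step(triples, node):
    """Yield (directed_triple, neighbour) for each edge touching node, either way."""
    for s, p, o in triples:
        if s == node:
            yield (s, p, o), o
        if o == node:
            yield (s, p, o), s


def unique_paths(triples, terms):
    """Each path as a frozenset of the directed triples it uses."""
    seen = set()
    for x in terms:
        for first, between in step(triples, x):
            for second, y in step(triples, between):
                if y in terms:
                    # Same triples read from the other end give the same frozenset.
                    seen.add(frozenset((first, second)))
    return seen


for path in unique_paths(TRIPLES, TERMS):
    print(sorted(path))
```

    On the sample data this reduces the eight rows to six entries: the two c -> a -> b rows merge, and so do the two rows that both describe b -> c.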

    If you're actually looking for the set of triples here, and not the particular paths, you can use a construct query which gives you a graph back (since a set of triples is a graph):

    prefix : <urn:ex:>
    
    construct {
      ?s1 ?p ?o1 .
      ?s2 ?q ?o2 .
    }
    where { 
      values ?x { :b :c }
      values ?y { :b :c }
    
      { ?x ?p ?between .
        bind(?x as ?s1)
        bind(?between as ?o1) }
      UNION
      { ?between ?p ?x .
        bind(?between as ?s1)
        bind(?x as ?o1)}
    
      { ?between ?q ?y .
        bind(?between as ?s2)
        bind(?y as ?o2) }
      UNION 
      { ?y ?q ?between .
        bind(?y as ?s2)
        bind(?between as ?o2)}
    }
    
    $ arq --data data.n3 --query paths-construct.sparql
    @prefix :        <urn:ex:> .
    
    <urn:ex:b>
          <urn:ex:p>    <urn:ex:c> ;
          <urn:ex:p>    <urn:ex:d> .
    
    <urn:ex:c>
          <urn:ex:p>    <urn:ex:f> ;
          <urn:ex:p>    <urn:ex:a> .
    
    <urn:ex:a>
          <urn:ex:p>    <urn:ex:b> .
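
    As a cross-check on that graph, here is a rough Python equivalent of the construct query. It is a sketch under my own assumptions (tuples of short strings instead of IRIs, hypothetical helpers touches, other_end and construct_result). On the sample data it returns the same five triples, which also happen to be exactly the triples whose subject or object is in the term list; that is what the out-and-back argument above would lead you to expect.

```python
# Sample data from above, as (subject, predicate, object) tuples.
TRIPLES = {
    ("a", "p", "b"), ("a", "p", "e"),
    ("b", "p", "c"), ("b", "p", "d"),
    ("c", "p", "a"), ("c", "p", "f"),
    ("d", "p", "a"), ("d", "p", "d"),
}
TERMS = {"b", "c"}


def touches(triple, node):
    """True if node is the subject or the object of the triple."""
    s, _, o = triple
    return node in (s, o)


def other_end(triple, node):
    """The end of the triple that is not node (node itself for a self-loop)."""
    s, _, o = triple
    return o if s == node else s


def construct_result(triples, terms):
    """All triples that take part in some x -- between -- y path with x, y in terms."""
    out = set()
    for first in triples:
        for x in terms:
            if not touches(first, x):
                continue
            between = other_end(first, x)
            for second in triples:
                if touches(second, between) and other_end(second, between) in terms:
                    out |= {first, second}
    return out


print(sorted(construct_result(TRIPLES, TERMS)))
```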