Tags: sparql, rdf

SPARQL - get nodes with maximum distance 2 to current node


Given a specific node http://my.org/nodes#n1 in an RDF database, I would like to get all nodes that are linked to n1 (through a variety of predicates) and then, if possible, all nodes that are connected to those nodes.

For example, given the following graph about persons and whom they are friends with:

@prefix p: <http://helloworld.org/person#> .
@prefix r: <http://helloworld.org/relation#> .
@prefix l: <http://helloworld.org/location#> .

l:l1 l:city "Sydney" .
l:l2 l:city "Paris" .
l:l3 l:city "New York" .

p:p1 
    p:name "Jack" ;
    p:age 30 ;
    p:livingIn l:l1 .
p:p2 
    p:name "Peter" ;
    p:age 31 ;
    p:livingIn l:l1 .
p:p3 
    p:name "Carol" ;
    p:age 32 ;
    p:livingIn l:l1 .
p:p4 
    p:name "Anna" ;
    p:age 33 ;
    p:livingIn l:l2 .
p:p5 
    p:name "Chris" ;
    p:age 34 ;
    p:livingIn l:l3 .

p:p1 
    r:isFriendOf p:p2 ;
    r:isFriendOf p:p3 .

p:p3 r:isFriendOf p:p4 .

p:p4 r:isFriendOf p:p5 .

Here there is only one relation between persons (isFriendOf), so it is easy to query Jack's friends and their friends:

PREFIX p: <http://helloworld.org/person#>
PREFIX r: <http://helloworld.org/relation#>

SELECT ?name ?friend ?foaf
WHERE {
    ?p r:isFriendOf ?o .    
    ?p p:name ?name .
    ?o p:name ?friend .
    OPTIONAL { 
        ?o r:isFriendOf ?fo .
        ?fo p:name ?foaf
    }
    FILTER ( ?p = p:p1 )
}
name friend foaf
"Jack" "Peter"
"Jack" "Carol" "Anna"
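
For a single predicate like this, a SPARQL 1.1 property path can also express "one or two hops" directly. A minimal sketch against the example data above (the variable names are illustrative):

```sparql
PREFIX p: <http://helloworld.org/person#>
PREFIX r: <http://helloworld.org/relation#>

SELECT DISTINCT ?node ?name
WHERE {
    # one r:isFriendOf step, then an optional second one (zero-or-one path "?"),
    # i.e. Jack's friends and his friends' friends
    p:p1 r:isFriendOf/r:isFriendOf? ?node .
    ?node p:name ?name .
}
```

On the example graph this should return Peter, Carol and Anna. Unlike the OPTIONAL version, it does not tell you through which friend a friend of a friend was reached.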

However, for more complex graphs with many different predicates, I am struggling to get this result. I started with:

PREFIX id: <http://my.org/graph#>

SELECT ?s ?p ?o ?oo
WHERE { 
    ?s ?p ?o . 
    OPTIONAL { ?o ?p ?oo . } # if an object has connections, get those as well
    FILTER( ?s = id:12345 )  # id:12345 is the node we focus on
}

but the { ?o ?p ?oo } part does not return anything.


Solution

  • The node you are filtering by (id:12345) does not exist in the graph you posted.

    The query you are using has two problems: 1) because ?p appears in both triple patterns, the property between the resource you are interested in (id:12345) and ?o must be the same as the property between ?o and ?oo; 2) the query can be inefficient, as written it asks for every triple in the graph and only then filters by the node of interest. Instead, use a separate variable for the second property and put the node directly into the triple pattern:

    PREFIX id: <http://my.org/graph#>
    
    SELECT ?p ?o ?p1 ?oo
    WHERE { 
        id:12345 ?p ?o . 
        OPTIONAL { ?o ?p1 ?oo . } 
    }
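
    If you only need the distinct nodes within two outgoing hops (what the title asks for) rather than the individual triples, a UNION of the one-hop and two-hop patterns is one way to sketch it. This still uses the id:12345 placeholder, and the literal filter is an assumption that "nodes" means resources rather than values such as names or ages:

```sparql
PREFIX id: <http://my.org/graph#>

SELECT DISTINCT ?node
WHERE {
    # distance 1: direct outgoing neighbours
    { id:12345 ?p ?node . }
    UNION
    # distance 2: outgoing neighbours of those neighbours
    { id:12345 ?p ?o . ?o ?p1 ?node . }
    # keep IRIs and blank nodes, drop literal values
    FILTER(!isLiteral(?node))
}
```

    Substituting p:p1 from the example graph (with the p: prefix declared), this should return l:l1, p:p2, p:p3 and p:p4. A node reachable in one hop can also turn up via two hops; DISTINCT collapses those duplicates.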
    

    For a concrete example, try the following query on the DBpedia endpoint at https://dbpedia.org/sparql:

    select ?p ?o ?p1 ?oo where {
        <http://dbpedia.org/resource/Madrid> ?p ?o .
        optional { ?o ?p1 ?oo }
    }
    

    The first result is something like:

    p    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    o    http://dbpedia.org/ontology/Place
    p1   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    oo   http://www.w3.org/2002/07/owl#Class
    

    This makes sense: Madrid is a Place, and Place is a class.

    Note that this only retrieves the outgoing links of the node of interest, not its incoming links. For those you would need something like:

    select ?s1 ?p1 ?s ?p where {
        ?s ?p <http://dbpedia.org/resource/Madrid> .
        optional { ?s1 ?p1 ?s }
    }
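
    To cover both directions in one query, the outgoing and incoming patterns can be combined with UNION. A minimal sketch, again with the id:12345 placeholder; ?dir is an extra column introduced here only to record which way the first hop went, and the second hop still follows outgoing links only:

```sparql
PREFIX id: <http://my.org/graph#>

SELECT ?dir ?p ?n1 ?p1 ?n2
WHERE {
    # first hop: links leaving or entering the node of interest
    { id:12345 ?p ?n1 . BIND("out" AS ?dir) }
    UNION
    { ?n1 ?p id:12345 . BIND("in" AS ?dir) }

    # second hop: outgoing links of each neighbour, if there are any
    OPTIONAL { ?n1 ?p1 ?n2 . }
}
```

    For heavily linked nodes the incoming side can be large, so adding a LIMIT while experimenting is a good idea.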