
Select RDF collection/list and iterate result with Jena


For some RDF like this:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:blah="http://www.something.org/stuff#">
  <rdf:Description rdf:about="http://www.something.org/stuff/some_entity1">
    <blah:stringid>string1</blah:stringid>
    <blah:uid>1</blah:uid>
    <blah:myitems rdf:parseType="Collection">
      <blah:myitem>
        <blah:myitemvalue1>7</blah:myitemvalue1>
        <blah:myitemvalue2>8</blah:myitemvalue2>
      </blah:myitem>
      ...
      <blah:myitem>
        <blah:myitemvalue1>7</blah:myitemvalue1>
        <blah:myitemvalue2>8</blah:myitemvalue2>
      </blah:myitem>
    </blah:myitems>
  </rdf:Description>

  <rdf:Description rdf:about="http://www.something.org/stuff/some__other_entity2">
    <blah:stringid>string2</blah:stringid>
    <blah:uid>2</blah:uid>
    <blah:myitems rdf:parseType="Collection">
      <blah:myitem>
        <blah:myitemvalue1>7</blah:myitemvalue1>
        <blah:myitemvalue2>8</blah:myitemvalue2>
      </blah:myitem>
      ...
      <blah:myitem>
        <blah:myitemvalue1>7</blah:myitemvalue1>
        <blah:myitemvalue2>8</blah:myitemvalue2>
      </blah:myitem>
    </blah:myitems>
  </rdf:Description>
</rdf:RDF>

I'm using Jena/SPARQL, and I'd like to use a SELECT query to retrieve the myitems node for the entity with a particular stringid, then extract it from the ResultSet, iterate through it, and get the values of each myitem node. Order isn't important.

So I have two questions:

  1. Do I need to specify in my query that blah:myitems is a list?
  2. How can I parse a list in a ResultSet?

Solution

    Selecting Lists (and Elements) in SPARQL

    Let's address the SPARQL issue first. I've modified your data slightly so that the elements have different values, which makes them easier to tell apart in the output. Here's the data in N3 format, which is a bit more concise, especially for representing lists:

    @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix blah:    <http://www.something.org/stuff#> .
    
    <http://www.something.org/stuff/some_entity1>
          blah:myitems ([ a       blah:myitem ;
                      blah:myitemvalue1 "1" ;
                      blah:myitemvalue2 "2"
                    ] [ a       blah:myitem ;
                      blah:myitemvalue1 "3" ;
                      blah:myitemvalue2 "4"
                    ]) ;
          blah:stringid "string1" ;
          blah:uid "1" .
    
    <http://www.something.org/stuff/some__other_entity2>
          blah:myitems ([ a       blah:myitem ;
                      blah:myitemvalue1 "5" ;
                      blah:myitemvalue2 "6"
                    ] [ a       blah:myitem ;
                      blah:myitemvalue1 "7" ;
                      blah:myitemvalue2 "8"
                    ]) ;
          blah:stringid "string2" ;
          blah:uid "2" .
    

    You mentioned selecting the myitems node, but myitems is actually the property that relates the entity to the list. You can select properties in SPARQL, but I'm guessing that what you actually want is the head of the list, i.e., the value of the myitems property. That's straightforward. You don't need to specify that the value is an rdf:List, but if the value of myitems could also be something other than a list, then you should restrict the query to lists. Note that a pattern like ?list a rdf:List won't do that by itself, because neither rdf:parseType="Collection" nor the N3 ( ... ) syntax produces rdf:type rdf:List triples; requiring an element, as in ?list rdf:first [], works for non-empty lists. (For developing the SPARQL queries, I'll just run them with Jena's ARQ command-line tools, since they're easy enough to move into the Java code afterward.)

    prefix blah: <http://www.something.org/stuff#> 
    
    select ?list where { 
      [] blah:myitems ?list .
    }
    
    $ arq --data data.n3 --query items.sparql
    --------
    | list |
    ========
    | _:b0 |
    | _:b1 |
    --------
    

    The heads of the lists are blank nodes, which is the kind of result we'd expect. You could take each head from the result set and walk down the list, but since you don't care about the order of the elements, you might as well select them directly in the SPARQL query and then iterate through the result set, getting one item per row. You're probably also interested in the entity whose items you're retrieving, so this query selects it too.

    prefix blah:    <http://www.something.org/stuff#> 
    prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    
    select ?entity ?list ?item ?value1 ?value2 where { 
      ?entity blah:myitems ?list .
      ?list rdf:rest* [ rdf:first ?item ] .
      ?item a blah:myitem ;
            blah:myitemvalue1 ?value1 ;
            blah:myitemvalue2 ?value2 .
    }
    order by ?entity ?list
    
    $ arq --data data.n3 --query items.sparql
    ----------------------------------------------------------------------------------------
    | entity                                               | list | item | value1 | value2 |
    ========================================================================================
    | <http://www.something.org/stuff/some__other_entity2> | _:b0 | _:b1 | "7"    | "8"    |
    | <http://www.something.org/stuff/some__other_entity2> | _:b0 | _:b2 | "5"    | "6"    |
    | <http://www.something.org/stuff/some_entity1>        | _:b3 | _:b4 | "3"    | "4"    |
    | <http://www.something.org/stuff/some_entity1>        | _:b3 | _:b5 | "1"    | "2"    |
    ----------------------------------------------------------------------------------------
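    The rdf:rest* [ rdf:first ?item ] pattern is what turns one list head into one row per element: rdf:rest* reaches the head and every later cell (zero or more rdf:rest steps), and every one of those cells except rdf:nil has an rdf:first. Below is a minimal plain-Java sketch of that evaluation. It assumes, purely for illustration, that a list is stored as two hypothetical maps from cell labels to their rdf:first and rdf:rest objects; this is not how Jena or ARQ store triples. Like SPARQL's zero-or-more paths, it visits each node once, so it also terminates on a malformed, cyclic list. It happens to return items in list order, but SPARQL itself makes no promise about the order of the resulting rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a graph (not Jena): each list cell label maps to
// the object of its rdf:first triple and the object of its rdf:rest triple,
// mirroring the triples that N3's ( ... ) or rdf:parseType="Collection" produce.
final class RestStarDemo {
    static final String NIL = "rdf:nil";

    // Evaluates `head rdf:rest* ?cell`: zero or more rdf:rest steps, so the
    // head itself is included. The set stops the walk at an already-seen node.
    static Set<String> restStar(String head, Map<String, String> rest) {
        Set<String> reached = new LinkedHashSet<>();
        String cell = head;
        while (cell != null && reached.add(cell)) {
            cell = rest.get(cell);
        }
        return reached;
    }

    // Evaluates `head rdf:rest* [ rdf:first ?item ]`: every reached cell with
    // an rdf:first contributes its item; rdf:nil has none, so it adds nothing.
    static List<String> members(String head, Map<String, String> first, Map<String, String> rest) {
        List<String> items = new ArrayList<>();
        for (String cell : restStar(head, rest)) {
            String item = first.get(cell);
            if (item != null) {
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // A two-element list _:c1 -> _:c2 -> rdf:nil holding items _:i1 and _:i2
        Map<String, String> first = new HashMap<>();
        Map<String, String> rest = new HashMap<>();
        first.put("_:c1", "_:i1");
        rest.put("_:c1", "_:c2");
        first.put("_:c2", "_:i2");
        rest.put("_:c2", NIL);
        System.out.println(members("_:c1", first, rest)); // prints [_:i1, _:i2]
    }
}
```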
    

    By ordering the results by entity and then by list (in case some entity has multiple values for the myitems property), you can iterate through the result set and be assured that all the elements of a given list for an entity arrive consecutively, although not necessarily in list order. Since your question was about lists in result sets, and not about how to work with result sets in general, I'll assume that iterating through the results isn't a problem.
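    Because the rows arrive sorted by ?entity, a single pass over the result set can hand off one entity's items at a time. A minimal plain-Java sketch of that pattern follows. The Row class is a hypothetical stand-in for a QuerySolution (with Jena you would read ?entity, ?value1 and ?value2 from each solution instead), and the Iterator parameter reflects the fact that Jena's ResultSet is an Iterator<QuerySolution>.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

final class EntityRuns {
    // Hypothetical stand-in for one solution of the ?entity ... ?value1 ?value2 query
    static final class Row {
        final String entity;
        final String value1;
        final String value2;

        Row(String entity, String value1, String value2) {
            this.entity = entity;
            this.value1 = value1;
            this.value2 = value2;
        }
    }

    // Calls `handler` once per run of consecutive rows that share an entity.
    // With `order by ?entity`, each run holds all of that entity's items.
    static void forEachEntity(Iterator<Row> rows, BiConsumer<String, List<Row>> handler) {
        String current = null;
        List<Row> run = new ArrayList<>();
        while (rows.hasNext()) {
            Row row = rows.next();
            if (current != null && !current.equals(row.entity)) {
                handler.accept(current, run); // the previous entity is complete
                run = new ArrayList<>();
            }
            current = row.entity;
            run.add(row);
        }
        if (current != null) {
            handler.accept(current, run); // flush the final run
        }
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("some__other_entity2", "7", "8"),
                new Row("some__other_entity2", "5", "6"),
                new Row("some_entity1", "3", "4"),
                new Row("some_entity1", "1", "2"));
        forEachEntity(rows.iterator(), (entity, run) ->
                System.out.println(entity + ": " + run.size() + " items"));
    }
}
```

    Handling consecutive runs keeps only one entity's rows in memory at a time. The approach depends on the order by clause; without it, the same entity could show up in several separate runs.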

    Working with Lists in Jena

    The following example shows how you can work with lists in Java. The first part of the code is just boilerplate to load the model and run the SPARQL query. Once you have the query results, you can either treat each resource as the head of a linked list and iterate manually using the rdf:first and rdf:rest properties, or you can view the resource as a Jena RDFList with as( RDFList.class ) and get an iterator from it.

    import java.io.IOException;
    import java.io.InputStream;
    
    // Jena 2.x package names; Apache Jena 3 and later moved these classes
    // from com.hp.hpl.jena.* to org.apache.jena.*
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QuerySolution;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Property;
    import com.hp.hpl.jena.rdf.model.RDFList;
    import com.hp.hpl.jena.rdf.model.RDFNode;
    import com.hp.hpl.jena.rdf.model.Resource;
    import com.hp.hpl.jena.util.iterator.ExtendedIterator;
    import com.hp.hpl.jena.vocabulary.RDF;
    
    public class SPARQLListExample {
        public static void main(String[] args) throws IOException {
            // Create a model and load the data
            Model model = ModelFactory.createDefaultModel();
            try ( InputStream in = SPARQLListExample.class.getClassLoader().getResourceAsStream( "SPARQLListExampleData.rdf" ) ) {
                if ( in == null ) {
                    throw new IOException( "SPARQLListExampleData.rdf is not on the classpath" );
                }
                // With no language argument, read() parses RDF/XML; for the N3
                // version of the data, use model.read( in, null, "N3" ) instead
                model.read( in, null );
            }
            String blah = "http://www.something.org/stuff#";
            Property myitemvalue1 = model.createProperty( blah + "myitemvalue1" );
            Property myitemvalue2 = model.createProperty( blah + "myitemvalue2" );
    
            // Run the SPARQL query and get some results
            String getItemsLists = "" +
                    "prefix blah: <http://www.something.org/stuff#>\n" +
                    "\n" +
                    "select ?list where {\n" +
                    "  [] blah:myitems ?list .\n" +
                    "}";
            QueryExecution qexec = QueryExecutionFactory.create( getItemsLists, model );
            try {
                ResultSet results = qexec.execSelect();
    
                // For each solution in the result set
                while ( results.hasNext() ) {
                    QuerySolution qs = results.next();
                    // getResource already returns a Resource; no conversion needed
                    Resource list = qs.getResource( "list" );
                    // Once you've got the head of the list, you can either process it manually
                    // as a linked list, using RDF.first to get elements and RDF.rest to get
                    // the rest of the list...
                    for ( Resource curr = list;
                          !RDF.nil.equals( curr );
                          curr = curr.getRequiredProperty( RDF.rest ).getObject().asResource() ) {
                        Resource item = curr.getRequiredProperty( RDF.first ).getObject().asResource();
                        RDFNode value1 = item.getRequiredProperty( myitemvalue1 ).getObject();
                        RDFNode value2 = item.getRequiredProperty( myitemvalue2 ).getObject();
                        System.out.println( item+" has:\n\tvalue1: "+value1+"\n\tvalue2: "+value2 );
                    }
                    // ...or you can view it as a Jena RDFList, which can give you an iterator
                    RDFList rdfList = list.as( RDFList.class );
                    ExtendedIterator<RDFNode> items = rdfList.iterator();
                    while ( items.hasNext() ) {
                        Resource item = items.next().asResource();
                        RDFNode value1 = item.getRequiredProperty( myitemvalue1 ).getObject();
                        RDFNode value2 = item.getRequiredProperty( myitemvalue2 ).getObject();
                        System.out.println( item+" has:\n\tvalue1: "+value1+"\n\tvalue2: "+value2 );
                    }
                }
            } finally {
                // Release the resources held by the query execution
                qexec.close();
            }
        }
    }
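    The Java example selects every list in the model, while the question asks for the list of the entity with a particular stringid. That takes one more triple pattern on the entity. Below is a minimal sketch that builds such a query string in plain Java; the StringIdQuery class and its methods are hypothetical, and the escaping covers only the characters that SPARQL forbids unescaped inside a double-quoted literal. With Jena itself, the ParameterizedSparqlString class and its setLiteral method are a safer way to inject the value.

```java
// Hypothetical helper (not part of Jena): builds a variant of the example's
// getItemsLists query that only matches the entity with a given blah:stringid.
final class StringIdQuery {

    // Escapes the characters that may not appear unescaped inside a
    // double-quoted SPARQL string literal: backslash, quote, LF and CR.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Selects the head of the blah:myitems list of the entity whose
    // blah:stringid is the given plain string.
    static String listHeadFor(String stringId) {
        return "prefix blah: <http://www.something.org/stuff#>\n"
                + "\n"
                + "select ?list where {\n"
                + "  ?entity blah:stringid \"" + escape(stringId) + "\" ;\n"
                + "          blah:myitems ?list .\n"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(listHeadFor("string1"));
    }
}
```

    Passing StringIdQuery.listHeadFor( "string1" ) to QueryExecutionFactory.create in place of getItemsLists should leave only some_entity1's list in the result set, assuming stringid values are unique in the data.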