
Jena StmtIterator and database


My model is stored in a persistent triple store. I want to select all individuals related to a document, identified by the document's name.

I can do it in 2 ways

1) SPARQL request:

 PREFIX base:<http://example#>
 select ?s2 ?p2 ?o2
 where {
   { ?doc base:fullName <file:/c:/1.txt> ;
          ?p1 ?o1 .
   } UNION {
     ?s2 ?p2 ?o2 ;
         base:documentID ?doc .
   }
 }

Question: How do I create a Jena Model from the ResultSet?

2) StmtIterator stmtIterator = model.listStatements(...)

The problem with this approach is that I need to call model.listStatements(...) several times:

a) get document URI by document name

b) get the list of individual IDs related to this document URI

c) finally, get the collection of individuals

I am concerned about performance: calling model.listStatements(...) three times means many database requests.

Or is all the data read into memory from the database (which I doubt) when the model is created:

     Dataset ds = RdfStoreFactory.connectDataset(store, conn);
     GraphStore graphStore = GraphStoreFactory.create(ds) ;

?


Solution

  • You need to back up a little and think more clearly about what you are trying to do. Your SPARQL query, once it's corrected (see below), will do a perfectly good job of producing an iterator over the result set, which gives you the properties of each of the individuals you're looking for. Specifically, each row of the result set carries one binding for each of s2, p2 and o2. That's what you ask for when you specify select ?s2 ?p2 ?o2, and it's normally what you want: usually, we select some values out of the triple store in order to process them in some way (e.g. rendering them into a list on the UI), and for that an iterator over the results is exactly what we need. You can have the query return a model rather than a result set by using a SPARQL construct or describe query. However, you would then still need to iterate over the resources in the model, so you aren't much further forward (except that your model is smaller and in-memory).

    Your query, incidentally, can be improved. The variables p1 and o1 make the query engine do useless work since you never use them, and there's no need for a union. Corrected, your query should be:

    PREFIX base:<http://example#>
    
    select ?s2 ?p2 ?o2 
    where {
      ?doc base:fullName <file:/c:/1.txt> .
      ?s2 base:documentID ?doc ;
          ?p2 ?o2 .
    }
    

    To execute any query (select, describe or construct) from Java, see the Jena documentation.
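As a rough illustration of the construct route mentioned above, the following sketch runs the corrected pattern as a CONSTRUCT query and returns the matching triples as a new in-memory Model. It assumes Apache Jena 3 or later (the `org.apache.jena` package names; older releases used different packages). The class name `ConstructExample` and the method `relatedTriples` are hypothetical names chosen for this example, and the file IRI is passed in through a `?file` variable rather than hard-coded.

```java
import org.apache.jena.query.ParameterizedSparqlString;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;

public class ConstructExample {

    // Hypothetical helper: returns a new in-memory model holding every triple
    // whose subject is linked by base:documentID to the document that has the
    // given fullName IRI.
    public static Model relatedTriples(Model source, String fileUri) {
        ParameterizedSparqlString pss = new ParameterizedSparqlString(
            "PREFIX base: <http://example#>\n" +
            "CONSTRUCT { ?s2 ?p2 ?o2 }\n" +
            "WHERE {\n" +
            "  ?doc base:fullName ?file .\n" +
            "  ?s2 base:documentID ?doc ;\n" +
            "      ?p2 ?o2 .\n" +
            "}");
        // Substitute the ?file variable with the document's IRI.
        pss.setIri("file", fileUri);

        // execConstruct() builds the result model before the execution is closed.
        try (QueryExecution qe = QueryExecutionFactory.create(pss.asQuery(), source)) {
            return qe.execConstruct();
        }
    }
}
```

Calling `ConstructExample.relatedTriples(model, "file:/c:/1.txt")` on the stored model should then yield a small model containing only the triples of the related individuals. If a result set is wanted instead, the same pattern can be run as a select query with `execSelect()`.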

    You can efficiently achieve the same results as your query using the model API. For example (untested):

    Model m = ..... ; // your model here
    String baseNS = "http://example#";
    Resource fileName = m.createResource( "file:/c:/1.txt" );
    
    // create RDF property objects for the properties we need. This can be done in
    // a vocab class, or automated with schemagen
    Property fullName = m.createProperty( baseNS + "fullName" );
    Property documentID = m.createProperty( baseNS + "documentID" );
    
    // find the ?doc with the given fullName
    for (ResIterator i = m.listSubjectsWithProperty( fullName, fileName ); i.hasNext(); ) {
      Resource doc = i.next();
    
      // find all of the ?s2 whose documentID is ?doc
      for (StmtIterator j = m.listStatements( null, documentID, doc ); j.hasNext(); ) {
        Resource s2 = j.next().getSubject();
    
        // list the properties of ?s2
        for (StmtIterator k = s2.listProperties(); k.hasNext(); ) {
          Statement stmt = k.next();
          Property p2 = stmt.getPredicate();
          RDFNode o2 = stmt.getObject();
    
          // do something with s2 p2 o2 ...
        }
      }
    }
    

    Note that your schema design makes this more complex than it needs to be. If, for example, the document full name resource had a base:isFullNameOf property, then you could simply do a lookup to get the doc. Similarly, it's not clear why you need to distinguish between doc and s2: why not simply attach the document properties to the doc resource?

    Finally: no, opening a database connection does not load the entire DB into memory. However, TDB in particular does make extensive use of caching of regions of the graph in order to make queries more efficient.