sparql jena tdb arq named-graphs

tdbloader vs SPARQL INSERT - Why different behaviour with named graphs?


The command-line tools of ARQ and TDB behave strangely in combination with named graphs. Data imported into a named graph via tdbloader cannot be queried through a GRAPH clause in a SPARQL SELECT query. The same query does work, however, for data inserted into the same graph with SPARQL INSERT.

I have the following assembler description file, tdb.ttl:

@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .


[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

[] rdf:type         tdb:DatasetTDB ;
    tdb:location "DB" ;
.

There is a dataset in the file data.ttl:

<a> <b> <c>.

Now I load this data with tdbloader and then add another triple with SPARQL INSERT, both into the named graph data:

tdbloader --desc tdb.ttl --graph data data.ttl
update --desc tdb.ttl "INSERT DATA {GRAPH <data> {<d> <e> <f>.}}"

Now, the data can be queried with SPARQL via:

$ arq --desc tdb.ttl "SELECT *  WHERE{ GRAPH ?g {?s ?p ?o.}}"
----------------------------
| s   | p   | o   | g      |
============================
| <a> | <b> | <c> | <data> |
| <d> | <e> | <f> | <data> |
----------------------------

Everything seems perfect. But now I want to query only this specific named graph data:

$ arq --desc tdb.ttl "SELECT *  WHERE{ GRAPH <data> {?s ?p ?o.}}"
-------------------
| s   | p   | o   |
===================
| <d> | <e> | <f> |
-------------------

Why is the data imported from tdbloader missing? What is wrong with this query? How can I get results back from both imports?


Solution

  • Try this query:

    PREFIX : <data>
    SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
    

    and the output is

    ----------------------------
    | s   | p   | o   | g      |
    ============================
    | <a> | <b> | <c> | <data> |
    | <d> | <e> | <f> | :      |
    ----------------------------
    

    or try:

     tdbquery --loc DB --file Q.rq -results srj
    

    to get the results in a different form.

    The text output is making things look nice, but two different things end up as <data>.

    What you are seeing is that

    tdbloader --desc tdb.ttl --graph data data.ttl
    

    used data exactly as is to name the graph. But

    INSERT DATA {GRAPH <data> {<d> <e> <f>.}}
    

    does a full SPARQL parse and resolves <data> against the base URI, which probably looks like file://*currentdirectory*.

    In text output, URIs get abbreviated, including relative to the base URI. So both the original data (from tdbloader) and file:///path/data appear as <data>.

    PREFIX : <data>
    

    gives the text output a different way to write the resolved URI, namely as :.

    Finally try:

    BASE <http://example/>
    SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
    

    which sets the base URI to something nowhere near your data URIs, thereby switching off the nice formatting by base URI:

    ----------------------------------------------------------------------------------------------------------------
    | s                        | p                        | o                        | g                           |
    ================================================================================================================
    | <file:///home/afs/tmp/a> | <file:///home/afs/tmp/b> | <file:///home/afs/tmp/c> | <data>                      |
    | <file:///home/afs/tmp/d> | <file:///home/afs/tmp/e> | <file:///home/afs/tmp/f> | <file:///home/afs/tmp/data> |
    ----------------------------------------------------------------------------------------------------------------
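
As a rough illustration of the explanation above, the following Python sketch models the two load paths with a toy in-memory store. It is not Jena code and does not reproduce how TDB stores data. The function names (resolve, load_raw, insert_parsed, query_graph) are hypothetical, the base URI is assumed to be the directory shown in the sample output, and the triples are plain placeholder labels.

```python
from urllib.parse import urljoin

# Assumption: the base URI is the directory the commands were run from,
# borrowed from the sample output above.
BASE = "file:///home/afs/tmp/"

def resolve(iri_ref, base=BASE):
    """Resolve a relative reference against the base URI (RFC 3986 style)."""
    return urljoin(base, iri_ref)

# Toy store: graph name -> list of triples (placeholder labels only).
store = {}

def load_raw(graph_name, triple):
    """Models the loader path described above: the graph name is kept as given."""
    store.setdefault(graph_name, []).append(triple)

def insert_parsed(graph_ref, triple):
    """Models the SPARQL path: the graph name is resolved before it is stored."""
    store.setdefault(resolve(graph_ref), []).append(triple)

def query_graph(graph_ref):
    """Models GRAPH <ref> in a query: the reference is resolved before lookup."""
    return store.get(resolve(graph_ref), [])

load_raw("data", ("a", "b", "c"))       # like: tdbloader --graph data
insert_parsed("data", ("d", "e", "f"))  # like: INSERT DATA { GRAPH <data> ... }

print(sorted(store))        # ['data', 'file:///home/afs/tmp/data']
print(query_graph("data"))  # [('d', 'e', 'f')]
```

In this model the store ends up with two distinct graph names, and a lookup for <data> only ever reaches the resolved one, which matches the single row returned by the GRAPH <data> query in the question.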