
Pure-SPARQL migration of data from one endpoint to another?


It looks like this question has been raised before, but subsequently deleted?!

For data in one SQL table, I can easily replicate the structure and then migrate the data to another table (or database?).

CREATE TABLE new_table
  AS (SELECT * FROM old_table);

SELECT *
INTO new_table [IN externaldb]
FROM old_table
WHERE condition; 

Is there something analogous for RDF/SPARQL? Something that combines a select and an insert into one SPARQL statement?

Specifically, I use Karma, which publishes data to an embedded OpenRDF/Sesame endpoint. There's a text box in the GUI for the endpoint address, so I can point it at a free-standing RDF4J server, since RDF4J is the successor fork of Sesame.

Unfortunately, I get an error like "invalid SPARQL endpoint" from Karma when I put the address of a Virtuoso, Stardog, or Blazegraph endpoint in that text box. I suspect it might be possible to modify and recompile Karma, or (more realistically) I could write a small tool with the Jena or RDF4J libraries to select into RAM or scratch disk space and then insert into the other endpoint.

But if there's a pure-SPARQL solution, I'd sure like to hear it.


Solution

  • In SPARQL, you can only specify the source endpoint. Therefore, a partial pure-SPARQL solution would be to run the following update on your target triplestore:

    INSERT { ?s ?p ?o } 
    WHERE { SERVICE <http://source/sparql> 
            { 
               ?s ?p ?o
            }
    }
    

    This will copy over all triples from the (remote) source's default graph to your target store, but it doesn't copy over any named graphs. To copy over any named graphs as well, you can execute this in addition:

    INSERT { GRAPH ?g { ?s ?p ?o } } 
    WHERE { SERVICE <http://source/sparql> 
            { 
              GRAPH ?g {
               ?s ?p ?o
              }
            }
    }
    
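    If your target store doesn't provide a convenient update console, you can submit these updates to the target endpoint yourself over the SPARQL 1.1 Protocol (an HTTP POST with Content-Type `application/sparql-update`). Here's a minimal sketch in plain Java; the endpoint URLs are placeholders for your actual source and target:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SparqlCopy {

    // Build the pure-SPARQL copy update, parameterized by the source endpoint.
    static String buildCopyUpdate(String sourceEndpoint) {
        return "INSERT { ?s ?p ?o } "
             + "WHERE { SERVICE <" + sourceEndpoint + "> { ?s ?p ?o } }";
    }

    public static void main(String[] args) throws Exception {
        // Print the update that would be sent.
        System.out.println(buildCopyUpdate("http://source/sparql"));

        // With two args (source endpoint, target *update* URL), actually
        // POST the update to the target per the SPARQL 1.1 Protocol.
        if (args.length == 2) {
            HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(args[1]))
                .header("Content-Type", "application/sparql-update")
                .POST(HttpRequest.BodyPublishers.ofString(buildCopyUpdate(args[0])))
                .build();
            HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + resp.statusCode());
        }
    }
}
```

    Note that some stores expose a separate update URL (e.g. `.../statements` or `.../update`) rather than accepting updates at the query endpoint, so check your target store's documentation.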

    If you're not hung up on pure SPARQL, though, different toolkits and frameworks offer all sorts of options. For example, using RDF4J's Repository API you could wrap both source and target in a SPARQLRepository proxy (or use an HTTPRepository if either one is an actual RDF4J store) and then run copy operations through the API. There are many ways to do that; one possible approach (disclaimer: I didn't test this code fragment) is this:

      import org.eclipse.rdf4j.repository.RepositoryConnection;
      import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;
      import org.eclipse.rdf4j.repository.util.RDFInserter;

      // Wrap both endpoints as RDF4J Repositories speaking the SPARQL protocol.
      SPARQLRepository source = new SPARQLRepository("http://source/sparql");
      source.initialize();
      SPARQLRepository target = new SPARQLRepository("http://target/sparql");
      target.initialize();
    
      // Stream every statement from the source into the target.
      try (RepositoryConnection sourceConn = source.getConnection(); 
           RepositoryConnection targetConn = target.getConnection()) {
         sourceConn.export(new RDFInserter(targetConn)); 
      }