Is it possible to store an in-memory Jena Dataset as a triple-store?


Warning! This question is a catch-all: I have no experience with RDF systems, so I couldn't express this as a single question. Feel free to skip the first two sections.

What I'm trying to build, overall
I'm currently building a Spring app that will be the back-end for a system that will gather measurements.
I want to store the info in a triple-store instead of an RDBMS.
So, you may imagine a Spring Boot app with the addition of the Jena library.

The workflow of the system
The approach I'm planning to use:
1. Once the app is up and running, it either creates a triple-store database or connects to an existing one.
2. A POST request reaches an app controller.
3. I use a SPARQL update to insert the new entry into the triple-store.
4. Other Controller/Service/DAO methods exist to serve GET requests for SELECT queries on the triple-store.

*The only reason I provided such a detailed view of my final goal is to avoid answers that would call my question an XY problem.

The actual problem
1. Does an org.apache.jena.query.Dataset represent an in-memory triple-store, or is this kind of Dataset a completely different data structure?
2. If a Dataset is indeed a triple-store, then how can I store this in-memory Dataset to retrieve it in a later session?
3. If one can indeed store a Dataset, then what are the options? Is the default to store a Dataset as a file with a .tdb extension? If so, what is the method for that and which class does it belong to?
4. If my guess is correct so far, would the assemble method be sufficient to "retrieve" the triple-store from the stored file?
5. Do all triple-store databases follow this concept of being stored in .tdb files?


Solution

  • org.apache.jena.query.Dataset is an interface - there are multiple implementations with different characteristics.

    DatasetFactory makes datasets of various kinds. DatasetFactory.createTxnMem creates an in-memory, transactional dataset. It can be initialized with the contents of files, but updates do not change the files.

    An in-memory dataset only exists for the lifetime of the JVM session.
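As a rough illustration of the in-memory case, here is a minimal sketch that assumes Apache Jena is on the classpath. The class name, the method names and the http://example.org/ URIs are made up for the example. It inserts one triple with a SPARQL update and counts the triples with a SELECT query, all against a dataset that disappears when the JVM exits.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.system.Txn;
import org.apache.jena.update.UpdateAction;

public class InMemoryDatasetSketch {

    // Runs a SPARQL update inside a write transaction.
    // The subject and predicate URIs are hypothetical placeholders.
    static void insertMeasurement(Dataset dataset) {
        Txn.executeWrite(dataset, () -> UpdateAction.parseExecute(
            "INSERT DATA { <http://example.org/m1> <http://example.org/value> 42 }",
            dataset));
    }

    // Counts the triples in the default graph inside a read transaction.
    static long countTriples(Dataset dataset) {
        return Txn.calculateRead(dataset, () -> {
            try (QueryExecution qe = QueryExecutionFactory.create(
                    "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }", dataset)) {
                return qe.execSelect().next().getLiteral("n").getLong();
            }
        });
    }

    public static void main(String[] args) {
        // In-memory, transactional dataset: nothing is written to disk.
        Dataset dataset = DatasetFactory.createTxnMem();
        insertMeasurement(dataset);
        System.out.println(countTriples(dataset));
    }
}
```

Running main a second time starts again from an empty dataset, which is the behaviour described above.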

    If you want data and data changes to persist across sessions, you can use TDB for persistent storage. Try TDBFactory or TDB2Factory.

    TDB (TDB1 or TDB2) are triplestore databases.
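For the persistent case, a sketch along these lines may help. It assumes the jena-tdb2 module is on the classpath. The class name, the method names and the http://example.org/value property are hypothetical. The same Dataset interface is used as with an in-memory dataset; only the factory call differs, and the data ends up in a directory of files on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;

public class Tdb2PersistenceSketch {

    // Connects to the TDB2 database stored in the given directory.
    // The directory is created first so the call also works on a fresh path.
    static Dataset open(String directory) {
        try {
            Files.createDirectories(Paths.get(directory));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return TDB2Factory.connectDataset(directory);
    }

    // Adds one triple to the default graph inside a write transaction.
    // The property URI is a hypothetical placeholder.
    static void addMeasurement(Dataset dataset, String subjectUri, long value) {
        Txn.executeWrite(dataset, () -> {
            Model model = dataset.getDefaultModel();
            model.add(model.createResource(subjectUri),
                      model.createProperty("http://example.org/value"),
                      model.createTypedLiteral(value));
        });
    }

    // Returns the number of triples in the default graph.
    static long size(Dataset dataset) {
        return Txn.calculateRead(dataset, () -> dataset.getDefaultModel().size());
    }
}
```

In a Spring app, open would typically be called once at startup with a fixed directory, so a later session that connects to the same directory sees the triples committed earlier.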


    Fuseki is the triple store server. You can send SPARQL requests to Fuseki (query, update, bulk upload, ...)

    You can start Fuseki with a TDB database (it creates the database if it does not exist):

    fuseki-server --tdb2 --loc DB /myData
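To show what sending a SPARQL request to the server can look like from Java, here is a sketch that uses only the JDK HTTP client (Java 11+). It assumes Fuseki is listening on its default port 3030, that the dataset is exposed as /myData with an update endpoint at /myData/update, and that the server was started with updates enabled (for example with the --update flag). The class name and the example triple are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class FusekiUpdateSketch {

    // Builds a SPARQL 1.1 Protocol "update via POST directly" request.
    // datasetUrl is assumed to look like http://localhost:3030/myData.
    static HttpRequest buildUpdateRequest(String datasetUrl, String sparqlUpdate) {
        return HttpRequest.newBuilder(URI.create(datasetUrl + "/update"))
            .header("Content-Type", "application/sparql-update; charset=utf-8")
            .POST(HttpRequest.BodyPublishers.ofString(sparqlUpdate, StandardCharsets.UTF_8))
            .build();
    }

    // Sends the request and returns the HTTP status code.
    static int send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildUpdateRequest(
            "http://localhost:3030/myData",
            "INSERT DATA { <http://example.org/m1> <http://example.org/value> 42 }");
        System.out.println(send(request));
    }
}
```

Jena also ships its own client classes for talking to a remote server. Plain HTTP is used here only to keep the sketch free of extra dependencies.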

    ".tdb" isn't a file extension Apache Jena uses. Databases are a directory of files.