sparql jena tdb

Does Jena TDB load all data into memory every time?


I am new to Jena and am trying to work with the YAGO dataset using TDB. The dataset is about 200M, and every time I run the same query it takes about 5 minutes to load the data before it returns the results. Am I misunderstanding some part of TDB? My code is below.

String directory = "tdb";
Dataset dataset = TDBFactory.createDataset(directory);      
dataset.begin(ReadWrite.WRITE);
Model tdb = dataset.getDefaultModel();
//String source = "yagoMetaFacts.ttl";
//FileManager.get().readModel(tdb, source);
String queryString = "SELECT DISTINCT ?p WHERE { ?s ?p ?o. }";
Query query = QueryFactory.create(queryString);
try(QueryExecution qexec = QueryExecutionFactory.create(query, tdb)){
    ResultSet results = qexec.execSelect();
    ResultSetFormatter.out(System.out, results, query) ;
}
dataset.commit();    
dataset.end();

Solution

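  • TDB is a persistent store, so the data only needs to be loaded once and can then be queried without reloading. There are two ways to load data into TDB: through the Java API or from the command line. Many thanks to @ASKW and @AndyS.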

    1 Load data via API

    This code needs to be executed only once. The readModel line in particular takes a long time.

    String directory = "tdb";
    Dataset dataset = TDBFactory.createDataset(directory);      
    dataset.begin(ReadWrite.WRITE);
    Model tdb = dataset.getDefaultModel();
    String source = "yagoMetaFacts.ttl";
    FileManager.get().readModel(tdb, source);
    dataset.commit(); // Important: this commits the loaded data to TDB so that it is persisted.
    dataset.end();
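
    To make sure the load really happens only once, the load step above can be guarded by a check on whether the store already holds data. The following is a minimal sketch, not part of the original answer: the class name LoadOnce and the method loadIfEmpty are hypothetical, the package names assume Jena 3.x or later with TDB1 on the classpath, and RDFDataMgr.read is used here in place of FileManager.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb.TDBFactory;

// Hypothetical helper: LoadOnce and loadIfEmpty are illustrative names.
public class LoadOnce {

    /**
     * Loads the file `source` into the TDB store at `directory`,
     * but only if the default graph is still empty.
     * Returns true if a load happened, false if it was skipped.
     */
    static boolean loadIfEmpty(String directory, String source) {
        Dataset dataset = TDBFactory.createDataset(directory);
        dataset.begin(ReadWrite.WRITE);
        try {
            Model tdb = dataset.getDefaultModel();
            if (!tdb.isEmpty()) {
                // Data is already in the store: discard the transaction.
                dataset.abort();
                return false;
            }
            RDFDataMgr.read(tdb, source); // the slow step
            dataset.commit();             // persist the loaded triples
            return true;
        } finally {
            dataset.end();
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: LoadOnce <tdb-directory> <rdf-file>");
            return;
        }
        boolean loaded = loadIfEmpty(args[0], args[1]);
        System.out.println(loaded ? "Data loaded." : "Store already populated, load skipped.");
    }
}
```

    Called with the values from the question, this would be loadIfEmpty("tdb", "yagoMetaFacts.ttl"). Note that the check only looks at the default graph, so it assumes no data is kept in named graphs.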
    

    After the data is loaded into TDB, we can use the following code to query it. The data does not need to be loaded again.

    String directory = "path\\to\\tdb";
    Dataset dataset = TDBFactory.createDataset(directory);
    dataset.begin(ReadWrite.READ); // a read transaction is enough for querying
    try {
        Model tdb = dataset.getDefaultModel();
        String queryString = "SELECT DISTINCT ?p WHERE { ?s ?p ?o. }";
        Query query = QueryFactory.create(queryString);
        try (QueryExecution qexec = QueryExecutionFactory.create(query, tdb)) {
            ResultSet results = qexec.execSelect();
            ResultSetFormatter.out(System.out, results, query);
        }
    } finally {
        dataset.end();
    }
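
    If the results are needed inside the program rather than printed, the same query can be run against the existing store and collected into a list. This is a rough sketch under the same assumptions (Jena 3.x or later with TDB1): the class name PredicateLister and the method distinctPredicates are hypothetical and do not come from the original answer.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSet;
import org.apache.jena.tdb.TDBFactory;

// Hypothetical helper: PredicateLister and distinctPredicates are illustrative names.
public class PredicateLister {

    /** Returns the URIs of all distinct predicates in the default graph of the store. */
    static List<String> distinctPredicates(String directory) {
        // Connects to the existing on-disk store; nothing is reloaded here.
        Dataset dataset = TDBFactory.createDataset(directory);
        List<String> predicates = new ArrayList<>();
        dataset.begin(ReadWrite.READ);
        try (QueryExecution qexec = QueryExecutionFactory.create(
                "SELECT DISTINCT ?p WHERE { ?s ?p ?o. }", dataset)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                predicates.add(row.getResource("p").getURI());
            }
        } finally {
            dataset.end(); // runs after the query execution has been closed
        }
        return predicates;
    }
}
```

    The result set is consumed inside the transaction, because its rows are read from the store lazily and are no longer available once the transaction has ended.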
    

    2 Load data via CMD

    To load data

    >tdbloader --loc=path\to\tdb path\to\dataset.ttl
    

    To query

    >tdbquery --loc=path\to\tdb --query=q1.rq
    

    q1.rq is the file that stores the query. The output should look like this:

    -------------------------------------------------------
    | p                                                   |
    =======================================================
    | <http://yago-knowledge.org/resource/hasGloss>       |
    | <http://yago-knowledge.org/resource/occursSince>    |
    | <http://yago-knowledge.org/resource/occursUntil>    |
    | <http://yago-knowledge.org/resource/byTransport>    |
    | <http://yago-knowledge.org/resource/hasPredecessor> |
    | <http://yago-knowledge.org/resource/hasSuccessor>   |
    | <http://www.w3.org/2000/01/rdf-schema#comment>      |
    -------------------------------------------------------