
Converting CSV to RDF for use as Apache Jena input


I am going to use Apache Jena, which takes RDF as its input format, but my data is in CSV format. I researched a lot and couldn't find a way to convert it. Does anyone know how to do that efficiently?

I have gone through tools like xml123, but the download link wasn't working.


Solution

  • Using jena-arq and jena-csv (both v3.0.1), the following approach works for me:

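    import java.io.InputStream;
    import org.apache.jena.propertytable.lang.CSV2RDF;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class JenaCSVTest {
        public static void main(String... args) throws Exception {
            // Register the CSV reader with Jena before any CSV is parsed
            CSV2RDF.init();
            // Alternative: load through the manager
            // Model m = RDFDataMgr.loadModel("test.csv");
            // Classic way to load: read the classpath resource into a model
            Model m = ModelFactory.createDefaultModel();
            try (InputStream in = JenaCSVTest.class.getResourceAsStream("/test.csv")) {
                m.read(in, "http://example.com", "csv");
            }
            m.setNsPrefix("test", "http://example.com#");
            // Serialize the model as Turtle
            m.write(System.out, "ttl");
        }
    }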
    

    The input (test.csv):

    Town,Population
    Southton,123000
    Northville,654000
    

    The output (RDF in Turtle):

    @prefix test:  <http://example.com#> .
    
    [ test:Population  "123000"^^<http://www.w3.org/2001/XMLSchema#double> ;
      test:Town        "Southton" ;
      <http://w3c/future-csv-vocab/row>
              1
    ] .
    
    [ test:Population  "654000"^^<http://www.w3.org/2001/XMLSchema#double> ;
      test:Town        "Northville" ;
      <http://w3c/future-csv-vocab/row>
              2
    ] .
    

    See the official jena-csv documentation.

    UPDATE:

    Starting with jena-3.10.0, jena-csv has been retired; the last jena-csv release is 3.9.0. You can use any other CSV-to-RDF converter instead, for example tarql.

    A quick demonstration using com.github.tarql:tarql version v1.2 (obtained through jitpack.io, since there appears to be no Maven Central release):

        Path file = Paths.get(JenaCSVTest.class.getResource("/test.csv").toURI());
        String base = "http://example.com#";
        Model m = ModelFactory.createDefaultModel().setNsPrefix("xsd", XSD.getURI()).setNsPrefix("test", base);
        Graph g = m.getGraph();
        CSVOptions op = new CSVOptions();
        op.setDefaultsForCSV();
        String query = "PREFIX test: <" + base + ">\n" +
                "PREFIX xsd: <" + XSD.getURI() + ">\n" +
                "CONSTRUCT {\n" +
                "  ?Row a test:Row;\n" +
                "    test:town ?town;\n" +
                "    test:population ?population;\n" +
                "} \n" +
                "WHERE {\n" +
                "  BIND (BNODE() AS ?Row)\n" +
                "  BIND (xsd:string(?Town) AS ?town)\n" +
                "  BIND (xsd:integer(?Population) AS ?population)\n" +
                "}";
        TarqlQuery q = new TarqlQuery(QueryFactory.create(query));
        InputStreamSource src = InputStreamSource.fromFilenameOrIRI(file.toUri().toString());
        TarqlQueryExecution qe = TarqlQueryExecutionFactory.create(q, src, op);
        qe.execTriples().forEachRemaining(g::add);
        m.write(System.out, "ttl");
    

    This snippet will generate the following RDF:

    @prefix test:  <http://example.com#> .
    @prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
    
    [ a                test:Row ;
      test:population  123000 ;
      test:town        "Southton"
    ] .
    
    [ a                test:Row ;
      test:population  654000 ;
      test:town        "Northville"
    ] .
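
If adding a converter library is not an option, a small CSV like the one above can also be turned into N-Triples with plain Java and then loaded into Jena as ordinary RDF. The following is only a minimal sketch, not part of the original answer: the class name `CsvToNTriples` and its `convert` method are hypothetical, it assumes a header row, comma-separated cells without quoting or embedded commas, and header names that are safe to append to the base IRI. Cells that look like integers are typed as xsd:integer; everything else becomes a plain string literal.

```java
public class CsvToNTriples {

    private static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    /** Escapes backslashes, quotes and line breaks for an N-Triples literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    /**
     * Converts CSV text to N-Triples: one blank node per data row,
     * one property per column, named base + column header.
     * Assumption: no quoted fields and no commas inside cells.
     */
    public static String convert(String csv, String base) {
        String[] lines = csv.split("\\R");
        String[] header = lines[0].split(",", -1);
        StringBuilder out = new StringBuilder();
        for (int row = 1; row < lines.length; row++) {
            if (lines[row].isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[row].split(",", -1);
            String subject = "_:row" + row;
            for (int col = 0; col < header.length && col < cells.length; col++) {
                String value = cells[col].trim();
                out.append(subject)
                   .append(" <").append(base).append(header[col].trim()).append("> ")
                   .append('"').append(escape(value)).append('"');
                if (value.matches("-?\\d+")) {
                    // Integer-looking cells get an explicit datatype
                    out.append("^^<").append(XSD_INTEGER).append('>');
                }
                out.append(" .\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "Town,Population\nSouthton,123000\nNorthville,654000\n";
        System.out.print(convert(csv, "http://example.com#"));
    }
}
```

Saved to a file such as out.nt, the result should be loadable with RDFDataMgr.loadModel("out.nt"). For real-world CSV (quoted fields, embedded commas, multi-line cells) a proper CSV parser or a tool like tarql is the safer choice.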