java rdf jena

Auto-Detect File Extension with Apache Jena


I want to convert a file with any extension to Turtle (.ttl), and I need to use Apache Jena. I know how this can be done with RDF4J, but its output isn't as accurate as Jena's. How can I auto-detect the extension, or rather the file type, when I read a file from a directory and don't know its extension in advance? The code below works when I hardcode the file name; I only need help auto-detecting the file type:

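import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStream;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.util.FileManager;

public class Converter {

    public static void main(String[] args) throws FileNotFoundException {

        String fileName = "./abc.rdf";
        Model model = ModelFactory.createDefaultModel();

        // I know this is how it is done with RDF4J, but I need to use Apache Jena.
        /* RDFParser rdfParser = Rio.createParser(Rio.getWriterFormatForFileName(fileName).orElse(RDFFormat.RDFXML));
           RDFWriter rdfWriter = Rio.createWriter(RDFFormat.TURTLE,
                   new FileOutputStream("./" + stripExtension(fileName) + ".ttl")); */

        // Open the file; FileManager returns null if it cannot be found.
        InputStream is = FileManager.get().open(fileName);
        if (is != null) {
            // The input syntax is hardcoded here; this is the part I want auto-detected.
            model.read(is, null, "RDF/XML");
            model.write(new FileOutputStream("./converted.ttl"), "TURTLE");
        } else {
            System.err.println("cannot read " + fileName);
        }
    }
}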

All help and advice will be highly appreciated.


Solution

  • RDFDataMgr can read from a file and uses the file extension to determine the syntax:

    RDFDataMgr.read(model, fileName);
    

    It also handles compressed files, e.g. "file.ttl.gz".

    There is also a registry of languages that maps a file extension or a file name to a syntax:

    RDFLanguages.fileExtToLang(...)
    RDFLanguages.filenameToLang(...)
    

    For more control, see RDFParser:

    RDFParser.create()
      .source(fileName)
      ... many options including forcing the language ...
      .parse(model);
    

    https://jena.apache.org/documentation/io/rdf-input.html
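
Putting the pieces above together, a converter might be sketched roughly as follows. This is only an illustration, assuming a Jena release recent enough that the RDFParser builder offers parse(Model), as in the snippet above. The class name AutoDetectConverter, the two helper methods, and the choice of RDF/XML as a fallback for unrecognised extensions are assumptions made for the example, not something the answer prescribes.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFLanguages;
import org.apache.jena.riot.RDFParser;

// Hypothetical class name, used for illustration only.
class AutoDetectConverter {

    // Reads fileName into a new model, choosing the syntax from the file name.
    static Model readWithDetection(String fileName) {
        Model model = ModelFactory.createDefaultModel();

        // Look the file name up in the language registry; null means "not recognised".
        Lang lang = RDFLanguages.filenameToLang(fileName);
        if (lang == null) {
            // Assumption for this sketch: fall back to RDF/XML, as the question's code does.
            lang = Lang.RDFXML;
        }

        RDFParser.create()
            .source(fileName)
            .lang(lang)      // hint: used when the syntax cannot be deduced otherwise
            .parse(model);
        return model;
    }

    // Converts inputFile to Turtle and writes the result to outputFile.
    static void convertToTurtle(String inputFile, String outputFile) throws IOException {
        Model model = readWithDetection(inputFile);
        try (OutputStream out = new FileOutputStream(outputFile)) {
            RDFDataMgr.write(out, model, Lang.TURTLE);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: AutoDetectConverter <input file> <output .ttl file>");
            return;
        }
        convertToTurtle(args[0], args[1]);
    }
}
```

Run it with, for example, ./abc.rdf and ./converted.ttl as the two arguments. If the extension is enough and no fallback is needed, the body of readWithDetection reduces to the single RDFDataMgr.read(model, fileName) call from the answer. If a file's extension is known to be misleading, forceLang(...) on the same builder overrides the detection instead of merely hinting.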