java parsing rdf4j

How to parse a big RDF file in RDF4J


I want to parse a huge file in RDF4J using the following code, but I get an exception because of a parser limit:

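import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.model.impl.LinkedHashModel;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFHandlerException;
import org.eclipse.rdf4j.rio.RDFParseException;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.RDFWriter;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.StatementCollector;

public class ConvertOntology {

    public static void main(String[] args) throws RDFParseException, RDFHandlerException, IOException {

        String file = "swetodblp_april_2008.rdf";
        File initialFile = new File(file);

        // Parse the RDF/XML input and collect every statement in an in-memory model.
        RDFParser parser = Rio.createParser(RDFFormat.RDFXML);
        parser.setPreserveBNodeIDs(true);
        Model model = new LinkedHashModel();
        parser.setRDFHandler(new StatementCollector(model));
        try (InputStream input = new FileInputStream(initialFile)) {
            // Use a file: URI as the base URI; a bare file system path is not a valid one.
            parser.parse(input, initialFile.toURI().toString());
        }

        // Write the statements out as N-Triples, matching the .nt extension.
        try (FileOutputStream out = new FileOutputStream("swetodblp_april_2008.nt")) {
            RDFWriter writer = Rio.createWriter(RDFFormat.NTRIPLES, out);
            writer.startRDF();
            for (Statement st : model) {
                writer.handleStatement(st);
            }
            writer.endRDF();
        }
    }
}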

The parser has encountered more than "100,000" entity expansions in this document; this is the limit imposed by the application.

I run the code with the two parameters suggested on the RDF4J web site, as in the following command:

mvn -Djdk.xml.totalEntitySizeLimit=0 -DentityExpansionLimit=0 exec:java
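
The same two properties can also be set from code instead of on the command line. The sketch below is only an illustration: the class name EntityLimitSetup and the method disableEntityLimits are made up for this example, and it assumes the properties need to be in place before the XML parser is created. It uses the JDK's own SAX API on a tiny inline document, not RDF4J.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of RDF4J.
public class EntityLimitSetup {

    // Sets the same two properties, with the same value, as the mvn command line above.
    static void disableEntityLimits() {
        System.setProperty("jdk.xml.totalEntitySizeLimit", "0");
        System.setProperty("entityExpansionLimit", "0");
    }

    public static void main(String[] args) throws Exception {
        // Assumption: call this before any parser is created.
        disableEntityLimits();

        // A tiny document with an internal entity, just to exercise the parser.
        String xml = "<?xml version=\"1.0\"?><!DOCTYPE r [<!ENTITY e \"x\">]><r>&e;&e;</r>";
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());

        System.out.println("entityExpansionLimit = " + System.getProperty("entityExpansionLimit"));
    }
}
```

In the question's code, the equivalent would be to set the properties at the top of main, before Rio.createParser is called.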

Any help would be appreciated.


Solution

  • The error comes from the Apache Xerces XML parser being picked up instead of the default JDK XML parser, which is presumably why the limits set on the command line have no effect. Delete the Xerces folder from your .m2 repository and the code works fine.
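
To check which XML parser is actually being picked up, one option is to print the class name of the SAX parser factory that JAXP resolves on the classpath. This is a minimal sketch using only the JDK; the class name WhichXmlParser is made up for this example, and the way RDF4J selects its XML reader may differ between versions, so treat the output as a hint rather than proof.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic class, not part of RDF4J.
public class WhichXmlParser {

    // Returns the class name of the SAX parser factory JAXP resolves on this classpath.
    static String saxFactoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        String name = saxFactoryClassName();
        System.out.println("SAX parser factory: " + name);

        // The JDK's built-in implementation lives under com.sun.org.apache.xerces.internal;
        // a standalone Apache Xerces jar provides classes under org.apache.xerces.
        if (name.startsWith("org.apache.xerces.")) {
            System.out.println("Apache Xerces is being picked up from the classpath.");
        } else {
            System.out.println("Apache Xerces does not appear to be in use.");
        }
    }
}
```

Run it with the same classpath as the project (for example through the same mvn exec:java setup) so that it sees the same jars as the conversion code.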