
Serious performance issues with Java SAX


I'm building an application that converts an SVG directly to a PDF. Because the SVGs can get quite large and they can be handled linearly, I decided to implement the XML parser with SAX. However, the parser is far slower than expected: it takes 20 seconds to handle a 45 KB SVG file.

Profiling shows that the time is spent in XMLParser's parse method. More specifically, it is not spent processing the data but just reading it in: the call stack goes all the way down to java.net.SocketInputStream.socketRead0, where 19 of the 20 seconds are spent.

Has anyone else had this problem? Does anyone know how to fix it?

The driver I'm using:

import java.io.File;
import java.io.InputStream;
import java.net.URL;
import javax.swing.JFileChooser;
import javax.swing.UIManager;
import javax.swing.filechooser.FileNameExtensionFilter;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Initialize SAX components
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser saxParser = spf.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();

// Set System L&F
UIManager.setLookAndFeel(
        UIManager.getSystemLookAndFeelClassName());

// Create a new file chooser
JFileChooser fileChooser = new JFileChooser();
fileChooser.setFileFilter(
        new FileNameExtensionFilter("SVG file", "svg"));

// Let the user choose an SVG file and convert it
if (fileChooser.showDialog(null, "Convert")
        == JFileChooser.APPROVE_OPTION) {
    File svgInput = fileChooser.getSelectedFile();
    File pdfOutput =
            new File(svgInput.getPath().replace(".svg", ".pdf"));
    xmlReader.setContentHandler(new SVGToPDFConverter(pdfOutput));
    URL inputURL = svgInput.toURI().toURL();
    System.out.println("Working...");

    // Parse the file
    try (InputStream inputStream = inputURL.openStream()) {
        xmlReader.parse(new InputSource(inputStream));
    }

    System.out.println("Done!");
}

The start of the SVG file I've been testing with:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" style="fill-opacity:1; color-rendering:auto; color-interpolation:auto; stroke:black; text-rendering:auto; stroke-linecap:square; stroke-miterlimit:10; stroke-opacity:1; shape-rendering:auto; fill:black; stroke-dasharray:none; font-weight:normal; stroke-width:1; font-family:'Dialog'; font-style:normal; stroke-linejoin:miter; font-size:12; stroke-dashoffset:0; image-rendering:auto;" preserveAspectRatio="xMidYMid meet" zoomAndPan="magnify" version="1.0" contentScriptType="text/ecmascript" contentStyleType="text/css">
  <!--Generated by the Batik Graphics2D SVG Generator-->
  <defs id="genericDefs" />
  <g>
    <g style="font-size:14; fill:white; text-rendering:optimizeLegibility; color-rendering:optimizeQuality; image-rendering:optimizeQuality; font-family:'Calibri'; color-interpolation:linearRGB; stroke:white; font-weight:bold;">
      <rect x="68" width="54" height="17" y="2" style="stroke:none;" />
      <text x="73.4" xml:space="preserve" y="15" style="fill:black; stroke:none;">Dates (Ma)</text>
      <rect x="447" width="40" height="17" y="2" style="stroke:none;" />
      <text x="452.7" xml:space="preserve" y="15" style="fill:black; stroke:none;">Composition</text>
      <rect x="604" width="54" height="17" y="2" style="stroke:none;" />
      <text x="609.5001" xml:space="preserve" y="15" style="fill:black; stroke:none;">Isotopic Ratios</text>
      <text x="5" xml:space="preserve" y="32" style="fill:black; stroke:none;" />
      <text x="5" xml:space="preserve" y="43" style="fill:black; stroke:none;" />
      <text x="5" xml:space="preserve" y="54" style="fill:black; stroke:none;">Fraction  </text>
      <line x1="70" x2="70" y1="18" style="fill:none; stroke:gray; stroke-width:0.5;" y2="76" />
      <text x="73.4" y="32" style="fill:black; stroke-width:0.5; stroke:none;" xml:space="preserve" />
      <text x="73.4" y="43" style="fill:black; stroke-width:0.5; stroke:none;" xml:space="preserve">206Pb/</text>
      <text x="73.4" y="54" style="fill:black; stroke-width:0.5; stroke:none;" xml:space="preserve">238U</text>
    </g>
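The DOCTYPE on the second line of this file names an external DTD by URL, which would be consistent with the time seen in socketRead0. One way to check which external resources a parse actually requests is to install an EntityResolver that only records the requested system IDs and hands back an empty document, so nothing is downloaded. This is a minimal sketch assuming the JDK's default SAXParserFactory; the names DtdFetchProbe, RecordingResolver and probe are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class DtdFetchProbe {

    /** Records every external entity the parser asks for and returns an empty
     *  replacement, so nothing is fetched over the network (diagnosis only). */
    static final class RecordingResolver implements EntityResolver {
        final List<String> requested = new ArrayList<>();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            requested.add(systemId);
            return new InputSource(new StringReader(""));
        }
    }

    /** Parses the given XML text and returns the system IDs the parser requested. */
    static List<String> probe(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RecordingResolver resolver = new RecordingResolver();
        reader.setEntityResolver(resolver);
        reader.parse(new InputSource(new StringReader(xml)));
        return resolver.requested;
    }

    public static void main(String[] args) throws Exception {
        String svg = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.0//EN\" "
                + "\"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd\">\n"
                + "<svg xmlns=\"http://www.w3.org/2000/svg\"><g/></svg>";
        for (String id : probe(svg)) {
            System.out.println("parser requested: " + id);
        }
    }
}
```

Returning an empty InputSource is only suitable for this kind of check: anything the real DTD defines, such as entities or default attribute values, is lost.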

Solution

  • It looks like your parser is fetching the DTD from "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd" over the network.

    You can fetch that URL manually, store it as a local file, and then set a custom handler that resolves the DTD to the local copy.

    Here is some example code:

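    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.junit.Test; // JUnit 4 assumed from the @Test annotation
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    public class SvgParserTest { // class name is illustrative

        @Test
        public void testParser() throws Exception {
            long startTime = System.currentTimeMillis();

            // Initialize SAX components
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser saxParser = spf.newSAXParser();
            File f = new File("/home/grigory/test.svg");
            // parse(InputStream, DefaultHandler) also registers the handler as the
            // EntityResolver, so resolveEntity below is consulted for the DTD
            try (InputStream in = new FileInputStream(f)) {
                saxParser.parse(in, new MyHandler());
            }
            System.out.println("execution time: " + (System.currentTimeMillis() - startTime) + " ms");
        }

        private static class MyHandler extends DefaultHandler {

            // Serve the locally saved copy of svg10.dtd instead of downloading it.
            // Note: every external entity is mapped to this one file.
            @Override
            public InputSource resolveEntity(String publicId, String systemId) throws IOException, SAXException {
                System.out.println("resolve: " + systemId);
                InputStream is = new FileInputStream("/home/grigory/svg10.dtd");
                return new InputSource(is);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                System.out.println("start element '" + qName + "'");
                super.startElement(uri, localName, qName, attributes);
            }

            @Override
            public void warning(SAXParseException e) throws SAXException {
                System.out.println(e.getMessage());
                super.warning(e);
            }

            @Override
            public void error(SAXParseException e) throws SAXException {
                System.out.println(e.getMessage());
                super.error(e);
            }
        }
    }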
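The resolveEntity override above takes effect because SAXParser.parse(InputStream, DefaultHandler) registers the handler as the entity resolver as well. The driver in the question only calls xmlReader.setContentHandler(...), so there a resolver has to be installed separately with XMLReader.setEntityResolver. The following is a minimal sketch under those assumptions: the class LocalDtdResolver, the helper parseWithLocalDtd and the location of the saved DTD are hypothetical, and only the SVG 1.0 public identifier is redirected to the local copy, while any other entity is left to the parser's default handling by returning null.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class LocalDtdResolver implements EntityResolver {

    static final String SVG10_PUBLIC_ID = "-//W3C//DTD SVG 1.0//EN";

    private final Path localDtd; // hypothetical location of the saved svg10.dtd

    public LocalDtdResolver(Path localDtd) {
        this.localDtd = localDtd;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (SVG10_PUBLIC_ID.equals(publicId)) {
            InputSource source = new InputSource(Files.newInputStream(localDtd));
            // Give the local copy its own system ID so relative references resolve locally
            source.setSystemId(localDtd.toUri().toString());
            return source;
        }
        return null; // anything else: fall back to the parser's default behaviour
    }

    /** Wiring for an XMLReader-based driver such as the one in the question. */
    static void parseWithLocalDtd(InputSource input, ContentHandler handler, Path localDtd)
            throws Exception {
        XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xmlReader.setContentHandler(handler);
        // setContentHandler alone does not install an entity resolver
        xmlReader.setEntityResolver(new LocalDtdResolver(localDtd));
        xmlReader.parse(input);
    }
}
```

In the question's driver this amounts to calling xmlReader.setEntityResolver(new LocalDtdResolver(...)) next to setContentHandler, with the path pointing at wherever the DTD was saved. Matching on the public identifier rather than redirecting every entity keeps unrelated entities working. If the DTD contents are not needed at all, another option with the JDK's built-in Xerces-based parser is spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) before newSAXParser(); that skips the external DTD entirely, so attribute defaults declared in it are no longer reported.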