Tags: java, html, sax, saxon

NullPointerException in net.sf.saxon.event.ReceivingContentHandler.startElement when using DaisyDiff


I'm using the DaisyDiff library to compare two HTML files. I wrote some Java code that calls DaisyDiff, but when I run it I get a NullPointerException in net.sf.saxon.event.ReceivingContentHandler.startElement.

I have tried several approaches with SAXTransformerFactory, but I couldn't figure out the cause.

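// Imports needed by the method below
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.outerj.daisy.diff.DaisyDiff;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static void daisyDiffTest() throws Exception {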
    String html1 = "<html><body>var v2</body></html>";
    String html2 = "<html>  \n  <body>  \n  Hello world  \n  </body>  \n  </html>";

    try {
        StringWriter finalResult = new StringWriter();
        SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler result = tf.newTransformerHandler();
        result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        result.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        result.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        result.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        result.setResult(new StreamResult(finalResult));

        ContentHandler postProcess = result;
        Locale val = Locale.ENGLISH;
        DaisyDiff.diffHTML(new InputSource(new StringReader(html1)), new InputSource(new StringReader(html2)),
                postProcess, "test", val);
        System.out.println(finalResult.toString());
    } catch (SAXException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

The expected result is an HTML diff of the two inputs.


Solution

  • It's hard to say without knowing what DaisyDiff is or what calls it makes. It's quite possible that it's not tested or supported for use with Saxon.

    The format of the data passed to the startElement() event in a SAX ContentHandler depends on the configuration options of the XML parser (for example, whether it is namespace-aware). When Saxon is invoked as a ContentHandler in this way, it has no means of discovering which configuration options the parser is using.

    As stated in the Javadoc documentation here: http://www.saxonica.com/documentation/index.html#!javadoc/net.sf.saxon.event/ReceivingContentHandler@startElement, if the events emitted by the parser don't correspond to what an appropriately configured parser would emit, the ReceivingContentHandler will fail in unpredictable ways.

    Posting the stack trace of the exception might be useful.
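
To make the point about parser configuration concrete, here is a minimal sketch that records the arguments a ContentHandler receives in startElement() before forwarding them downstream. The class name StartElementLogger and its probe() helper are hypothetical, not part of DaisyDiff or Saxon. The runnable part uses only the JDK's default SAX parser, so it illustrates how the arguments vary with namespace awareness, not what DaisyDiff itself emits.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper: logs every startElement() call, then forwards it unchanged.
class StartElementLogger extends XMLFilterImpl {
    private final List<String> seen = new ArrayList<>();

    StartElementLogger(ContentHandler downstream) {
        // XMLFilterImpl forwards all events to this handler.
        setContentHandler(downstream);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        seen.add("uri=" + quote(uri) + " localName=" + quote(localName) + " qName=" + quote(qName));
        super.startElement(uri, localName, qName, atts);
    }

    // Distinguish a null argument from an empty string in the log.
    private static String quote(String s) {
        return s == null ? "null" : "'" + s + "'";
    }

    List<String> seen() {
        return seen;
    }

    // Parses the given XML with the JDK's default parser and returns the logged calls.
    static List<String> probe(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        StartElementLogger logger = new StartElementLogger(new DefaultHandler());
        reader.setContentHandler(logger);
        reader.parse(new InputSource(new StringReader(xml)));
        return logger.seen();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body>var v2</body></html>";
        System.out.println("namespace-aware:     " + probe(xml, true));
        System.out.println("not namespace-aware: " + probe(xml, false));
    }
}
```

In the question's code, passing new StartElementLogger(result) to DaisyDiff.diffHTML in place of postProcess should, under the same assumptions, show which uri, localName and qName values reach Saxon's handler. A null or unexpectedly empty value there would be consistent with the failure mode the Javadoc describes, though the stack trace is still needed to confirm it.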