Java (JAXP) XML parsing: differences between the DocumentBuilder.parse overloads


Is there any kind of difference between

  1. DocumentBuilder.parse(InputStream) and
  2. DocumentBuilder.parse(InputSource) ?

All I could find is that in the first case the parser detects the encoding from the stream, so it seems safer, while in the second case I am not sure whether I am required to set the encoding myself.

Are there any other points (e.g. performance) I should be aware of?


Solution

  • The main difference is that the first one only lets you read your XML content from binary sources, that is, from subclasses of the abstract InputStream class. For example: directly from a file (using a FileInputStream), from an open Socket (via Socket.getInputStream()), etc.

    The second one, DocumentBuilder.parse(InputSource), lets you read data from binary sources too (that is, an InputStream implementation) and also from character sources (Reader implementations). So with this one you can parse an XML String (using a StringReader) or read from a BufferedReader.

    Since the second method can already handle InputStreams, the first one is just a shortcut: when you have an InputStream, you don't need to create a new InputSource yourself. In fact, the source code of the InputStream method is:

    public Document parse(InputStream is)
        throws SAXException, IOException {
        if (is == null) {
            throw new IllegalArgumentException("InputStream cannot be null");
        }
    
        InputSource in = new InputSource(is);
        return parse(in);
    }
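
To make the encoding question concrete, here is a minimal sketch of the three ways of feeding the parser discussed above. The class name, method names, and sample XML are made up for illustration. It assumes the JDK's built-in JAXP parser, which should follow the SAX InputSource contract: a character stream is read as-is, and a byte stream uses the encoding set on the InputSource if there is one, otherwise autodetection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ParseOverloadsDemo {

    // Hypothetical sample data, not taken from the question.
    static final String WITH_DECLARATION =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><greeting>h\u00e9llo</greeting>";
    static final String WITHOUT_DECLARATION = "<greeting>h\u00e9llo</greeting>";

    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String rootText(Document doc) {
        return doc.getDocumentElement().getTextContent();
    }

    // parse(InputStream): raw bytes; the parser determines the encoding
    // itself from the byte order mark or the XML declaration.
    static String fromInputStream() {
        byte[] bytes = WITH_DECLARATION.getBytes(StandardCharsets.UTF_8);
        try {
            return rootText(newBuilder().parse(new ByteArrayInputStream(bytes)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // parse(InputSource) over a Reader: the characters are already decoded,
    // so the encoding named in the declaration is not used for decoding.
    static String fromReader() {
        InputSource source = new InputSource(new StringReader(WITH_DECLARATION));
        try {
            return rootText(newBuilder().parse(source));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // parse(InputSource) over bytes with an explicit encoding: useful when
    // the bytes carry no declaration and are not UTF-8/UTF-16.
    static String fromStreamWithEncoding() {
        byte[] bytes = WITHOUT_DECLARATION.getBytes(StandardCharsets.ISO_8859_1);
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setEncoding("ISO-8859-1");
        try {
            return rootText(newBuilder().parse(source));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // All three calls are expected to yield the same element text.
        System.out.println(fromInputStream().equals(fromReader()));
        System.out.println(fromReader().equals(fromStreamWithEncoding()));
    }
}
```

So setting the encoding on an InputSource is optional: it only matters when you wrap a byte stream whose encoding the parser cannot work out on its own. Regarding performance, the InputStream overload simply wraps the stream and delegates, as the source above shows, so no meaningful difference should be expected between the two.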