Java XML parsing


I have a file that contains several XML documents in sequence, like this:

<?xml version="1.0"?><Node>...<Node>...</Node>...</Node><?xml version...

which repeats several times.

I'm using Java. I have a FileChannel open on the file and a ByteBuffer to read into. Is there a built-in, simpler, or already-solved way to parse XML incrementally from bytes in Java? For example, something like this:

FooParser parser = new FooParser();

while (...)
{
    buffer.flip();
    parser.parse(buffer);
    buffer.compact();
    if (parser.done())
    {
        xmlDocs.add(parser.xml());
        parser.reset();
    }
    file.read(buffer);
    ...
}

Solution

  • There's nothing in the API that I know of that will parse multiple XML documents from a single stream. I think you're going to have to scan for the <?xml ...?> declarations yourself and split up the input. The parser won't know that it has reached the next XML document until it reads that declaration. At that point it will fail, and the start of the next document will already have been consumed.

    Actually, now that you mention it, you may be able to use a pull parser (StAX, for example) to do what you want. But I'm pretty sure the SAX and DOM parsers in the API won't handle this on their own.
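
    A minimal sketch of the "scan and split" idea, assuming the whole file fits in memory as a String and that the text "<?xml " only ever appears at the start of a document (not inside comments or CDATA). The class name ConcatenatedXmlSplitter and its methods split and parseAll are hypothetical helpers for illustration, not part of any JDK API. Each piece is handed to the standard DOM parser through a character stream, so the bytes do not have to be re-encoded to match each declaration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ConcatenatedXmlSplitter {

    // Assumption: every document starts with an XML declaration of this form.
    // The trailing space avoids matching targets such as "<?xml-stylesheet".
    private static final String DECL = "<?xml ";

    /** Cuts the input into one string per XML declaration found. */
    static List<String> split(String content) {
        List<String> parts = new ArrayList<>();
        int start = content.indexOf(DECL);
        while (start >= 0) {
            int next = content.indexOf(DECL, start + DECL.length());
            int end = (next >= 0) ? next : content.length();
            parts.add(content.substring(start, end).trim());
            start = next;
        }
        return parts;
    }

    /** Parses each piece separately with the standard DOM parser. */
    static List<Document> parseAll(String content) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<Document> docs = new ArrayList<>();
        for (String part : split(content)) {
            // A Reader-based InputSource means the parser works on characters directly.
            docs.add(builder.parse(new InputSource(new StringReader(part))));
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        String content =
                "<?xml version=\"1.0\"?><Node><Node>a</Node></Node>"
              + "<?xml version=\"1.0\"?><Node><Node>b</Node></Node>";
        for (Document doc : parseAll(content)) {
            System.out.println(doc.getDocumentElement().getTextContent());
        }
    }
}
```

    With a FileChannel and ByteBuffer as in the question, the same search would have to run over the buffer as it fills, carrying any partial "<?xml " match across reads. Loading the file into a String first is the simpler route when the file is small enough.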