Axiom getTextAsStream without caching: "Parser has already reached end of the document. No siblings found"


I'm using Axiom in Axis2 to extract the text from a large base64Binary section of the SOAP message. My receiver is not using MTOM, and uses OMElement.getTextAsStream( false ) to extract the text. The code looks something like this:

final Iterator<OMElement> childrenIterator = uploadFile.getChildElements();
while ( childrenIterator.hasNext() )
{
    final OMElement element = childrenIterator.next();
    if ( "fileID".equals( element.getLocalName() ) )
    {
        fileID = element.getText();
    }
    // fileContent contains a large base64Binary block
    else if ( "fileContent".equals( element.getLocalName() ) )
    {
        Reader reader = element.getTextAsStream( false );

        final char[] buf = new char[BUFFER_SIZE];
        int len = 0;
        while ( (len = reader.read( buf ) ) >= 0 )
        {
            if ( len > 0 )
            {
                // Process chunk here
            }
        }
    }
}

A sample XML would look like

<uploadFile>
    <fileID>id</fileID>
    <fileContent>~500kB of base64 data</fileContent>
</uploadFile>

I'm getting this exception on the childrenIterator.hasNext() line after the base64Binary data has been read:

Caused by: org.apache.axiom.om.OMException: Parser has already reached end of the document. No siblings found
    at org.apache.axiom.om.impl.llom.OMElementImpl.getNextOMSibling(OMElementImpl.java:359)
    at org.apache.axiom.om.impl.traverse.OMChildrenIterator.getNextNode(OMChildrenIterator.java:36)
    at org.apache.axiom.om.impl.traverse.OMAbstractIterator.hasNext(OMAbstractIterator.java:69)
    at org.apache.axiom.om.impl.traverse.OMFilterIterator.hasNext(OMFilterIterator.java:54)

I've done some investigating, and it's definitely related to the fact that I'm setting cache to false when calling getTextAsStream(). I need to do this because the potential size of the base64 data could be hundreds of megabytes.

The problem seems to be that TextFromElementReader advances the underlying XMLStreamReader to the END_ELEMENT event. OMElementImpl.getNextOMSibling() then calls next() on the underlying XMLStreamReader and gets the END_DOCUMENT event. It seems like the TextFromElementReader needs to encounter the END_ELEMENT to know that it has reached the end of the text segment, but this leaves the underlying XMLStreamReader in the wrong state for OMElementImpl.getNextOMSibling().
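The cursor position described above can be reproduced outside Axiom with plain StAX. This is only a rough sketch: it uses the JDK's XMLStreamReader.getElementText() as a stand-in for TextFromElementReader (an assumption about how similar the two are, not a look at Axiom's internals), and the class and method names are made up for illustration. It shows that once an element's text has been pulled straight from the parser, the parser is left sitting on that element's END_ELEMENT, so any code that expects to drive the parser itself from an earlier position is out of step.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Axiom or Axis2.
class ElementTextCursorDemo {

    // Reads the text of the first element named localName directly from the
    // parser and returns the event the parser is positioned on afterwards.
    static int eventAfterElementText(String xml, String localName) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    reader.getElementText();      // consumes the text of the element
                    return reader.getEventType(); // where the cursor was left
                }
            }
            throw new IllegalArgumentException("No element named " + localName);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<uploadFile><fileID>id</fileID>"
                   + "<fileContent>QUJD</fileContent></uploadFile>";
        int event = eventAfterElementText(xml, "fileContent");
        System.out.println(event == XMLStreamConstants.END_ELEMENT);
    }
}
```

Whether Axiom's builder could compensate for this is a separate question; the sketch only illustrates why the parser and the object model can disagree about the current position.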

Has anyone seen this error before? Is it something wrong with the way I'm using Axiom?


Solution

  • I ended up not using getTextAsStream at all. Instead, I iterated through the child text nodes of the element and processed the text content in chunks that way. The parser is configured to be non-coalescing, so I get reasonably sized text nodes rather than one big one.

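    OMNode child = omElement.getFirstOMChild();
    while ( child != null )
    {
        if ( child instanceof OMText )
        {
            // process 'child' text here
        }

        // Look up the next sibling before detaching, and do it for every
        // node type so that a non-text child cannot stall the loop.
        final OMNode nextSibling = child.getNextOMSibling();
        if ( child instanceof OMText )
        {
            child.detach();    // detach from OM to keep memory usage low
        }
        child = nextSibling;
    }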
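For anyone who wants to see the effect of a non-coalescing parser in isolation, here is a minimal sketch in plain StAX rather than Axiom. It assumes the JDK's XMLInputFactory and its IS_COALESCING property; how the factory gets configured inside Axis2 is not covered here, and ChunkedTextDemo, ChunkHandler and forEachTextChunk are hypothetical names. With coalescing off, the parser is allowed to report the text of one element as several CHARACTERS events, and each one can be handled as a chunk without building the whole string.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Axiom or Axis2.
class ChunkedTextDemo {

    // Callback that receives one slice of text at a time.
    interface ChunkHandler {
        void handle(char[] buffer, int offset, int length);
    }

    // Streams the text content of every element named localName to the handler
    // and returns the number of chunks delivered (parser dependent).
    static int forEachTextChunk(Reader xml, String localName, ChunkHandler handler) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Do not merge adjacent character data into one large event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            boolean inside = false;
            int chunks = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    inside = true;
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    inside = false;
                } else if (inside && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    handler.handle(reader.getTextCharacters(),
                                   reader.getTextStart(),
                                   reader.getTextLength());
                    chunks++;
                }
            }
            reader.close();
            return chunks;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<uploadFile><fileID>id</fileID>"
                   + "<fileContent>QUJDREVGRw==</fileContent></uploadFile>";
        StringBuilder seen = new StringBuilder();
        int chunks = forEachTextChunk(new StringReader(xml), "fileContent",
                (buf, off, len) -> seen.append(buf, off, len));
        System.out.println(chunks + " chunk(s), " + seen.length() + " chars");
    }
}
```

The number and size of the chunks depend on the parser implementation and its buffer sizes, so the handler should not assume chunk boundaries line up with base64 quantum boundaries; a streaming decoder has to carry leftover characters between calls.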