Tags: java, xml-parsing, stax

StAX XMLEventReader returns only part of an element's text


I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<Batch checksum="4f4e96dc724aa38965a75121b654a49830fc6a46f927f1b1803cdd22a40f1e27">
<UpdateCache incremental="false" cleanCache="true">
    <PayLoad>
        <ConfigurationChangeSet />
        <CacheContent Type="ReportingServices" Name="ReportingServices-conf.xml" Path="roles/ReportingServices" EntryName="ReportingServices" EntryId="175">
            &lt;RPTSVC role:instanceID="765002" role:roleName="ReportingServices" role:Identity="175"&gt;
            &lt;SSRSServerDetails&gt;
            &lt;SSRS_PORT&gt;29283&lt;/SSRS_PORT&gt;
            &lt;SSRS_SSL_PORT&gt;29284&lt;/SSRS_SSL_PORT&gt;
        </CacheContent>
    </PayLoad>
</UpdateCache>
</Batch>

I am working on a project that uses StAX to extract the content of the CacheContent tag into a separate file. The problem is that when the parser reaches the first &lt;, it returns only the '<' character and then reports the text as done.

Here is the code I am using:

while (reader.hasNext()) {
    XMLEvent event = reader.nextEvent();
    if (event.isStartElement()) {
        StartElement startElement = event.asStartElement();

        if (startElement.getName().getLocalPart().equals("CacheContent")) {
            // peek() returns the next event without consuming it
            XMLEvent cacheContentXmlEvent = reader.peek();
            if (cacheContentXmlEvent.isCharacters()) {
                Characters cacheContentXmlText = cacheContentXmlEvent.asCharacters();

                System.out.println(String.format("Cache Content ==> %s ", cacheContentXmlText.getData()));
            }
        }
    }
}
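To see why peek() only ever shows the first piece of text, a small diagnostic like the one below can help. It is a sketch under a few assumptions: the class and method names (StaxEventDump, characterChunks) are hypothetical, the element is assumed to contain only text (no child elements), and it uses the JDK's default XMLInputFactory. It consumes every event up to the element's end tag and records the data of each Characters event separately. How many chunks you get depends on the StAX implementation and its settings, so no exact output is shown; with the default, non-coalescing settings you may well see more than one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxEventDump {

    // Consumes every event inside the first <elementName> element and records
    // the data of each Characters event separately, so you can see how the
    // parser split the text. Assumes the element contains no child elements.
    public static List<String> characterChunks(String xml, String elementName)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> chunks = new ArrayList<>();
        try {
            boolean inside = false;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!inside) {
                    inside = event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals(elementName);
                } else if (event.isEndElement()) {
                    break; // reached </elementName>
                } else if (event.isCharacters()) {
                    chunks.add(event.asCharacters().getData());
                }
            }
        } finally {
            reader.close();
        }
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<CacheContent>a &lt;b&gt; c</CacheContent>";
        List<String> chunks = characterChunks(xml, "CacheContent");
        System.out.println(chunks.size() + " Characters event(s):");
        for (String chunk : chunks) {
            System.out.println("  [" + chunk + "]");
        }
    }
}
```

Whatever the split, joining the chunks gives back the full unescaped text, which is what the first version of the code misses by looking only at the first event.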

What am I doing wrong? Thanks for your help.


Here is the new code (it may not be bug-free yet):

while (reader.hasNext()) {
    XMLEvent event = reader.nextEvent();
    if (event.isStartElement()) {
        StartElement startElement = event.asStartElement();
        if (startElement.getName().getLocalPart().equals("CacheContent")) {
            // The text can arrive as several consecutive Characters events,
            // so consume them all and concatenate their data.
            StringBuilder stringBuilder = new StringBuilder();
            XMLEvent next = reader.peek();
            // peek() returns null at the end of the stream, so check before use
            while (next != null && next.isCharacters()) {
                stringBuilder.append(reader.nextEvent().asCharacters().getData());
                next = reader.peek();
            }
            System.out.println(stringBuilder.toString());
        }
    }
}
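Since the stated goal is to write the CacheContent payload to a separate file, here is a self-contained sketch that combines the accumulation loop with file output. None of this comes from the original project: the class and method names (CacheContentExtractor, extract) are hypothetical, the Name attribute is assumed to be a plain, trusted file name, surrounding whitespace is trimmed, and only Java 7+ NIO is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class CacheContentExtractor {

    // Hypothetical helper: writes the unescaped text of every <CacheContent>
    // element to outDir, named after the element's Name attribute.
    // Returns the number of files written.
    public static int extract(String xml, Path outDir) throws XMLStreamException, IOException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        int written = 0;
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                if (!start.getName().getLocalPart().equals("CacheContent")) {
                    continue;
                }
                // Same idea as the loop above: keep consuming Characters events
                // until the next event is not text (normally </CacheContent>).
                StringBuilder text = new StringBuilder();
                while (reader.peek() != null && reader.peek().isCharacters()) {
                    text.append(reader.nextEvent().asCharacters().getData());
                }
                Attribute name = start.getAttributeByName(new QName("Name"));
                String fileName = (name != null) ? name.getValue() : "CacheContent-" + written + ".xml";
                // Assumption: Name is a plain file name; real code should reject "../" and similar.
                byte[] bytes = text.toString().trim().getBytes(StandardCharsets.UTF_8);
                Files.write(outDir.resolve(fileName), bytes);
                written++;
            }
        } finally {
            reader.close();
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Batch><UpdateCache><PayLoad>"
                + "<CacheContent Name=\"ReportingServices-conf.xml\">"
                + "&lt;SSRS_PORT&gt;29283&lt;/SSRS_PORT&gt;"
                + "</CacheContent></PayLoad></UpdateCache></Batch>";
        Path outDir = Files.createTempDirectory("cache-content");
        int count = extract(xml, outDir);
        System.out.println(count + " file(s) written to " + outDir);
    }
}
```

Because StAX has already resolved &lt; and &gt;, the written file contains real markup (here <SSRS_PORT>29283</SSRS_PORT>) rather than the escaped form.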

Solution

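  • The text under a given parent can be delivered as several consecutive Characters events (for example, the parser may split it at entity references such as &lt;). If you keep consuming events while isCharacters() returns true and append each event's data, you'll get all of the text.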
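Two alternatives to the manual loop, both part of the standard javax.xml.stream API, are sketched below: XMLEventReader.getElementText(), which returns the concatenated text of a text-only element, and the XMLInputFactory.IS_COALESCING property, which asks the parser to merge adjacent text into one Characters event. The class and method names are hypothetical. getElementText() throws an XMLStreamException if the element has child elements, and because how much gets merged may vary between implementations, the coalescing version still loops defensively.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class CacheContentText {

    // Advances the reader until it has just returned <CacheContent>;
    // returns false if the element is never found.
    private static boolean skipToCacheContent(XMLEventReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("CacheContent")) {
                return true;
            }
        }
        return false;
    }

    // Option 1: getElementText() reads all text up to the matching end tag.
    // Precondition: the last event returned was the START_ELEMENT.
    public static String viaGetElementText(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            return skipToCacheContent(reader) ? reader.getElementText() : null;
        } finally {
            reader.close();
        }
    }

    // Option 2: ask the parser to coalesce adjacent text into one Characters event.
    public static String viaCoalescing(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        try {
            if (!skipToCacheContent(reader)) {
                return null;
            }
            StringBuilder text = new StringBuilder();
            // With coalescing this loop normally runs once; it is kept in case
            // an implementation still delivers more than one event.
            while (reader.peek() != null && reader.peek().isCharacters()) {
                text.append(reader.nextEvent().asCharacters().getData());
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Batch><CacheContent>&lt;SSRS_PORT&gt;29283&lt;/SSRS_PORT&gt;</CacheContent></Batch>";
        System.out.println(viaGetElementText(xml));
        System.out.println(viaCoalescing(xml));
    }
}
```

getElementText() is the shortest option when CacheContent only ever holds text, as in the sample; coalescing is useful when you want to keep working with individual events.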