java xml cdata stax woodstox

StAX considers Text+CDATA+Text to be a single CHARACTERS event


Using StAX, I'm surprised to find that an XML block such as:

<badger>
    <![CDATA[Text about a badger]]>
</badger>

is treated as if it were:

START_ELEMENT (badger)
CHARACTERS (        Text about a badger    )
END_ELEMENT (badger)

That is, the CDATA section and the surrounding whitespace are flattened into a single CHARACTERS event. No separate CDATA event is reported.

Is this correct behaviour? How can I separate the whitespace from the CDATA?

I am using the Woodstox implementation.


Solution

  • I don't know the Woodstox implementation well, but could this bug, resolved in 2006, still be a factor? Are you setting the optional report-cdata-event property?

    (See also this message about a similar problem.)
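
To check which settings are in play, a minimal sketch like the one below may help. It parses the question's XML twice with the standard javax.xml.stream API: once with XMLInputFactory.IS_COALESCING turned on, which by the StAX contract merges adjacent character data (CDATA included) into one CHARACTERS event, and once with it turned off and the report-cdata-event property requested. The full property URI used here is the one I know from the JDK's bundled parser, and I believe Woodstox accepts the same name, but treat that as an assumption; the code only sets it if the factory says it is supported. The class name CdataEvents and the method events are hypothetical, made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, named only for this illustration.
class CdataEvents {
    // Implementation-specific property (assumption: supported by the parser in use).
    static final String REPORT_CDATA =
            "http://java.sun.com/xml/stream/properties/report-cdata-event";

    // The XML from the question.
    static final String XML =
            "<badger>\n    <![CDATA[Text about a badger]]>\n</badger>";

    // Parses the XML and returns a readable label for each element/text event.
    static List<String> events(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        // Only ask for CDATA events when not coalescing, and only if the
        // factory recognises the property at all.
        if (!coalescing && factory.isPropertySupported(REPORT_CDATA)) {
            factory.setProperty(REPORT_CDATA, Boolean.TRUE);
        }
        List<String> seen = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    seen.add("START_ELEMENT(" + reader.getLocalName() + ")");
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    seen.add("END_ELEMENT(" + reader.getLocalName() + ")");
                } else if (type == XMLStreamConstants.CDATA) {
                    seen.add("CDATA(" + reader.getText() + ")");
                } else if (type == XMLStreamConstants.CHARACTERS) {
                    // Whitespace-only text is labelled separately so it is easy to skip.
                    seen.add(reader.isWhiteSpace()
                            ? "WHITESPACE"
                            : "CHARACTERS(" + reader.getText() + ")");
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("coalescing on:  " + events(XML, true));
        System.out.println("coalescing off: " + events(XML, false));
    }
}
```

If the second run shows a separate CDATA entry with whitespace-only events on either side, the flattening in your application is probably coming from coalescing being enabled somewhere in the factory configuration. In that case the surrounding whitespace can be dropped by skipping events for which isWhiteSpace() returns true. If no CDATA entry appears even with coalescing off, the parser is likely not honouring the property, which would point back at the implementation.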