I have a requirement to retain some XML data exactly as it is received from another system. Here is an example of what we are receiving:
<SomeTag display="1 2 3 4 5 &lt;anotherTag>someValue&lt;/anotherTag>" />
When this is read in and then saved to our DB, it is saved like this:
<SomeTag display="1 2 3 4 5 &lt;anotherTag&gt;someValue&lt;/anotherTag&gt;" />
I want to preserve the data exactly as is without it encoding the > sign.
If you only have the StAX events, then no, there's no way to achieve this. When the StAX parser gives you an attribute value of
1 2 3 4 5 <anotherTag>someValue</anotherTag>
there's no way to know what the original bytes looked like: each greater-than sign might have been written as >, &gt; or &#62;. All of these forms produce the same value when parsed, and any XML parser must treat them as equivalent.
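To illustrate, here is a small sketch using the JDK's built-in StAX API (javax.xml.stream). The class name and the three input strings are made up for illustration. Each input spells the > signs differently (literal >, &gt;, and the character references &#62; / &#x3E;), and the parser reports the same attribute value for all of them, so the original spelling is simply not recoverable from the events.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AttributeEquivalence {

    // Parses the document and returns the "display" attribute of the first
    // start element, exactly as StAX reports it (references already resolved).
    static String displayValue(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getAttributeValue(null, "display");
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Three differently spelled but equivalent inputs (hypothetical examples).
        String[] variants = {
            "<SomeTag display=\"1 2 3 4 5 &lt;anotherTag>someValue&lt;/anotherTag>\" />",
            "<SomeTag display=\"1 2 3 4 5 &lt;anotherTag&gt;someValue&lt;/anotherTag&gt;\" />",
            "<SomeTag display=\"1 2 3 4 5 &lt;anotherTag&#62;someValue&lt;/anotherTag&#x3E;\" />"
        };
        for (String xml : variants) {
            // Prints: 1 2 3 4 5 <anotherTag>someValue</anotherTag>
            System.out.println(displayValue(xml));
        }
    }
}
```

All three lines of output are identical, which is exactly why a StAX-based pipeline cannot tell you which form the sender used.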
If you care about the precise original representation then you'll have to do this outside the XML world - save the original bytes somehow before they are parsed by the StAX parser, decode them using the correct character encoding, and store the resulting string directly into your database.
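A minimal sketch of that approach, assuming the message arrives as a stream you can buffer in full and that its encoding is known to be UTF-8 (in practice it should come from the transport metadata or the XML declaration). The class and method names are hypothetical, and the actual database write is left out: the returned string is what you would bind into your INSERT. The decoder is configured to fail on malformed input rather than silently substituting replacement characters, since the whole point is not to alter the text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RawXmlCapture {

    // Strict decoding: malformed or unmappable bytes throw instead of being
    // replaced, so the stored text is never quietly changed.
    static String decodeStrict(byte[] raw, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    // Buffers the whole message, decodes the original bytes for storage,
    // and still lets StAX parse the same bytes for normal processing.
    static String captureAndParse(InputStream in, Charset charset)
            throws IOException, XMLStreamException {
        byte[] raw = in.readAllBytes();                 // keep the original bytes
        String original = decodeStrict(raw, charset);   // exact text for the DB

        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(raw));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Parsed (entity-resolved) view, for business logic only.
                    System.out.println("parsed display = "
                            + reader.getAttributeValue(null, "display"));
                    break;
                }
            }
        } finally {
            reader.close();
        }
        return original;
    }

    public static void main(String[] args) throws Exception {
        String received =
            "<SomeTag display=\"1 2 3 4 5 &lt;anotherTag>someValue&lt;/anotherTag>\" />";
        InputStream in = new ByteArrayInputStream(received.getBytes(StandardCharsets.UTF_8));
        String toStore = captureAndParse(in, StandardCharsets.UTF_8);
        // The stored text matches what was received, character for character.
        System.out.println(toStore.equals(received));
    }
}
```

Buffering the whole message is the simplest option; for very large payloads you could instead tee the stream (write each chunk to a buffer or temp file while the parser reads it), but the principle is the same: capture the bytes before the parser sees them.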