Java XMLStreamReader converts &quot; to "


Suppose, we have the following XML

<Test> <Description> &quot;Hi&quot; </Description> </Test>

I load this XML using XMLStreamReader and parse it with the reader object. When I print the character data encountered during parsing using the reader's getText(), I see that &quot; is printed as ". The double quote did not need to be escaped as &quot; in the first place, so I would like to know why the parser performs this conversion automatically when the escaping is not required. For instance, &lt;, &gt; and &amp; are preserved, since without them the resulting XML would be invalid. However, this is not the case for &quot; and &apos;. I have to save the description exactly as I receive it. Is it possible to do that with the XMLStreamReader API?
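A minimal sketch reproducing what the question describes. It assumes the JDK's built-in StAX implementation, and the class and method names (QuoteReaderDemo, readDescription) are made up for illustration. IS_COALESCING is a standard XMLInputFactory property. It is switched on here so that text split around an entity reference arrives as one chunk, and the loop also concatenates chunks in case it does not. The second comparison parses the same document written with literal quotes.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class QuoteReaderDemo {
    // Collects the character data of <Description> exactly as StAX reports it.
    static String readDescription(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text chunks into a single CHARACTERS event where possible.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        boolean inDescription = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals("Description")) {
                inDescription = true;
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("Description")) {
                inDescription = false;
            } else if (inDescription && event == XMLStreamConstants.CHARACTERS) {
                // getText() returns the data with &quot; already resolved to '"'.
                text.append(reader.getText());
            }
        }
        reader.close();
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String escaped = "<Test> <Description> &quot;Hi&quot; </Description> </Test>";
        String literal = "<Test> <Description> \"Hi\" </Description> </Test>";
        System.out.println("[" + readDescription(escaped) + "]"); // prints [ "Hi" ]
        // Both spellings of the quote yield the same string once parsed.
        System.out.println(readDescription(escaped).equals(readDescription(literal))); // prints true
    }
}
```

The parser hands back data rather than markup, so after parsing there is no trace of which spelling the source document used.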


Solution

  • I have to save the description the same way I receive it.

    You should not. As far as XML is concerned, &quot; and " are exactly the same thing, so it cannot matter to you which one you obtain.

    As for why it's happening: it is an XML parser's job to unescape escaped characters so that it presents you with the data they stand for. It unescapes &lt; and the other entities as well. However, when the text obtained this way is serialized back into XML, the serializer will escape characters such as < again because XML requires it, but it won't bother escaping " because that's not necessary.

    When you parse XML and then serialize it again, there is no notion of "preserving" the escapes as they were; that information is inherently lost in the conversion. The parser simply is not in charge of preserving this unneeded information. However, if you want " to always be escaped as &quot; in the resulting XML, your XML serializer might have an option for that (you gave no details about what you're using, so I can't tell you definitively whether it does).
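The serializing side of this can be illustrated with XMLStreamWriter, the StAX counterpart of XMLStreamReader. The following is a minimal sketch, assuming the JDK's built-in writer; the class and method names are hypothetical. The standard XMLOutputFactory properties do not include an "escape quotes in text" switch, though specific implementations may offer their own. The workaround therefore emits every double quote through writeEntityRef("quot"), which writes the predefined entity reference &quot; as-is. All other text still goes through writeCharacters.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class QuoteWriterDemo {
    // Default behaviour: writeCharacters escapes what XML requires ('<', '&')
    // but writes double quotes in text content literally.
    static String writePlain(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("Description");
        writer.writeCharacters(text);
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    // Workaround: split the text at each '"' and write the quote as the
    // entity reference &quot;, so the serializer cannot drop the escape.
    static String writeQuotesAsEntities(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("Description");
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '"') {
                writer.writeCharacters(text.substring(start, i)); // text before the quote
                writer.writeEntityRef("quot");                    // emits &quot;
                start = i + 1;
            }
        }
        writer.writeCharacters(text.substring(start)); // remaining text after the last quote
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writePlain(" \"Hi\" < 3 "));
        // prints <Description> "Hi" &lt; 3 </Description>
        System.out.println(writeQuotesAsEntities(" \"Hi\" "));
        // prints <Description> &quot;Hi&quot; </Description>
    }
}
```

This only forces &quot; for double quotes. It does not restore whatever escaping the original document happened to use, since that is lost during parsing.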