I've been given an XML string which I need to put through a parser. It's currently complaining because of an illegal XML character. Very simplified example:
<someXml>this & that</someXml>
I know that the solution is to replace & with &amp;, but I'm not generating the XML and therefore have no control over the values.
A simple string replace is not the right way to do this, since '&' has special meaning in XML and a global replace of '&' with '&amp;' would ruin the special meaning which was intended (an existing '&lt;', for example, would become '&amp;lt;'). Is there a solution to take a full XML document and 'fix' it so that '&' becomes '&amp;', but only where intended? Am I safe to globally replace ' & ' with ' &amp; ' (note the spaces on either side)?
I think this is an interesting question, because it's a situation that can really happen in practice. Although I believe the right thing to do is to ask the XML provider to fix the XML and make it valid, one option is to try a lenient parser. I did some searching and found this blog post discussing the same problem and suggesting the same solution I was thinking of. You could try jsoup. Let me repeat that I don't think this is the best thing to do: you should really ask the XML provider to fix it.
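If a lenient parser is not an option, a narrower alternative to the ' & ' replacement from the question is to escape only those ampersands that do not already start an entity reference. The following is a minimal sketch in Python, not a general repair tool. The names BARE_AMP and escape_bare_ampersands are made up for illustration, and it assumes the input contains no CDATA sections or comments (where a literal '&' is legal and would be wrongly escaped) and no text such as 'AT&T;' that merely looks like an entity.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper names; this is a sketch, not a complete XML repair tool.
# Match '&' only when it is NOT followed by something shaped like an entity
# reference: a name (&amp;), a decimal code (&#38;), or a hex code (&#x26;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace stray '&' with '&amp;', leaving existing references untouched."""
    return BARE_AMP.sub("&amp;", xml_text)


# The '&lt;' reference is kept; the two stray ampersands are escaped.
fixed = escape_bare_ampersands("<someXml>this & that &lt; other&more</someXml>")

# The standard-library parser now accepts the string.
root = ET.fromstring(fixed)
print(root.text)
```

Unlike the space-delimited replacement, this also catches a stray '&' with no surrounding spaces. Named references the parser does not know (for example '&nbsp;' in plain XML) are left alone and will still be rejected, so this covers only the unescaped-ampersand case.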