Tags: java, parsing, jackson, xml-parsing, jackson-dataformat-xml

How to stop Jackson from parsing an element?


I have an XML document containing nested tags that should not be interpreted as XML tags.

For example, something like <something>cba<a href="linktosomething.com">abc</a></something> should be parsed as the plain String "cba<a href="linktosomething.com">abc</a>". (The document has other elements as well, and those get parsed just fine.) Jackson, though, tries to interpret the content as an object, and I don't know how to prevent this. I tried using @JacksonXmlText, turning off wrapping, and a custom deserializer, but I couldn't get any of them to work.


Solution

  • The <a should be escaped as &lt;a in the document itself. XML APIs normally do this conversion in both directions: setting text writes the entities (&...;), and getting text decodes them again.

    Another option is to wrap the text in a CDATA section: <![CDATA[ ... ]]>.

    <something><![CDATA[cba<a href="linktosomething.com">abc</a>]]></something>
    

    If you cannot fix this at the source and have to live with XML that is already corrupted, you need your own workaround:

    1. Load the wrong XML into a String
    2. Repair the XML
    3. Pass the repaired XML string to Jackson

    Repairing:

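    String xml = ... // the raw, unrepaired XML
    // Escape <a ...> and </a> so they are read as text instead of markup (links)
    xml = xml.replaceAll("<(/?a\\b[^>]*)>", "&lt;$1&gt;");
    StringReader in = new StringReader(xml); // reader over the repaired XML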
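
To check that the repair does what is intended, here is a minimal, dependency-free sketch. It is not Jackson code: it uses the JDK's built-in DOM parser as a stand-in, on the assumption that any conforming XML parser decodes the escaped text and the CDATA section to the same string. The class name RepairDemo and the helpers repair and rootText are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are illustrative, not from any library.
final class RepairDemo {

    // Escapes <a ...> and </a> so the parser sees them as text, not markup.
    public static String repair(String xml) {
        return xml.replaceAll("<(/?a\\b[^>]*)>", "&lt;$1&gt;");
    }

    // Parses the XML with the JDK DOM parser (stand-in for Jackson)
    // and returns the text content of the root element.
    public static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String broken =
                "<something>cba<a href=\"linktosomething.com\">abc</a></something>";
        String cdata =
                "<something><![CDATA[cba<a href=\"linktosomething.com\">abc</a>]]></something>";

        // Both lines print: cba<a href="linktosomething.com">abc</a>
        System.out.println(rootText(repair(broken)));
        System.out.println(rootText(cdata));
    }
}
```

Note that the regex is a blunt tool: it escapes every <a ...> and </a> in the whole document, including any that are meant to be real elements, so it only fits documents where <a> never appears as legitimate markup.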