xml escaping fix-protocol

Escape control characters in XML 1.0


I understand why control characters are illegal in XML 1.0, but I still need to store them in an XML payload, and I cannot find any recommendations for escaping them. Upgrading to XML 1.1 is not an option.

How should I escape, for example, the SOH character (\u0001, the standard field separator in FIX messages)?

The following doesn't work:

<data>&#x01;</data>
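
To make the failure concrete, here is a minimal Python sketch. The parser (the standard library's xml.etree.ElementTree) and the helper name is_well_formed are my own choices for illustration; the question does not say which parser is in use. XML 1.0 requires that a numeric character reference resolve to a legal character, and U+0001 is not one, so a conforming parser should reject the document rather than decode the reference.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if ElementTree parses the document, False on a parse error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A character reference to SOH (U+0001) is rejected by the parser.
print(is_well_formed("<data>&#x01;</data>"))

# A character reference to TAB (U+0009) is accepted, because TAB is legal in XML 1.0.
print(is_well_formed("<data>&#x09;</data>"))
```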

Solution

  • One way is to use a processing instruction: <?hex 01?>. This works only in element content, not in attribute values, and the receiving application has to understand the processing instruction.

    You could also use an element: <hex value="01"/>. The difference is that elements are visible to an XSD schema or DTD, while processing instructions are not.

    Alternatively, if a piece of the payload can contain such characters, encode the whole payload in Base64.
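
The element and Base64 approaches above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the <data> wrapper, and the encoding="base64" attribute are hypothetical, the sample string is a made-up fragment rather than a valid FIX message, and both sides of the exchange would have to agree on whichever convention is chosen.

```python
import base64
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids (TAB, LF and CR are legal).
_ILLEGAL = re.compile(r"([\x00-\x08\x0b\x0c\x0e-\x1f])")


def wrap_base64(payload: str) -> str:
    """Store the whole payload as Base64 text inside a <data> element."""
    # The "encoding" attribute is a hypothetical marker for the receiver.
    element = ET.Element("data", {"encoding": "base64"})
    element.text = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return ET.tostring(element, encoding="unicode")


def unwrap_base64(document: str) -> str:
    """Reverse wrap_base64."""
    element = ET.fromstring(document)
    return base64.b64decode(element.text or "").decode("utf-8")


def wrap_hex_elements(payload: str) -> str:
    """Replace each forbidden control character with <hex value="NN"/>."""
    root = ET.Element("data")
    # Splitting on a capturing group keeps the separators:
    # even indices hold plain text, odd indices hold one control character.
    parts = _ILLEGAL.split(payload)
    root.text = parts[0]
    for i in range(1, len(parts), 2):
        marker = ET.SubElement(root, "hex", {"value": f"{ord(parts[i]):02X}"})
        marker.tail = parts[i + 1]
    return ET.tostring(root, encoding="unicode")


def unwrap_hex_elements(document: str) -> str:
    """Reverse wrap_hex_elements."""
    root = ET.fromstring(document)
    pieces = [root.text or ""]
    for marker in root:
        pieces.append(chr(int(marker.get("value"), 16)))
        pieces.append(marker.tail or "")
    return "".join(pieces)


# Made-up fragment with SOH separators, not a complete FIX message.
sample = "8=FIX.4.2\x0135=0\x01"

print(wrap_base64(sample))
print(wrap_hex_elements(sample))
```

Base64 keeps the XML trivially well-formed but makes the payload unreadable to humans and larger by about a third; the marker elements keep the text readable but need a receiver that reassembles the mixed content.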