Is it correct to escape "&", ">" and "<" with &#38;, &#62; and &#60; in XML?

Will something "break" if I use numeric character references instead of the usual named entities (&amp;, &lt;, &gt;) for reserved characters in XML?

This is part of a rather complex application that lets users enter bibliographic metadata via XML, CSV or web-based forms. The data can then be exported as XML (using the ONIX standard) in a user-chosen encoding: UTF-8, Windows-1252, etc.

The original programmers (long gone now...) decided to use numeric entities for all characters that cannot be represented in the chosen encoding. XML-reserved characters are treated as non-representable in every encoding, so they get the same treatment and are also encoded as numeric entities.
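To make the described behaviour concrete, here is a hypothetical reconstruction in Python (the legacy code itself is not shown, so the function name `encode_legacy` and the choice of the five predefined XML characters as "reserved" are assumptions). It relies on Python's built-in `xmlcharrefreplace` error handler, which turns every character the target codec cannot encode into a decimal numeric character reference.

```python
# Hypothetical sketch of the legacy export behaviour described above:
# XML-reserved characters and unencodable characters both become &#NNN;.
RESERVED = {"&", "<", ">", '"', "'"}  # assumption: the five predefined XML characters


def encode_legacy(text: str, encoding: str) -> bytes:
    """Escape reserved characters as decimal references, then encode.

    Characters the target encoding cannot represent are replaced with
    decimal numeric character references by 'xmlcharrefreplace'.
    """
    escaped = "".join(f"&#{ord(ch)};" if ch in RESERVED else ch for ch in text)
    return escaped.encode(encoding, errors="xmlcharrefreplace")


# "cp1252" is Python's codec name for Windows-1252; "Ł" and "ź" are not in it.
print(encode_legacy("Tom & Jerry <Łódź>", "cp1252"))
```

With Windows-1252 as the target, "ó" survives as a single byte while "Ł" and "ź" become `&#321;` and `&#378;`, exactly like the reserved characters.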

Some users have complained about &, <, >, etc. being encoded as &#38;, &#60;, &#62; instead of the usual named entities (&amp;, &lt;, &gt;), and I'd like to know whether these complaints have any substance.

If I can avoid digging through the legacy code to change this behaviour, it would save me a lot of effort.

Solution

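Yes, it's fine to escape using numeric character references. A conforming XML parser decodes &#38; to exactly the same character as &amp;, so consumers of the data see no difference.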

From the XML 1.0 specification (section 2.4, Character Data and Markup):

The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
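The rules quoted above can be checked against a real parser. This sketch uses Python's standard `xml.etree.ElementTree` (backed by expat); the helper name `is_well_formed` is illustrative, not part of any library. It shows that a bare `&` or `<` in text is rejected, a bare `>` is accepted, and both the numeric and named escapes are accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<t>Tom & Jerry</t>"))      # bare & is forbidden
print(is_well_formed("<t>a < b</t>"))            # bare < is forbidden
print(is_well_formed("<t>a > b</t>"))            # bare > is allowed in plain text
print(is_well_formed("<t>Tom &#38; Jerry</t>"))  # numeric character reference
print(is_well_formed("<t>Tom &amp; Jerry</t>"))  # predefined named entity
```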

You could also use hexadecimal character references:

&amp; = &#38; = &#x26;

&lt; = &#60; = &#x3C;

&gt; = &#62; = &#x3E;
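To confirm the equivalences in the table above, this sketch parses each form with Python's standard `xml.etree.ElementTree` and checks that all three decode to the same character (the helper name `decode_text` is illustrative). It also shows that re-serializing the parsed tree emits the named forms, so a consumer that round-trips the data through a parser ends up with the same output either way.

```python
import xml.etree.ElementTree as ET

# Named entity, decimal reference and hexadecimal reference for each character.
FORMS = {
    "&": ["&amp;", "&#38;", "&#x26;"],
    "<": ["&lt;", "&#60;", "&#x3C;"],
    ">": ["&gt;", "&#62;", "&#x3E;"],
}


def decode_text(fragment: str) -> str:
    """Parse <t>fragment</t> and return the decoded text content."""
    return ET.fromstring(f"<t>{fragment}</t>").text


for char, refs in FORMS.items():
    for ref in refs:
        assert decode_text(ref) == char, (ref, char)

# ElementTree writes &, < and > in text back out as named entities.
tree = ET.fromstring("<t>&#38;&#x3C;&#62;</t>")
print(ET.tostring(tree, encoding="unicode"))
```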
