
HTML Entities: When to Use Decimal vs. Hex


Is there a good rule of thumb for when to use decimal vs. hexadecimal notation for HTML entities?

For example, a non-breaking hyphen is written in decimal as &#8209; and in hex as &#x2011;.

This answer says that hexadecimal is for Unicode; does that mean hex should be used if you're using the <meta charset="utf-8"> tag in the document <head>?

Occasionally, I notice entity references displayed literally instead of the characters they represent; for example, &amp; appearing (instead of an ampersand) in an email subject line or RSS headline. Is either hex or decimal better for avoiding this?

One last consideration: can using hex or decimal affect the rendering clarity (crispness) of the character?


Solution

  • The rule of thumb is: use whichever you prefer, but prefer hex. ☺

    There is no difference in meaning and no difference in browser support (the last browsers that supported decimal references only died in the 1990s).

    As @AlexW describes, hexadecimal references are more natural than decimal, because character code standards such as Unicode identify characters by hexadecimal code points (U+2011, U+03A9, and so on). But if you find decimal references more convenient, use them.

    The issue has nothing to do with meta tags and character encodings. The main reason why character references were introduced into HTML is that they let you enter characters quite independently of the encoding of the document. This includes characters that cannot be written directly in the document's encoding at all. Thanks to them, you can enter any Unicode character even if the character encoding is ASCII or some other limited encoding, like ISO-8859-1.

    In the old days, it was common to recommend the use of named references (or “entity references” as they are formally called in classic HTML), when possible, because a reference like &Omega;, when displayed literally to the user, is more understandable than a reference like &#x3A9; or &#937;. This hasn’t been relevant for over a decade, as far as web browsers are concerned. But e.g. e-mail clients might be kind of stupid^H^H^H^H^H^H^H^H^H underdeveloped in this respect. They might, for example, show references literally in a list of messages, even though they interpret them properly when viewing a message. But there does not seem to be any consistent behavior that you could count on.
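A small sketch of the points in the answer, using Python 3.9+ and only the standard library. The helper names char_refs, to_ascii_html and preferred_ref are hypothetical, written just for this illustration; html.unescape, html.entities.codepoint2name and the "xmlcharrefreplace" codec error handler are real standard-library features. The sketch checks that the decimal and hex references for the non-breaking hyphen decode to the same character, shows that the hex form reuses the digits of the U+2011 notation, escapes non-ASCII text so it can live in a pure ASCII document, and picks a named reference when one exists.

```python
import html
import html.entities


def char_refs(ch: str) -> tuple[str, str]:
    """Hypothetical helper: decimal and hex numeric character references for ch."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"


def to_ascii_html(text: str, use_hex: bool = True) -> str:
    """Hypothetical helper: replace every non-ASCII character with a numeric
    reference so the result is pure ASCII, whatever the document encoding.
    Markup characters such as & and < are NOT escaped here (use html.escape)."""
    if not use_hex:
        # The built-in codec error handler emits decimal references (&#NNN;).
        return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
    # No built-in handler emits hex, so build the references by hand.
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


def preferred_ref(ch: str) -> str:
    """Hypothetical helper: named reference if HTML 4 defines one, else hex."""
    name = html.entities.codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#x{ord(ch):X};"


nb_hyphen = "\u2011"  # NON-BREAKING HYPHEN, code point U+2011
dec, hexa = char_refs(nb_hyphen)
print(dec, hexa)  # &#8209; &#x2011;
print(html.unescape(dec) == html.unescape(hexa) == nb_hyphen)  # True

omega = "\u03a9"  # GREEK CAPITAL LETTER OMEGA, U+03A9
print(to_ascii_html(omega + nb_hyphen + "x"))                 # &#x3A9;&#x2011;x
print(to_ascii_html(omega + nb_hyphen + "x", use_hex=False))  # &#937;&#8209;x
print(preferred_ref(omega))      # &Omega;
print(preferred_ref(nb_hyphen))  # &#x2011;
```

Both to_ascii_html outputs mean exactly the same thing to a browser; the hex branch exists only because "xmlcharrefreplace" always produces decimal references.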