
What's the difference between UTF-8 and UTF-8 with BOM?

Solution

  • The UTF-8 BOM is the three-byte sequence 0xEF, 0xBB, 0xBF (the UTF-8 encoding of U+FEFF) at the start of a text stream. It lets a reader identify the stream as UTF-8 more reliably than guessing from the content alone.

    In encodings such as UTF-16 and UTF-32, the BOM signals the byte order (endianness) of the data. UTF-8 is a sequence of single bytes, so endianness is irrelevant to it and the BOM is unnecessary.

    According to the Unicode Standard, a BOM is not recommended for UTF-8 files:

    2.6 Encoding Schemes

    ... Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information.
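To see why the BOM matters for byte order in UTF-16/UTF-32 but not in UTF-8, here is a minimal Python sketch that serializes the BOM character U+FEFF in several encodings. The helper name `bom_bytes` is hypothetical; the codec names are Python's standard ones.

```python
# U+FEFF is the code point used as the byte order mark (BOM).
BOM_CHAR = "\ufeff"


def bom_bytes(encoding: str) -> bytes:
    """Return how the BOM character is serialized in the given encoding."""
    return BOM_CHAR.encode(encoding)


if __name__ == "__main__":
    # UTF-16/UTF-32: the serialized BOM differs with byte order, so seeing
    # it tells the reader whether the data is big-endian or little-endian.
    for enc in ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        print(f"{enc:<10} {bom_bytes(enc).hex(' ')}")

    # UTF-8: there is only one possible serialization (EF BB BF), because
    # UTF-8 code units are single bytes with no byte-order variants.
    print(f"{'utf-8':<10} {bom_bytes('utf-8').hex(' ')}")
```

In UTF-8 the BOM therefore carries no byte-order information; at most it acts as a signature saying "this is UTF-8".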
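Because a UTF-8 BOM may still be encountered (for example, as a signature added by some tools), readers often need to detect or strip it. A minimal sketch in Python, assuming the input is already available as `bytes`; the helper names `has_utf8_bom` and `decode_utf8` are hypothetical, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of Python's standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping one leading BOM if present.

    Python's 'utf-8-sig' codec skips an initial EF BB BF on decode and
    otherwise behaves like 'utf-8', so BOM and no-BOM input decode alike.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    text = "héllo"
    without_bom = text.encode("utf-8")
    # Encoding with 'utf-8-sig' prepends EF BB BF.
    with_bom = text.encode("utf-8-sig")

    print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))  # True False
    # Plain 'utf-8' keeps the BOM as a U+FEFF character in the decoded text.
    print(with_bom.decode("utf-8")[0] == "\ufeff")  # True
    # 'utf-8-sig' yields the same string whether or not a BOM is present.
    print(decode_utf8(with_bom) == decode_utf8(without_bom) == text)  # True
```

The codec name `utf-8-sig` is Python-specific; in other languages the usual approach is the same check, comparing the first three bytes against 0xEF 0xBB 0xBF and skipping them if they match.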