unicode encoding utf-8 byte-order-mark

Why would I use a Unicode Signature Byte-Order-Mark (BOM)?


Are these obsolete? They seem like the worst idea ever -- embed something in the contents of your file that no one can see, but impacts the file's functionality. I don't understand why I would want one.


Solution

  • They're necessary in some cases, yes, because there are both little-endian and big-endian implementations of UTF-16.

    When reading an unknown UTF-16 file, how can you tell which of the two is used? The only solution is to place some kind of easily identifiable marker in the file, which can never be mistaken for anything else, regardless of the endianness used.

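    That's what the BOM does. It is the code point U+FEFF, and its byte-swapped form, U+FFFE, is guaranteed never to be a valid character, so the order of the first two bytes tells the reader which endianness was used.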

    And do you need one? Only if 1) you're using a UTF encoding where endianness is an issue (it matters for UTF-16, but UTF-8 always looks the same regardless of endianness), and 2) the file is going to be shared with external applications.

    If your own app is the only one that's going to read and write the file, you can omit the BOM, and simply decide once and for all which endianness you're going to use. But if another application has to read the file, it won't know the endianness in advance, so adding the BOM might be a good idea.
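
To make the ambiguity concrete, here is a minimal Python sketch of how a reader might use the BOM to pick the right UTF-16 decoder. The helper names `sniff_utf16_bom` and `decode_utf16` are hypothetical, chosen for this illustration, and the little-endian fallback is an assumption rather than a rule. The sketch also ignores UTF-32, whose little-endian BOM begins with the same two bytes.

```python
import codecs


def sniff_utf16_bom(data: bytes):
    """Return the codec implied by a leading UTF-16 BOM, or None if absent."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return None


def decode_utf16(data: bytes, default: str = "utf-16-le") -> str:
    """Decode UTF-16 bytes, trusting the BOM if present.

    Without a BOM the reader has to guess; `default` is that guess
    (an assumption for this sketch).
    """
    codec = sniff_utf16_bom(data)
    if codec is None:
        return data.decode(default)
    # Skip the 2-byte BOM so it does not end up in the decoded text.
    return data[2:].decode(codec)


# The same text in both byte orders, with no BOM.
le = "hi".encode("utf-16-le")
be = "hi".encode("utf-16-be")

# Guessing the wrong endianness does not raise an error;
# it silently produces different (valid) characters.
garbled = le.decode("utf-16-be")

# With a BOM in front, the reader no longer has to guess.
with_bom = codecs.BOM_UTF16_BE + be
text = decode_utf16(with_bom)
```

Here `garbled` holds two unrelated CJK characters instead of "hi", which is the failure the BOM exists to prevent, while `text` comes back as "hi" even though the fallback guess is little-endian.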