unicode diff utf-16 git-diff byte-order-mark

Why does GNU Diff not understand UTF-16 (only UTF-8)?


Why doesn't GNU Diff understand UTF-16 (only UTF-8)?

GNU Diff is used by default in Git.

Why doesn't this bug get fixed?

The BOM is part of the Unicode standard: http://www.unicode.org/faq/utf_bom.html#bom4

Why is the BOM ignored by most programmers?

On Windows, UTF-16 is the default encoding for some source files.


Solution

  • https://lists.gnu.org/archive/html/bug-diffutils/2018-04/msg00009.html

    UTF-8 does not require a BOM, but for UTF-16 and UTF-32 a BOM is always present. Files in UTF-16 or UTF-32 without a BOM should be identified as binary.

    But why are there no plans to support UTF-16 and UTF-32? Diff is part of Git and is used all over the world. It is 2018, and Unicode has solved the problems with encodings.

    https://lists.gnu.org/archive/html/bug-diffutils/2018-04/msg00011.html

    Why are there no plans to support UTF-16 and UTF-32?

    Nobody has volunteered to do it, and there hasn't been a pressing need. UTF-16 and UTF-32 are primarily used for internal representation, not for text files. For more on the subject, please see:

    http://utf8everywhere.org/
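Since diff itself will not decode UTF-16, one workaround is to decode both files to text before comparing them. The following is only a minimal sketch of that idea in Python, not anything GNU Diff or Git does internally. The helper names `sniff_encoding`, `decode_text`, and `diff_bytes` are hypothetical, and the fallback to UTF-8 when no BOM is found is an assumption made for illustration.

```python
import codecs
import difflib

# BOM table. UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def decode_text(data: bytes) -> str:
    """Decode by BOM; with no BOM, assume UTF-8 (an assumption of this sketch).

    The "utf-16", "utf-32", and "utf-8-sig" codecs consume the BOM themselves.
    Undecodable input raises UnicodeDecodeError, which a caller could treat
    as "binary file".
    """
    return data.decode(sniff_encoding(data) or "utf-8")


def diff_bytes(old: bytes, new: bytes) -> str:
    """Return a unified diff of two byte strings after decoding them to text."""
    old_lines = decode_text(old).splitlines(keepends=True)
    new_lines = decode_text(new).splitlines(keepends=True)
    return "".join(difflib.unified_diff(old_lines, new_lines, "old", "new"))


if __name__ == "__main__":
    old = "héllo\nworld\n".encode("utf-16")  # Python prepends a BOM here
    new = "héllo\nthere\n".encode("utf-8")   # no BOM
    print(diff_bytes(old, new), end="")
```

Because both inputs are decoded to Unicode strings first, a UTF-16 file and a UTF-8 file with the same text compare as equal, and only real content changes show up in the diff.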