Tags: file, encoding, utf-8, utf-16

Detect UTF-16 file content


Is it possible to tell whether a file contains UTF-16 (16 bits per code unit) or 8-bit ASCII content?


Solution

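  • Ditto to what Brian Agnew said about reading the byte order mark (BOM), a special two-byte sequence that might appear at the beginning of the file: FF FE for little-endian UTF-16, FE FF for big-endian. The BOM is optional, so its absence does not rule out UTF-16.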

    You can also tell whether it is ASCII by scanning every byte in the file and checking that they are all less than 128. If they are, then it's just an ASCII file. If any byte is 128 or greater, the file uses some other encoding.
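
The two checks above can be sketched in Python roughly as follows. This is a minimal illustration, not a full encoding detector: the names `sniff_encoding` and `sniff_file` are hypothetical, and the sketch assumes the file is small enough to read into memory in one go.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Return a coarse guess: 'utf-16-le', 'utf-16-be', 'ascii', or 'unknown'."""
    # Check for a UTF-16 byte order mark at the very start of the data.
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    # No BOM: the data is pure ASCII only if every byte is below 128.
    if all(byte < 128 for byte in data):
        return "ascii"
    # Some byte is 128 or greater, so another encoding is in use.
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the whole file in binary mode and guess its encoding."""
    with open(path, "rb") as f:
        return sniff_encoding(f.read())
```

For example, `sniff_encoding(b"\xff\xfeh\x00i\x00")` returns `"utf-16-le"` and `sniff_encoding(b"plain text")` returns `"ascii"`. One caveat: UTF-16 text without a BOM that only uses characters below U+0080 also passes the "all bytes less than 128" test, because every other byte is 0x00, so this sketch would label it `"ascii"`.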