What is the difference between binary and utf8?

This Node library checks whether a buffer is binary or UTF-8, but the contents of its test files (ansi.txt and utf8.txt) look exactly the same, and I couldn't find a clear explanation of the difference.

What exactly is the difference between binary and utf8?

Solution

"Binary" is just a general term for data that is not human-readable text. It is not a text encoding, so "binary or UTF-8" is not a choice between two encodings. Also, there are plenty of ways to encode text other than UTF-8, so binary and UTF-8 are not the only possible kinds of data.
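To make this concrete, here is a small TypeScript sketch that encodes the same string, "café", three different ways. The helpers `encodeUtf8`, `encodeUtf16le` and `encodeWindows1252` are hypothetical and deliberately minimal (the UTF-8 one only handles characters in the Basic Multilingual Plane, and the Windows-1252 one only handles ASCII and U+00A0 to U+00FF, where it coincides with Latin-1). All three results are text, yet the bytes differ. An editor that detects each file's encoding correctly displays them identically, which may be why two test files can look the same.

```typescript
// Hypothetical helpers for illustration only; none of them comes from the library.

// UTF-8: 1 to 3 bytes per character in this sketch (BMP only, no emoji).
function encodeUtf8(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const cp = text.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdfff) {
      throw new Error("surrogate pairs are not handled in this sketch");
    }
    if (cp < 0x80) {
      out.push(cp); // ASCII: one byte with the same value as the code point
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return out;
}

// UTF-16LE: every UTF-16 code unit becomes two bytes, low byte first.
function encodeUtf16le(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out.push(unit & 0xff, unit >> 8);
  }
  return out;
}

// Windows-1252 ("ANSI" on Western-language Windows): one byte per character.
// Only U+0000-U+007F and U+00A0-U+00FF are handled; there the byte equals the code point.
function encodeWindows1252(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const cp = text.charCodeAt(i);
    if (cp < 0x80 || (cp >= 0xa0 && cp <= 0xff)) {
      out.push(cp);
    } else {
      throw new Error("character outside the range handled by this sketch");
    }
  }
  return out;
}

// Format bytes as two-digit hex values separated by spaces.
function toHex(bytes: number[]): string {
  return bytes.map((b) => ("0" + b.toString(16)).slice(-2)).join(" ");
}

const word = "caf\u00e9"; // "café"
// toHex(encodeUtf8(word))        -> "63 61 66 c3 a9"
// toHex(encodeUtf16le(word))     -> "63 00 61 00 66 00 e9 00"
// toHex(encodeWindows1252(word)) -> "63 61 66 e9"
```

Note that the first three bytes ("caf") are identical in UTF-8 and Windows-1252; only the non-ASCII "é" differs.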

The documentation says that isUtf8 checks whether a buffer is encoded in UTF-8. If it returns true, you know that the file's bytes are valid UTF-8. However, if it returns false, you cannot conclude that the file contains binary data, because it could also be encoded in UTF-16, ANSI (a Windows code page such as Windows-1252), or another text encoding that is not considered binary.
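The sketch below shows why a false result says little about whether data is binary. It is a strict UTF-8 validity check in TypeScript, written from the well-formed byte sequences defined in RFC 3629 rather than from the library's source; the name `isValidUtf8` is hypothetical. It rejects the Windows-1252 and UTF-16 encodings of ordinary text exactly as it rejects a genuinely binary PNG header. It accepts any `ArrayLike<number>`, so a Node `Buffer` (a `Uint8Array`) works too.

```typescript
// Hypothetical strict UTF-8 validator (not the library's code). Rejects overlong
// encodings, UTF-16 surrogates, code points above U+10FFFF and truncated sequences.
function isValidUtf8(bytes: ArrayLike<number>): boolean {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    if (b <= 0x7f) {
      i += 1; // ASCII byte: a complete one-byte character
      continue;
    }
    let need: number; // how many continuation bytes must follow
    let lo = 0x80;    // allowed range for the FIRST continuation byte;
    let hi = 0xbf;    // later continuation bytes are always 0x80-0xBF
    if (b >= 0xc2 && b <= 0xdf) {
      need = 1;
    } else if (b === 0xe0) {
      need = 2; lo = 0xa0; // reject overlong 3-byte forms
    } else if ((b >= 0xe1 && b <= 0xec) || b === 0xee || b === 0xef) {
      need = 2;
    } else if (b === 0xed) {
      need = 2; hi = 0x9f; // reject surrogates U+D800-U+DFFF
    } else if (b === 0xf0) {
      need = 3; lo = 0x90; // reject overlong 4-byte forms
    } else if (b >= 0xf1 && b <= 0xf3) {
      need = 3;
    } else if (b === 0xf4) {
      need = 3; hi = 0x8f; // reject code points above U+10FFFF
    } else {
      return false; // 0x80-0xC1 and 0xF5-0xFF never start a character
    }
    if (i + need >= bytes.length) return false; // sequence cut off at the end
    for (let k = 1; k <= need; k++) {
      const c = bytes[i + k];
      const min = k === 1 ? lo : 0x80;
      const max = k === 1 ? hi : 0xbf;
      if (c < min || c > max) return false;
    }
    i += need + 1;
  }
  return true;
}

// isValidUtf8([0x63, 0x61, 0x66, 0xc3, 0xa9]) -> true  ("café" in UTF-8)
// isValidUtf8([0x63, 0x61, 0x66, 0xe9])       -> false ("café" in Windows-1252: still text)
// isValidUtf8([0xff, 0xfe, 0x63, 0x00])       -> false ("c" in UTF-16LE with BOM: still text)
// isValidUtf8([0x89, 0x50, 0x4e, 0x47])       -> false (start of a PNG file: binary)
```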

According to the source code, the function reads the whole file and checks for UTF-8-encoded characters outside the ASCII range. It appears to return true if the file contains only ASCII characters, because ASCII text is byte-for-byte identical when encoded in UTF-8.
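The ASCII-only case can be made explicit with a small TypeScript sketch; the function `classify` and its result labels are hypothetical, not the library's API. It first checks whether every byte is below 0x80. If so, the bytes read identically as ASCII, UTF-8 or Windows-1252, so no check can tell which encoding was intended. Otherwise it passes the bytes to the standard `TextDecoder` with `fatal: true`, which throws a `TypeError` on malformed UTF-8 instead of inserting U+FFFD replacement characters. `TextDecoder` is a global in Node and in browsers.

```typescript
// Sketch only: separate the three cases discussed above.
type Kind = "ascii-only" | "utf8-non-ascii" | "not-utf8";

function classify(bytes: Uint8Array): Kind {
  // Pure ASCII: every byte is below 0x80. Such content is simultaneously valid
  // ASCII, UTF-8 and Windows-1252, so "is it UTF-8?" is trivially yes.
  let asciiOnly = true;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) {
      asciiOnly = false;
      break;
    }
  }
  if (asciiOnly) return "ascii-only";
  try {
    // With fatal: true, decode() throws on malformed or truncated UTF-8.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "utf8-non-ascii";
  } catch {
    // Not UTF-8: could be another text encoding or binary data.
    return "not-utf8";
  }
}
```

Only the middle result is real evidence that a file was written as UTF-8; an ASCII-only file simply gives a UTF-8 check nothing to disagree with.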