Search code examples
node.jsbufferbinaryfilesendianness

Reading 2 byte utf8 binary data


I have binary files that contain utf8 strings for example.

4F 00 4B 00

I'm trying to read this data and write it out to a text file but when I do the following:

data.toString('utf8');

I get an output of:

O K

Take note of the two spaces being interpreted from the 00. Is there any way to specify I'm using 2 byte little endian characters? I imagine if this didn't contain ascii characters this would actually break and produce garbage data instead of extra spaces.


Solution

  • The problem likely is when you're reading, not writing the string. The string you shared is not UTF-8, it's UTF-16. So what I'm thinking you want is to read the string as UTF-16 and write it as UTF-8.

    Specifically this is UTF-16LE.