Buffer.from(base64EncodedString, 'base64').toString('binary') vs 'utf8'


In Node.js: Why does this test fail on the second call of main?

test('base64Encode and back', () => {
  function main(input: string) {
    const base64string = base64Encode(input);
    const text = base64Decode(base64string);
    expect(text).toEqual(input); // actual value first, expected value second
  }

  main('demo');
  main('😉😉😉');
});

Here are my functions:

export function base64Encode(text: string): string {
  const buffer = Buffer.from(text, 'binary');
  return buffer.toString('base64');
}

export function base64Decode(base64EncodedString: string): string {
  const buffer = Buffer.from(base64EncodedString, 'base64');
  return buffer.toString('binary');
}

From the pages I read, I figured I had written these functions correctly, so that one would reverse the other.

If I change the 'binary' options to 'utf8' instead, the test passes.

But my database already contains data that this function only seems to decode correctly if I use 'binary'.


Solution

  • binary is an alias for latin1

    'latin1': Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from U+0000 to U+00FF. Each character is encoded using a single byte. Characters that do not fit into that range are truncated and will be mapped to characters in that range.

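    Latin-1 therefore cannot represent characters above U+00FF, such as emoji, which take several bytes in UTF-8. Encoding them as 'binary' discards information, so decoding cannot restore the original string.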

    To round-trip such characters, convert straight between UTF-8 and base64 (Buffer's default string encoding is 'utf8'):

    function base64Encode(str) {
      return Buffer.from(str).toString('base64')
    }
    function base64Decode(str) {
      return Buffer.from(str, 'base64').toString()
    }
    
    > base64Encode('😉')
    '8J+YiQ=='
    > base64Decode('8J+YiQ==')
    '😉'
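
To make the truncation concrete, here is a minimal sketch that compares the bytes each encoding produces. The variable names are illustrative and not taken from the question. It also shows why data already stored through the 'binary' path may appear to work only with 'binary': text inside U+0000 to U+00FF survives a Latin-1 round trip, but those single bytes are not valid UTF-8.

```javascript
// '😉' is U+1F609, held in a JS string as two UTF-16 code units: D83D DE09.
const wink = '😉';

// 'latin1' (alias 'binary') keeps only the low byte of each code unit.
const asLatin1 = Buffer.from(wink, 'latin1');
// 'utf8' writes the full four-byte UTF-8 sequence.
const asUtf8 = Buffer.from(wink, 'utf8');

console.log(asLatin1.toString('hex')); // 3d09
console.log(asUtf8.toString('hex'));   // f09f9889

// The code point was lost at the latin1 step, so the round trip cannot restore it.
const roundTripped = Buffer.from(asLatin1.toString('base64'), 'base64').toString('latin1');
console.log(roundTripped === wink); // false

// A character inside the Latin-1 range round-trips with 'latin1',
// but its single byte (e9) does not decode back to the same text as UTF-8.
const legacy = Buffer.from('\u00e9', 'latin1').toString('base64');
const decodedLatin1 = Buffer.from(legacy, 'base64').toString('latin1');
const decodedUtf8 = Buffer.from(legacy, 'base64').toString('utf8');
console.log(decodedLatin1 === '\u00e9'); // true
console.log(decodedUtf8 === '\u00e9');   // false
```

If existing rows were written with the 'binary' version, they probably still need to be read with 'binary'. Any characters above U+00FF in those rows were already truncated when they were encoded and cannot be recovered by changing the decoder.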