javascript · character-encoding · utf · typed-arrays

Why can String.prototype.charCodeAt() convert a binary string into a Uint8Array?


Suppose I have a base64-encoded string and I want to convert it into an ArrayBuffer. I can do it this way:

// base64 decode the string to get the binary data
const binaryString = window.atob(base64EncodedString);

// convert from a binary string to an ArrayBuffer
const buf = new ArrayBuffer(binaryString.length);
const bufView = new Uint8Array(buf);   
for (let i = 0, strLen = binaryString.length; i < strLen; i++) {
    bufView[i] = binaryString.charCodeAt(i);
}

// get ArrayBuffer: `buf`  
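
As a quick sanity check, here is the same conversion applied to a small, hypothetical input ("SGVsbG8=" is the base64 encoding of "Hello", not a value from the question); the resulting bytes are the ASCII codes of "Hello":

// decode a sample base64 string and copy its bytes into a Uint8Array
const binary = window.atob('SGVsbG8=');          // "Hello"
const bytes = new Uint8Array(binary.length);
for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
}
console.log(bytes);                              // Uint8Array(5) [72, 101, 108, 108, 111]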

According to the documentation for String.prototype.charCodeAt(), it returns an integer between 0 and 65535 representing the UTF-16 code unit at the given index. But a Uint8Array's value range is [0, 255].
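
To see why the ranges matter, here is a minimal sketch (illustrative values only): a code unit above 255 assigned to a Uint8Array element keeps only its low 8 bits, so the data would be silently corrupted if such characters could appear in the string.

// charCodeAt() can return values up to 65535 (UTF-16 code units)
console.log('€'.charCodeAt(0));   // 8364

// assigning an out-of-range value to a Uint8Array keeps only the low 8 bits
const view = new Uint8Array(1);
view[0] = 8364;                   // stored as 8364 % 256
console.log(view[0]);             // 172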

I was initially thinking that the value we obtain from charCodeAt() could go out of the bounds of the Uint8Array range. Then I checked the built-in atob() function, which returns an ASCII string containing the decoded data. According to Binary Array, an ASCII string has a range from 0 to 127, which falls within the range of a Uint8Array, and that's why we are safe to use charCodeAt() in this case.
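
As an illustrative check (assuming a browser environment for window.atob), every character in the string returned by atob() has a code unit between 0 and 255, so it always fits into a Uint8Array element:

// '/v8=' decodes to the two bytes 0xFE 0xFF
const decoded = window.atob('/v8=');
console.log(decoded.charCodeAt(0));    // 254
console.log(decoded.charCodeAt(1));    // 255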

That's my understanding. I'm not sure if I'm interpreting this correctly. Thanks for your help!


Solution

  • So it looks like my understanding is correct.

    Thanks to @Konrad; here is their addition:

    charCodeAt is designed to support UTF-16, and UTF-16 was designed to be compatible with ASCII and Latin-1: the first 128 code points have exactly the same values as in ASCII, and the first 256 match Latin-1, so every byte value maps to a single, identical code unit.
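
    For illustration, the first 256 Unicode code points line up with Latin-1 (and the first 128 with ASCII), so a byte value and its corresponding character round-trip exactly:

    // 65 is 'A' in ASCII, 233 is 'é' in Latin-1; both map to the same code units in UTF-16
    console.log(String.fromCharCode(65));    // 'A'
    console.log(String.fromCharCode(233));   // 'é'
    console.log('é'.charCodeAt(0));          // 233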