Suppose we have a string with some (astral) Unicode characters:
const s = 'Hi π Unicode!'
The []
operator and .charAt()
method don't work for getting the 4th character, which should be "π":
> s[3]
'οΏ½'
> s.charAt(3)
'οΏ½'
The .codePointAt() method does get the correct value for the 4th character, but unfortunately it's a number and has to be converted back to a string using String.fromCodePoint():
> String.fromCodePoint(s.codePointAt(3))
'😀'
Similarly, converting the string into an array using spread syntax yields valid Unicode characters, so that's another way of getting the 4th one:
> [...s][3]
'😀'
But I can't believe that going from string to number and back to string, or having to split the string into an array, are the only ways of doing this seemingly trivial thing. Isn't there a simple method for doing this?
> s.simpleMethod(3)
'😀'
Note: I know that the definition of "character" is somewhat fuzzy, but for the purpose of this question a character is simply the symbol that corresponds to a Unicode code point (no combining characters, no grapheme clusters, etc.).
Update: the String.fromCodePoint(str.codePointAt(n))
method is not really viable, since the n
th position there doesn't take previous astral symbols into account: String.fromCodePoint('ππ'.codePointAt(1)) // => 'οΏ½'
(I feel kinda dumb asking this; like i'm probably missing something obvious. But previous answers to this questions don't work on strings with Unicode simbols on astral planes.)
The string iterator is the only thing that iterates through code points rather than UCS-2/UTF-16 code units. So:
const string = 'Hi 😀 Unicode!';
for (const symbol of string) {
  console.log(symbol);
}
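To make the difference between code units and code points concrete, here is a small sketch. The string and the variable names are illustrative only, and the astral symbol is written as an escape so the example does not depend on file encoding.

```javascript
// Illustrative string: U+1F600 is an astral code point,
// so it occupies two UTF-16 code units.
const sample = 'a\u{1F600}b';

// .length counts UTF-16 code units, so the astral symbol counts twice.
const codeUnitCount = sample.length; // 4

// The string iterator yields one string per code point.
const codePoints = [];
for (const symbol of sample) {
  codePoints.push(symbol);
}
console.log(codeUnitCount, codePoints.length); // 4 3
```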
So to get a specific code point based on its index from a string:
const string = 'Hi 😀 Unicode!';
// Note: The spread operator uses the string iterator under the hood.
const symbols = [...string];
symbols[3]; // '😀'
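If building a full array just to read one element feels wasteful, the same iterator can be walked until the wanted index is reached. This is only a sketch: codePointAtIndex is a hypothetical helper name, not a built-in method, and it still takes time proportional to the index.

```javascript
// Hypothetical helper (not a built-in): returns the symbol at the given
// code point index, or undefined when the index is out of range.
function codePointAtIndex(str, index) {
  let i = 0;
  for (const symbol of str) {
    if (i === index) return symbol;
    i++;
  }
  return undefined;
}

const greeting = 'Hi \u{1F600} Unicode!';
const fourth = codePointAtIndex(greeting, 3); // the astral symbol U+1F600
const fifth = codePointAtIndex(greeting, 4);  // ' '
```

Unlike String.fromCodePoint(str.codePointAt(n)), the index here counts code points, so earlier astral symbols do not shift the result.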
Still, this would break with grapheme clusters, or emoji sequences such as 👨‍👩‍👧‍👦
(👨 + U+200D ZERO WIDTH JOINER + 👩 + U+200D ZERO WIDTH JOINER + 👧 + U+200D ZERO WIDTH JOINER + 👦). Text segmentation helps with that.
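The answer does not name a specific segmentation API. One option in current runtimes is Intl.Segmenter with grapheme granularity, so the sketch below assumes an environment that provides it (recent browsers and Node.js versions). The variable names are illustrative.

```javascript
// The family emoji: four astral code points joined by U+200D ZERO WIDTH JOINER.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

// Spreading splits on code points: 4 people + 3 joiners.
const familyCodePoints = [...family]; // length 7

// Assumption: Intl.Segmenter is available in this runtime.
// 'grapheme' granularity groups code points into user-perceived characters.
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
const graphemes = Array.from(
  segmenter.segment('a' + family + 'b'),
  (part) => part.segment
);
// With up-to-date Unicode data, graphemes[1] is the whole family sequence.
```

The exact grouping depends on the Unicode data shipped with the runtime, so older engines may segment newer emoji sequences differently.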
Do you actually need to get the 4th code point in the string, though? What's your use case?