Is there an Erlang equivalent of codePointAt from JavaScript? One that gets the code point starting at a byte offset, without modifying the underlying string/binary?
You can use bit syntax pattern matching to skip the first Offset bytes and decode the next character from the remaining bytes as UTF-8:
1> CodePointAt = fun(Binary, Offset) ->
       <<_:Offset/binary, Char/utf8, _/binary>> = Binary,
       Char
   end.
Test:
2> CodePointAt(<<"πr²"/utf8>>, 0).
960
3> CodePointAt(<<"πr²"/utf8>>, 1).
** exception error: no match of right hand side value <<207,128,114,194,178>>
4> CodePointAt(<<"πr²"/utf8>>, 2).
114
5> CodePointAt(<<"πr²"/utf8>>, 3).
178
6> CodePointAt(<<"πr²"/utf8>>, 4).
** exception error: no match of right hand side value <<207,128,114,194,178>>
7> CodePointAt(<<"πr²"/utf8>>, 5).
** exception error: no match of right hand side value <<207,128,114,194,178>>
As you can see, if the offset does not fall on a valid UTF-8 character boundary (or, as with offset 5, there are no bytes left to decode), the match fails and the function raises a badmatch error. You can handle that case differently, for example with a case expression, if needed.
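For instance, here is a minimal sketch of a non-raising variant; the {ok, Char} / undefined return convention is just one possible choice:

8> SafeCodePointAt = fun(Binary, Offset) ->
       case Binary of
           %% same pattern as above, but a failed match falls through
           <<_:Offset/binary, Char/utf8, _/binary>> -> {ok, Char};
           _ -> undefined
       end
   end.
9> SafeCodePointAt(<<"πr²"/utf8>>, 2).
{ok,114}
10> SafeCodePointAt(<<"πr²"/utf8>>, 1).
undefined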