
Erlang equivalent of JavaScript codePointAt?


Is there an Erlang equivalent of JavaScript's codePointAt? I need something that gets the code point starting at a byte offset without modifying the underlying string/binary.


Solution

  • You can use bit syntax pattern matching to skip the first Offset bytes and decode the character that starts there as UTF-8. Erlang binaries are immutable, so the original binary is never modified:

    1> CodePointAt = fun(Binary, Offset) ->
      <<_:Offset/binary, Char/utf8, _/binary>> = Binary,
      Char
    end.
    

    Testing it in the shell:

    2> CodePointAt(<<"πr²"/utf8>>, 0).
    960
    3> CodePointAt(<<"πr²"/utf8>>, 1).
    ** exception error: no match of right hand side value <<207,128,114,194,178>>
    4> CodePointAt(<<"πr²"/utf8>>, 2).
    114
    5> CodePointAt(<<"πr²"/utf8>>, 3).
    178
    6> CodePointAt(<<"πr²"/utf8>>, 4).
    ** exception error: no match of right hand side value <<207,128,114,194,178>>
    7> CodePointAt(<<"πr²"/utf8>>, 5).
    ** exception error: no match of right hand side value <<207,128,114,194,178>>
    

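    As you can see, if the offset does not fall on a UTF-8 character boundary (offsets 1 and 4 here), or leaves no bytes to decode (offset 5), the match fails and the function raises a badmatch error. If you would rather handle that case yourself, wrap the match in a case expression.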
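    A minimal sketch of that case-expression variant, written as a compiled module instead of a shell fun. The module name cp and the choice to return the atom undefined on failure (loosely mirroring JavaScript's codePointAt, which returns undefined for an out-of-range index) are assumptions for illustration, not an existing library API.

```erlang
-module(cp).
-export([code_point_at/2]).

%% Returns the Unicode code point whose UTF-8 encoding starts at byte
%% position Offset (0-based) in Binary. Returns the atom undefined when
%% no valid UTF-8 character starts there: the offset is inside a
%% multi-byte character, or at/past the end of the binary.
%% The binary is only matched against, never copied or modified.
code_point_at(Binary, Offset)
  when is_binary(Binary), is_integer(Offset), Offset >= 0 ->
    case Binary of
        %% Skip Offset bytes, then decode one UTF-8 character.
        <<_:Offset/binary, Char/utf8, _/binary>> -> Char;
        %% Not on a character boundary, or nothing left to decode.
        _ -> undefined
    end.
```

    For example, cp:code_point_at(<<"πr²"/utf8>>, 1) returns undefined instead of raising, while offset 3 still returns 178. Note that the offset here counts bytes, whereas the index passed to JavaScript's codePointAt counts UTF-16 code units, so the two are not interchangeable for non-ASCII text.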