
Converting a UTF-16LE Elixir bitstring into an Elixir String


Given an Elixir bitstring encoded in UTF-16LE:

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>

how can I convert it into a readable Elixir String (it spells out "Devastator")? The closest I've gotten is transforming the binary into a list of Unicode code points (["0044", "0065", ...]) and trying to prepend the \u escape sequence to each one, but Elixir raises an error because the resulting escape sequence is invalid. I'm out of ideas.


Solution

  • The simplest way is to use Erlang's :unicode module, which can be called directly from Elixir:

    :unicode.characters_to_binary(utf16binary, {:utf16, :little})
    

    For example:

    <<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
    |> :unicode.characters_to_binary({:utf16, :little})
    |> IO.puts()
    #=> Devastator
    

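    Note that the input ends with a null character (the trailing 0, 0), so the converted binary ends with a null byte as well. Because that byte is not printable, the shell displays the result as a raw binary rather than as a quoted string, and depending on the OS, IO.puts may print some extra representation for the null byte.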
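
If the trailing null character is unwanted, it can be trimmed after the conversion. The following is a minimal sketch rather than a definitive implementation: the module name `Utf16Example` and the error atoms `:invalid_utf16` and `:incomplete_utf16` are made up for illustration. It also handles the `{:error, ...}` and `{:incomplete, ...}` tuples that `:unicode.characters_to_binary/2` returns when the input is not valid or complete UTF-16.

```elixir
defmodule Utf16Example do
  # Hypothetical helper module for illustration; not part of Elixir or OTP.

  # Decodes a UTF-16LE binary into a UTF-8 string and drops trailing nulls.
  def decode(utf16le) when is_binary(utf16le) do
    case :unicode.characters_to_binary(utf16le, {:utf16, :little}) do
      utf8 when is_binary(utf8) ->
        # <<0>> is the null character left over from the input's terminator.
        {:ok, String.trim_trailing(utf8, <<0>>)}

      {:error, _converted, _rest} ->
        # The input contained an invalid UTF-16 sequence.
        {:error, :invalid_utf16}

      {:incomplete, _converted, _rest} ->
        # The input ended in the middle of a code unit or surrogate pair.
        {:error, :incomplete_utf16}
    end
  end
end

input = <<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>

case Utf16Example.decode(input) do
  {:ok, string} -> IO.puts(string)
  {:error, reason} -> IO.inspect(reason)
end
```

With the terminator removed, the result is an ordinary printable string, so the shell shows it in quotes instead of as a list of bytes.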