I have an array of UCS-2LE encoded bytes in Ruby, and since I'm a complete beginner with Ruby, I'm struggling to convert it to a UTF-8 string. I have the same code working just fine in PHP and Java.
In PHP I'm using the iconv library (in Ruby, iconv has been deprecated):
$str = iconv('UCS-2LE', 'UTF-8//IGNORE', implode($byte_array));
In Java I'm using:
str = new String(byte_array, "UTF-16LE");
The bytes in the array are encoded as 2 bytes per character. How can I perform a similar conversion in Ruby? I've tried a few solutions, but none of them worked for me. Thank you.
Assuming a byte array:
byte_array = [70, 0, 111, 0, 111, 0]
You can use Array#pack to convert the integer values to characters (C treats each integer as an unsigned char):
string = byte_array.pack("C*") #=> "F\x00o\x00o\x00"
pack returns a string with ASCII-8BIT encoding:
string.encoding #=> #<Encoding:ASCII-8BIT>
You can now use String#force_encoding to reinterpret the bytes as a UTF-16LE string:
string.force_encoding("UTF-16LE") #=> "Foo"
The bytes haven't changed so far:
string.bytes #=> [70, 0, 111, 0, 111, 0]
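As a small sketch of what "reinterpret" means here: force_encoding only changes the encoding label on the string, so the bytes stay the same, and String#valid_encoding? can tell you whether those bytes actually form well-formed UTF-16LE. The variable names before and truncated are just for illustration.

```ruby
byte_array = [70, 0, 111, 0, 111, 0]

string = byte_array.pack("C*")
before = string.bytes

# force_encoding modifies the receiver in place and returns it;
# only the encoding label changes, not the underlying bytes.
string.force_encoding("UTF-16LE")
string.bytes == before   #=> true
string.valid_encoding?   #=> true

# An odd number of bytes cannot be well-formed UTF-16LE,
# because every code unit needs exactly 2 bytes.
truncated = [70, 0, 111].pack("C*").force_encoding("UTF-16LE")
truncated.valid_encoding? #=> false
```

Checking valid_encoding? before transcoding is optional, but it can help pinpoint bad input early.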
To transcode the string into another encoding, use String#encode:
utf8_string = string.encode("UTF-8") #=> "Foo"
utf8_string.bytes #=> [70, 111, 111]
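With plain ASCII text the transcoding just drops the zero bytes, which can hide what is really going on. Here is a short sketch with two non-ASCII characters (é, U+00E9, and €, U+20AC), where the UTF-8 result uses a different number of bytes per character. The input bytes are made up for illustration.

```ruby
# "é" is E9 00 and "€" is AC 20 in UTF-16LE (low byte first).
utf16le_bytes = [0xE9, 0x00, 0xAC, 0x20]

text = utf16le_bytes.pack("C*").force_encoding("UTF-16LE").encode("UTF-8")

text.encoding #=> #<Encoding:UTF-8>
text.length   #=> 2
text.bytes    #=> [195, 169, 226, 130, 172]
```

The two characters take 2 bytes each in UTF-16LE, but 2 and 3 bytes respectively in UTF-8.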
The whole conversion can be written in a single line:
byte_array.pack("C*").force_encoding("UTF-16LE").encode("UTF-8")
or by passing the source encoding as a second argument to encode:
byte_array.pack("C*").encode("UTF-8", "UTF-16LE")
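If you also want something roughly comparable to the //IGNORE flag from the PHP snippet in the question, String#encode accepts the invalid:, undef: and replace: options. The following is a minimal sketch, not an exact equivalent of iconv's behavior; the method name ucs2le_bytes_to_utf8 is hypothetical.

```ruby
# Hypothetical helper (the name is not from any library): converts an
# array of UTF-16LE/UCS-2LE byte values into a UTF-8 string.
def ucs2le_bytes_to_utf8(byte_array)
  byte_array.pack("C*").encode(
    "UTF-8", "UTF-16LE",
    invalid: :replace, # malformed input is replaced instead of raising
    undef:   :replace, # same for characters without a target mapping
    replace: ""        # replace with nothing, i.e. drop them
  )
end

ucs2le_bytes_to_utf8([70, 0, 111, 0, 111, 0]) #=> "Foo"
```

Without these options, encode raises an exception (for example Encoding::InvalidByteSequenceError) on malformed input, which may be preferable if you would rather notice bad data than silently drop it.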