I have a situation where I receive UTF-16 code units one at a time, so I collect them in a list and later convert the list to an array.
That leaves me with a uint16[], but GLib.convert () needs a string instead:
    int main () {
        var utf16data = new Gee.ArrayList<uint16> ();
        utf16data.add ('A');
        utf16data.add (0xD83C);
        utf16data.add (0xDC1C);
        var utf16array = utf16data.to_array ();
        try {
            // convert expects a string here
            var s = convert (utf16array, utf16data.size * 2, "UTF-8", "UTF-16LE");
            stdout.printf ("%s\n", s);
        }
        catch (ConvertError e) {
            stderr.printf (@"error: $(e.message)\n");
        }
        return 0;
    }
So how do I convert a UTF-16 array into a UTF-8 string?
Update:
I tried to just cast the array:
    int main () {
        var utf16data = new Gee.ArrayList<uint16> ();
        utf16data.add ('A');
        utf16data.add (0xD83C);
        utf16data.add (0xDC1C);
        // utf16data.add (0);
        var utf16array = utf16data.to_array ();
        try {
            size_t bytes_read;
            size_t bytes_written;
            var s = convert ((string) utf16array, utf16data.size * 2, "UTF-8", "UTF-16LE", out bytes_read, out bytes_written);
            stdout.puts (@"bytes_read = $bytes_read\n");
            stdout.puts (@"bytes_written = $bytes_written\n");
            stdout.puts (@"s.length = $(s.length)\n");
            // Should print "A🀜", but the Unicode symbol is not printed
            stdout.puts (@"s = $s\n");
        }
        catch (ConvertError e) {
            stderr.printf (@"error: $(e.message)\n");
        }
        return 0;
    }
Now at least the "A" is written to stdout, but the Unicode symbol is not.
bytes_read = 6
bytes_written = 3
s.length = 1
s = A
Is it correct to just cast an array to a string in this context?
Why is the Unicode symbol not converted?
Update 2:
This is the code that I have now settled with:
    int main () {
        var utf16data = new Gee.ArrayList<uint16> ();
        utf16data.add ('A');
        utf16data.add (0xD83C);
        utf16data.add (0xDC1C);
        // Replacement for
        // utf16array = utf16data.to_array ();
        // Copy element by element so the result really is a uint16[]
        uint16[] utf16array = new uint16[utf16data.size];
        for (int i = 0; i < utf16data.size; i++)
            utf16array[i] = utf16data[i];
        try {
            var s = convert ((string) utf16array, utf16array.length * 2, "UTF-8", "UTF-16LE");
            stdout.puts (@"$s\n");
        }
        catch (ConvertError e) {
            stderr.puts (@"error: $(e.message)\n");
        }
        return 0;
    }
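The problem is with to_array (). It does not produce an array of uint16, but an array of pointers, with each pointer's value set to the uint16 value. This is the standard boxed representation. That would also explain the output above: if each element is pointer-sized (8 bytes on a 64-bit system), the first 6 bytes read by convert () are the 'A' code unit followed by zero bytes, so only "A" survives. There seems to be a problem in Gee in that it isn't producing an array of the correct type. If you change the array to:
    uint16[] utf16array = {'A', 0xD83C, 0xDC1C};
it works just fine.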
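If you want to sidestep both Gee and the cast to string, here is a minimal sketch that collects the code units in a plain uint16[] and decodes them by hand. The helper name utf16_units_to_utf8 is made up for this example (it is not part of GLib or Gee), and replacing unpaired surrogates with U+FFFD is my own assumption about what you would want; it relies only on GLib's StringBuilder.append_unichar ().

```vala
// Hypothetical helper (not a GLib or Gee function): decodes UTF-16 code
// units into a UTF-8 string, combining surrogate pairs by hand.
string utf16_units_to_utf8 (uint16[] units) {
    var sb = new StringBuilder ();
    for (int i = 0; i < units.length; i++) {
        uint32 cp = units[i];
        bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        bool is_low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (is_high && i + 1 < units.length
            && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point
            uint32 low = units[i + 1];
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i++;
        } else if (is_high || is_low) {
            // Assumption: an unpaired surrogate becomes U+FFFD
            cp = 0xFFFD;
        }
        sb.append_unichar ((unichar) cp);
    }
    return sb.str;
}

int main () {
    // A native array can grow with +=, so no boxed collection is involved
    uint16[] units = {};
    units += 0x0041; // 'A'
    units += 0xD83C; // high surrogate
    units += 0xDC1C; // low surrogate
    stdout.puts (utf16_units_to_utf8 (units) + "\n");
    return 0;
}
```

The pair 0xD83C, 0xDC1C combines to U+1F01C, so on a UTF-8 terminal this should print the "A🀜" you were expecting. Whether this is preferable to convert () depends on how much input you have; for large buffers the GLib conversion functions are probably the better choice.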