Tags: flutter, dart, parsing, unicode, utf-16

Encoding a UTF-16 emoji returns invalid bytes


I am trying to encode a Unicode character in Dart, but the result is an invalid byte array.

The character: 🔥

The bytes: [FF, FE, 3D, D8, 25, DD]

The bytes are UTF-16LE with a BOM. When I decode them, the string is parsed correctly and I can see the emoji in my IDE.

Then I try to encode the string again, but that gives me a byte array I don't understand:

[FF, FE, FD, FF, FD, FF]

I am using the package utf_convert to encode the string:

import 'package:utf_convert/utf_convert.dart' as utf;

List<int> convert(String input) {
  return utf.encodeUtf16le(input, true).cast<int>();
}

Is this a bug in the package, or am I overlooking something?

Edit1

I wrote some simple tests to capture the problem:

import 'package:test/test.dart';
import 'package:utf_convert/utf_convert.dart' as utf;

void main() {
  var emojiString = '🔥';
  var emojiBytes = <int>[0xFF, 0xFE, 0x3D, 0xD8, 0x25, 0xDD];

  test('Decode Emoji', () {
    var emoji = utf.decodeUtf16le(emojiBytes);

    expect(emoji, emojiString);
  });

  test('Encode Emoji', () {
    var bytes = utf.encodeUtf16le(emojiString, true).cast<int>();

    expect(bytes, emojiBytes);
  });
}

The test "Decode Emoji" passes, but the second one, "Encode Emoji", fails with this assertion:

Expected: [255, 254, 61, 216, 37, 221]
  Actual: [255, 254, 253, 255, 253, 255]
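
To make the two byte arrays easier to compare, here is a small sketch that groups them into 16-bit little-endian code units. The helpers codeUnitsFromUtf16Le and hex are hypothetical names used only for this illustration and are not part of utf_convert. The expected bytes form the surrogate pair D83D DD25 for U+1F525, while the actual bytes are FFFD twice. U+FFFD is the Unicode replacement character, which suggests (an inference, not something confirmed from the package source) that the encoder treats each surrogate as an invalid code point and substitutes it.

```dart
// Groups little-endian bytes into 16-bit code units, optionally skipping
// a 2-byte BOM. Hypothetical helper for illustration only.
List<int> codeUnitsFromUtf16Le(List<int> bytes, {bool hasBom = true}) {
  final units = <int>[];
  for (var i = hasBom ? 2 : 0; i + 1 < bytes.length; i += 2) {
    units.add(bytes[i] | (bytes[i + 1] << 8));
  }
  return units;
}

// Formats code units as upper-case, zero-padded hex.
String hex(List<int> units) => units
    .map((u) => u.toRadixString(16).toUpperCase().padLeft(4, '0'))
    .join(' ');

void main() {
  const expected = [0xFF, 0xFE, 0x3D, 0xD8, 0x25, 0xDD];
  const actual = [0xFF, 0xFE, 0xFD, 0xFF, 0xFD, 0xFF];

  print(hex(codeUnitsFromUtf16Le(expected))); // D83D DD25
  print(hex(codeUnitsFromUtf16Le(actual))); // FFFD FFFD

  // A Dart String already stores UTF-16 code units, so the expected
  // surrogate pair can be read straight from the string.
  print(hex('🔥'.codeUnits)); // D83D DD25
}
```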


Solution

  • After a lot of research, I think this is a bug in the library. Its code is a fork of a discontinued package found here.

    My workaround was to use another piece of code that still exists in the Dart library. I found a hint in this SO post.

    I then implemented a new library of my own, which others facing the same issue can use too. It is hosted on GitHub and pub.dev under the MIT license.
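
For readers who only need the round trip and not a full package, here is a minimal sketch that relies on nothing beyond dart:core. It is not the code from the library mentioned above. The function names encodeUtf16Le and decodeUtf16Le and the writeBom parameter are assumptions made for this example. It works because a Dart String is already a sequence of UTF-16 code units, so surrogate pairs pass through untouched. It does not validate unpaired surrogates and silently ignores a trailing odd byte.

```dart
/// Encodes [input] as UTF-16LE bytes, optionally prefixed with the
/// FF FE byte order mark. Hypothetical helper, dart:core only.
List<int> encodeUtf16Le(String input, {bool writeBom = true}) {
  final bytes = <int>[if (writeBom) ...[0xFF, 0xFE]];
  for (final unit in input.codeUnits) {
    bytes
      ..add(unit & 0xFF) // low byte first (little endian)
      ..add(unit >> 8); // then high byte
  }
  return bytes;
}

/// Decodes UTF-16LE bytes, skipping a leading FF FE BOM if present.
String decodeUtf16Le(List<int> bytes) {
  final hasBom = bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE;
  final units = <int>[];
  for (var i = hasBom ? 2 : 0; i + 1 < bytes.length; i += 2) {
    units.add(bytes[i] | (bytes[i + 1] << 8));
  }
  return String.fromCharCodes(units);
}

void main() {
  final bytes = encodeUtf16Le('🔥');
  print(bytes); // [255, 254, 61, 216, 37, 221]
  print(decodeUtf16Le(bytes) == '🔥'); // true
}
```

With this sketch, both tests from the question would pass if the utf.encodeUtf16le and utf.decodeUtf16le calls were swapped for these two functions.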