Tags: utf-8, lazarus, freepascal

String to byte array in UTF-8?


How do I convert a WideString (or another long string type) to a byte array in UTF-8?


Solution

  • A function like this will do what you need:

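    // TBytes is declared in SysUtils.
    function UTF8Bytes(const s: UTF8String): TBytes;
    begin
      // A UTF-8 string stores one byte per element.
      Assert(StringElementSize(s)=1);
      SetLength(Result, Length(s));
      // Skip the copy for an empty string, where s[1] would be out of range.
      if Length(s)>0 then
        Move(s[1], Result[0], Length(s));
    end;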
    

    You can call it with any string type. Because the parameter is declared as UTF8String, the RTL converts from the encoding of the string you pass in to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling; just pass in any string and let the RTL do the work.

    After that, it is a fairly standard array copy. The assertion makes explicit the assumption that each element of a UTF-8 encoded string is one byte wide.

    If you also want the zero terminator included in the array, write it like this:

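    function UTF8Bytes(const s: UTF8String): TBytes;
    begin
      Assert(StringElementSize(s)=1);
      // Reserve one extra byte for the zero terminator.
      SetLength(Result, Length(s)+1);
      // Test the source length here: Result is never empty in this version,
      // and s[1] is out of range for an empty string.
      if Length(s)>0 then
        Move(s[1], Result[0], Length(s));
      Result[high(Result)] := 0;
    end;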
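
As a rough usage sketch, the small Free Pascal program below calls the first version of the function with a WideString. The program name, the variable names, and the sample text are made up for illustration. The explicit UTF8Encode call is an assumption added here only to make the conversion step visible; according to the answer above, passing w directly should also work because the RTL converts at the call site.

```pascal
program Utf8BytesDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;  // TBytes, IntToHex

// Repeated from the answer so that the program compiles on its own.
function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s) = 1);
  SetLength(Result, Length(s));
  if Length(s) > 0 then
    Move(s[1], Result[0], Length(s));
end;

var
  w: WideString;  // hypothetical sample input
  b: TBytes;
  i: Integer;
begin
  // Build "h" followed by U+00E9 without relying on the source file encoding.
  w := 'h';
  w := w + WideChar($00E9);

  // Explicit conversion (an assumption of this sketch, see the note above).
  b := UTF8Bytes(UTF8Encode(w));

  WriteLn(Length(b));                  // prints 3
  for i := 0 to High(b) do
    Write(IntToHex(b[i], 2), ' ');     // prints 68 C3 A9
  WriteLn;
end.
```

The two UTF-16 code units become three bytes because U+00E9 is encoded in UTF-8 as the two-byte sequence C3 A9.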