delphi, delphi-xe8

How to allocate a UTF8 string on the stack/heap?


How can I allocate a UTF-8 string on the stack or on the heap? Here is an example that uses a static array for the stack case and GetMem for the heap case. However, in the debugger the array is full of "?" characters. Do I also need to take the code page into account when allocating?

program Project1;

procedure Main;
var
  Stack: Array[0..20] of AnsiChar;
  Heap: PAnsiChar;
begin
  Stack := '漢語漢語漢語漢語';

  GetMem(Heap, 8 * SizeOf(AnsiChar));
  Move(PAnsiChar('漢語漢語漢語漢語')^, Heap^, 8 * SizeOf(AnsiChar));
end;

begin
  Main;
end.

On the other hand, this works fine:

program Project1;

procedure Main;
var
  S: UTF8String;
begin
  S := '漢語漢語漢語漢語';
end;

begin
  Main;
end.

Solution

  • You cannot persuade the compiler to produce a UTF-8 encoded constant. It will provide either ANSI or UTF-16, but not UTF-8. You'll have to handle the encoding yourself.

    That could look like this:

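    procedure Main;
    const
      // The UTF-8 encoding of '漢語漢語漢語漢語', written out byte by byte:
      // 漢 = E6 BC A2, 語 = E8 AA 9E.
      // (Named Utf8Text so it does not shadow the UTF8String type.)
      Utf8Text: PAnsiChar =
        #$E6#$BC#$A2#$E8#$AA#$9E#$E6#$BC#$A2#$E8#$AA#$9E +
        #$E6#$BC#$A2#$E8#$AA#$9E#$E6#$BC#$A2#$E8#$AA#$9E +
        #$00;
    var
      // 24 bytes of text plus 1 byte for the null terminator.
      Stack: array [0..24] of AnsiChar;
    begin
      Move(Pointer(Utf8Text)^, Stack, SizeOf(Stack));
    end;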
    

    Actually, it turns out I was wrong. You can persuade the compiler to UTF-8 encode constants, like this:

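    procedure Main;
    const
      // A typed UTF8String constant: the compiler stores the text UTF-8
      // encoded (24 bytes here), followed by the usual null terminator.
      utf8str: UTF8String = '漢語漢語漢語漢語';
    var
      Stack: array [0..24] of AnsiChar;
    begin
      // Length counts bytes for a UTF8String; the +1 is for the #0.
      Assert(Length(utf8str) + 1 = Length(Stack));
      // Copy the 24 text bytes plus the trailing #0 into the stack buffer.
      Move(Pointer(utf8str)^, Stack, SizeOf(Stack));
    end;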
    

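    Note that your array was too short for the text once it has been UTF-8 encoded. Each of these characters takes three bytes in UTF-8, so the eight characters need 24 bytes, plus one for the null terminator: 25 in total, whereas Array[0..20] holds only 21.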

    You already know how to allocate memory on the heap, so I don't need to explain that.
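    For completeness, here is one possible heap version. It is a sketch assuming Delphi 2009 or later (so it applies to XE8): TEncoding.UTF8.GetBytes does the UTF-16 to UTF-8 conversion, and GetMem reserves the resulting byte count plus one for the null terminator. AllocUtf8OnHeap is a hypothetical helper name, not an RTL routine, and the text is written as code points (#$6F22 is 漢, #$8A9E is 語) so the example does not depend on how the source file is encoded.

    ```delphi
    program Utf8HeapDemo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    // Hypothetical helper, not an RTL function: encodes S as UTF-8 into a new,
    // null-terminated heap buffer. The caller must release it with FreeMem.
    function AllocUtf8OnHeap(const S: string; out ByteCount: Integer): PAnsiChar;
    var
      Bytes: TBytes;
    begin
      // Let the RTL convert UTF-16 to UTF-8 instead of guessing the size.
      Bytes := TEncoding.UTF8.GetBytes(S);
      ByteCount := Length(Bytes);
      // Allocate one extra byte for the terminating #0.
      GetMem(Result, ByteCount + 1);
      if ByteCount > 0 then
        Move(Bytes[0], Result^, ByteCount);
      Result[ByteCount] := #0;
    end;

    var
      Heap: PAnsiChar;
      Count, I: Integer;
    begin
      // '漢語漢語漢語漢語' spelled as code points, so the demo does not depend
      // on the encoding the .dpr file is saved in.
      Heap := AllocUtf8OnHeap(
        #$6F22#$8A9E#$6F22#$8A9E#$6F22#$8A9E#$6F22#$8A9E, Count);
      try
        Writeln('UTF-8 bytes: ', Count);
        // Dump the encoded bytes in hex.
        for I := 0 to Count - 1 do
          Write(IntToHex(Byte(Heap[I]), 2), ' ');
        Writeln;
      finally
        FreeMem(Heap);
      end;
    end.
    ```

    Run as a console program, it should report 24 bytes: the same E6 BC A2 E8 AA 9E sequence, repeated four times, as in the hand-encoded constant above. The caller owns the returned buffer and must free it with FreeMem.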