delphi, delphi-2007

Delphi UTF8ToAnsi failure


When I use UTF8ToAnsi on this string, the result is empty. Any idea why that might be?

msgid "2. Broughton, PMG. ^iJournal of Automatic Chemistry.^n ^lVol 6. No 2. (April – June 1984) pp 94-95."

This demonstrates the problem:

procedure TForm1.FormShow(Sender: TObject);
begin
  Memo1.Lines.Text :=
    '<<' +
    UTF8ToAnsi('msgid "2. Broughton, PMG. ^iJournal of Automatic Chemistry.^n^lVol 6. No 2. (April – June 1984) pp 94-95."') +
    '>>';
end;

which produces

"<<>>"


Solution

  • Your code fails because the text you pass is not UTF-8 encoded; it is ANSI encoded. UTF8ToAnsi calls Utf8Decode internally, and when Utf8Decode encounters malformed bytes, that is, bytes that do not form valid UTF-8, it bails out and returns the empty string.

    The problem character is the dash in April – June 1984, which is an en dash. In ANSI (Windows-1252) it is encoded as the single byte #150. Interpreted as UTF-8, #150 ($96, binary 10010110) has the bit pattern of a continuation byte: it is not a single-byte character, since those are all below #128, and it is not valid as the first byte of a multi-byte sequence either. Hence the failure.

    To solve your actual problem, you'll need to work out why you have data that is not UTF-8 in a place where you expect UTF-8.
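
To see the difference for yourself, here is a minimal sketch for Delphi 2007, where string is an ANSI string. It assumes the system ANSI code page is Windows-1252, so that #150 is the en dash. Utf8ToAnsiOrOriginal is a hypothetical helper made up for this illustration, not part of the RTL, and its fallback is only a heuristic: it treats an empty decode result as "this was ANSI all along", which may not be right for your data.

```delphi
// Hypothetical helper, not part of the RTL: decode S as UTF-8, or return it
// unchanged when decoding fails (assumes S was ANSI all along in that case).
function Utf8ToAnsiOrOriginal(const S: string): string;
begin
  Result := Utf8ToAnsi(S);
  if (Result = '') and (S <> '') then
    Result := S;
end;

procedure TForm1.FormShow(Sender: TObject);
var
  AnsiText: string;
begin
  // #150 is the en dash in Windows-1252 (assumed system ANSI code page).
  AnsiText := 'April '#150' June 1984';

  Memo1.Lines.Clear;
  // ANSI bytes passed off as UTF-8: decoding fails and the result is empty.
  Memo1.Lines.Add('<<' + Utf8ToAnsi(AnsiText) + '>>');
  // Encoded to UTF-8 first: the round trip should give the text back.
  Memo1.Lines.Add('<<' + Utf8ToAnsi(AnsiToUtf8(AnsiText)) + '>>');
  // Fallback helper: hands back the original ANSI text when decoding fails.
  Memo1.Lines.Add('<<' + Utf8ToAnsiOrOriginal(AnsiText) + '>>');
end;
```

The fallback only hides the symptom. If the data is supposed to be UTF-8, it is better to find where the ANSI text gets in and fix it there.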