Search code examples
httputf-8indylazarusfreepascal

Wrong result of TIdURI.URLDecode when unit LazUTF8 is used


With Free Pascal 3.0.4, this test program correctly writes ÄÖÜ

program FPCTest;

uses IdURI;

begin
  WriteLn(TIdURI.URLDecode('%C3%84%C3%96%C3%9C'));
  ReadLn;
end.

However if the unit LazUTF8 (as described here) is used, it writes ???

program FPCTest;

uses IdURI, LazUTF8;

begin
  WriteLn(TIdURI.URLDecode('%C3%84%C3%96%C3%9C'));
  ReadLn;
end.

How can I fix this decoding error for programs which use LazUTF8?


Solution

  • When the String type is an alias for AnsiString 1, much of Indy's functionality exposes extra parameters/properties to let users control which ANSI encodings are used when AnsiString values are passed around in operations that perform AnsiString<->byte conversions.

    1: Delphi pre-2009, and FreePascal/Lazarus when {$ModeSwitch UnicodeStrings} and {$Mode DelphiUnicode} are not used (FYI, Indy 11 will use them!).

    In most cases, Indy's default byte encoding is ASCII (because many of the Internet protocols that Indy implements originally supported only ASCII - individual Indy components upgrade themselves to UTF as appropriate per protocol), though some things use the OS default codepage/charset instead.

    Indy's default byte encoding can be changed at runtime by setting the global GIdDefaultTextEncoding variable in the IdGlobal unit, eg:

    GIdDefaultTextEncoding := encUTF8;
    

    But, in this particular situation, TIdURI.URLEncode() does not use GIdDefaultTextEncoding, but it does have an optional ADestEncoding parameter that you can use to specify a specific byte encoding for the returned AnsiString (in addition to an optional AByteEncoding parameter to specify the byte encoding of the parsed url octets - UTF-8 by default), eg:

    TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
      {$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding_UTF8{$ENDIF}
    )
    

    The above will parse the url-encoded octets as UTF-8, and then return that data as-is in a UTF-8 encoded AnsiString.

    If you do not specify an output encoding for ADestEncoding, URLDecode() defaults to the OS default. If you want it to use GIdDefaultTextEncoding instead, specify IndyTextEncoding_Default in the ADestEncoding parameter:

    TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
      {$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding_Default{$ENDIF}
    )
    

    Another option would be to use the IndyTextEncoding(CodePage) function for ADestEncoding, passing it FreePascal's DefaultSystemCodePage variable, which the LazUtils package sets to CP_UTF8 2:

    TIdURI.URLDecode('%C3%84%C3%96%C3%9C'
      {$IFNDEF FPC_UNICODESTRINGS}, IndyTextEncoding_UTF8, IndyTextEncoding(DefaultSystemCodePage){$ENDIF}
    )
    

    2: I have opened a ticket in Indy's issue tracker to add support for DefaultSystemCodePage when compiling for FreePascal/Lazarus.