delphi-7 indy10 idhttp

Error with ISO-8859-1 encoding using TIdHTTP and Delphi


I have a big problem with accented characters in the result returned by the Post() method of TIdHTTP.

The URL I'm accessing already encodes its output correctly; I saved the result to a text file on the server just to make sure it's all correct. But when I bring the data into Delphi through a function I created, the character "?" appears instead of the accented letters.

For example, if the page returns Conexão não configurada, my function returns Conex?o n?o configurada.

I've tried several solutions posted here on StackOverflow, without success.

My function is as follows:

// requires IdHTTP, IdSSLOpenSSL and SysUtils in the unit's uses clause
function HttpPost(PostUrl: string; PostParams: TStringList): string;
var
  IdHTTP1: TIdHTTP;
  IOHandler: TIdSSLIOHandlerSocketOpenSSL;
begin
  IdHTTP1 := TIdHTTP.Create(nil);
  try
    // owned by IdHTTP1, so it is freed together with it
    IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(IdHTTP1);
    IdHTTP1.IOHandler := IOHandler;
    IdHTTP1.HandleRedirects := True;
    IdHTTP1.Request.ContentType := 'text/html';
    // describes the charset of the data being sent, not of the response
    IdHTTP1.Request.CharSet := 'ISO-8859-1';
    IdHTTP1.Request.UserAgent := 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0';
    IdHTTP1.ReadTimeout := 20000;
    try
      Result := IdHTTP1.Post(PostUrl, PostParams);
    except
      on E: Exception do
      begin
        Result := 'ErrorExcept';
        Msg(E, 2); // the asker's own error-reporting routine (not shown)
      end;
    end;
  finally
    IdHTTP1.Free;
  end;
end;

I have updated Indy to version 10.6.2.0.


Solution

  • You are using an ANSI version of Delphi (Delphi switched to Unicode in 2009).

    The version of TIdHTTP.Post() that returns a String decodes the raw server data to Unicode using the charset reported in the Content-Type response header, or a default if no charset is specified. So make sure the server actually encodes the data in the correct charset and reports that charset correctly.

    In Unicode versions of Delphi, where String is an alias for UnicodeString, this Unicode data is returned as-is.

    In ANSI versions of Delphi, where String is an alias for AnsiString, Post() converts this Unicode data to ANSI for output. The ? characters you are seeing mean the Unicode data contains characters that do not exist in the ANSI charset being converted to.

    Post() has an optional ADestEncoding parameter to specify the desired ANSI charset for output. If it is not specified, Indy's default encoding is used. That default is controlled by the global GIdDefaultTextEncoding variable in the IdGlobal unit, which is set to encASCII (7-bit US-ASCII) by default.

    The output ANSI charset does not need to be the same as the charset used by the raw data. The point of ADestEncoding is to specify the charset that you want the output to be in.

    If you know ahead of time the exact ANSI charset you want to use, you can set ADestEncoding to an IIdTextEncoding for that charset, such as one returned by the CharsetToEncoding() function in the IdGlobalProtocols unit or the IndyTextEncoding() function in the IdGlobal unit.

    Or, to use the OS default charset of the machine your code is running on, set ADestEncoding to IndyTextEncoding_OSDefault (or set GIdDefaultTextEncoding to encOSDefault).

    But note that Unicode-to-ANSI conversions are usually lossy, so it is better to use UTF-8 instead, which is lossless. You can set ADestEncoding to IndyTextEncoding_UTF8 (or set GIdDefaultTextEncoding to encUTF8).
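A sketch of passing an explicit ADestEncoding in the function from the question. It assumes Indy 10.6.x compiled for an ANSI Delphi such as Delphi 7, where the TStrings overload of Post() is assumed to take (AURL, ASource, AByteEncoding, ASrcEncoding, ADestEncoding); check that declaration in IdHTTP.pas of your Indy copy before relying on it. The unit and function names (Latin1Post, HttpPostLatin1) are hypothetical, and ISO-8859-1 is picked only because the expected text is Portuguese.

```delphi
unit Latin1Post; // hypothetical unit name

interface

uses
  Classes;

function HttpPostLatin1(const PostUrl: string; PostParams: TStrings): string;

implementation

uses
  IdGlobal, IdGlobalProtocols, IdHTTP, IdSSLOpenSSL;

function HttpPostLatin1(const PostUrl: string; PostParams: TStrings): string;
var
  HTTP: TIdHTTP;
  SSL: TIdSSLIOHandlerSocketOpenSSL;
begin
  HTTP := TIdHTTP.Create(nil);
  try
    // Owned by HTTP, so the single try/finally below frees both objects.
    SSL := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
    HTTP.IOHandler := SSL;
    HTTP.HandleRedirects := True;
    HTTP.ReadTimeout := 20000;
    // Assumed ANSI-build overload:
    //   Post(AURL, ASource, AByteEncoding, ASrcEncoding, ADestEncoding)
    // The response is first decoded using the charset the server reports;
    // ADestEncoding only controls the following Unicode-to-ANSI step.
    Result := HTTP.Post(PostUrl, PostParams, nil, nil,
      CharsetToEncoding('ISO-8859-1'));
  finally
    HTTP.Free;
  end;
end;

end.
```

If the output is still wrong, inspecting HTTP.Response.CharSet right after the Post() call shows which charset the server actually reported.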
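Changing Indy's global default instead of passing ADestEncoding on every call can be done once at startup. A minimal sketch, with the unit name IndyEncodingSetup and its procedures being hypothetical; because GIdDefaultTextEncoding is process-wide, this changes the default for every Indy call that does not receive an explicit encoding, not only this Post().

```delphi
unit IndyEncodingSetup; // hypothetical unit name

interface

procedure UseOSDefaultIndyText;
procedure UseUtf8IndyText;

implementation

uses
  IdGlobal;

// String-returning Indy calls without an explicit encoding will convert
// to the machine's ANSI code page instead of 7-bit US-ASCII.
procedure UseOSDefaultIndyText;
begin
  GIdDefaultTextEncoding := encOSDefault;
end;

// Same calls will instead return UTF-8 bytes inside the AnsiString.
procedure UseUtf8IndyText;
begin
  GIdDefaultTextEncoding := encUTF8;
end;

initialization
  UseOSDefaultIndyText;

end.
```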
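With UTF-8 as the output encoding, the AnsiString that Post() returns in Delphi 7 holds raw UTF-8 bytes, so it has to be decoded before use, for example with UTF8Decode from the System unit, which returns a WideString. A sketch, assuming Indy 10.6.x built for an ANSI Delphi, where the TStrings overload of Post() is assumed to take (AURL, ASource, AByteEncoding, ASrcEncoding, ADestEncoding); the names Utf8PostHelper, Utf8ResponseToWide and HttpPostUtf8 are hypothetical.

```delphi
unit Utf8PostHelper; // hypothetical unit name

interface

uses
  Classes, IdHTTP;

function Utf8ResponseToWide(const Raw: UTF8String): WideString;
function HttpPostUtf8(AHTTP: TIdHTTP; const PostUrl: string;
  PostParams: TStrings): WideString;

implementation

uses
  IdGlobal;

// Raw holds UTF-8 bytes; UTF8Decode turns them into UTF-16 without loss.
function Utf8ResponseToWide(const Raw: UTF8String): WideString;
begin
  Result := UTF8Decode(Raw);
end;

function HttpPostUtf8(AHTTP: TIdHTTP; const PostUrl: string;
  PostParams: TStrings): WideString;
begin
  // Assumed ANSI-build overload; the last argument is ADestEncoding.
  Result := Utf8ResponseToWide(
    AHTTP.Post(PostUrl, PostParams, nil, nil, IndyTextEncoding_UTF8));
end;

end.
```

Delphi 7's VCL controls are ANSI, so assigning the WideString to a control such as TMemo still converts it to the system code page; the difference is that the text stays intact up to that point, and Unicode-capable controls can display it without loss.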