Tags: delphi, delphi-xe2, indy, indy10

How to receive query string containing foreign characters in TIdHTTPServer


I'm using TIdHTTPServer in Delphi XE2 as a basic HTTP server that receives requests from the web, processes them, and sends back the appropriate response.

The problem is that when someone opens a page like localhost:5678/book?name=Петров, I cannot receive the name "Петров" correctly.

The procedure is simple at this point:

procedure TMain.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Aux_S1          : String;
  Aux_S2          : String;
begin

  Aux_S1 := ARequestInfo.Params[0];

  Aux_S2 := System.UTF8Decode(ARequestInfo.Params[0]);

end;

Aux_S1 is 'name=Ð'#$009F'еÑ'#$0082'Ñ'#$0080'ов'

Aux_S2 is 'name=�?е�?�?ов'

Some letters are shown correctly but others are not.

What am I doing wrong, or how should I process these requests?


Solution

  • A URL is not allowed to include non-ASCII characters. Such characters must be charset-encoded into bytes which are then encoded in %HH format when put into the URL. So, what your client is actually using as the URL is something more like this:

    http://localhost:5678/book?name=%D0%9F%D0%B5%D1%82%D1%80%D0%BE%D0%B2
    

    %D0%9F%D0%B5%D1%82%D1%80%D0%BE%D0%B2 is Петров in UTF-8 percent-encoded format.
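    To see where those bytes come from, here is a minimal console sketch that uses only the Delphi RTL (XE2 or later is assumed, for the System.SysUtils unit name). PercentEncodeUTF8 is a hypothetical helper written for this illustration, not an Indy or RTL function. It percent-encodes every byte, including plain ASCII ones, which is simpler than a real URL encoder but produces the same %HH sequence for Петров:

    ```pascal
    program PercentEncodeDemo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    // Hypothetical helper: converts AValue to UTF-8 bytes and writes each
    // byte as %HH. Unlike a real URL encoder it also encodes safe ASCII chars.
    function PercentEncodeUTF8(const AValue: string): string;
    var
      LBytes: TBytes;
      B: Byte;
    begin
      Result := '';
      LBytes := TEncoding.UTF8.GetBytes(AValue);
      for B in LBytes do
        Result := Result + '%' + IntToHex(B, 2);
    end;

    begin
      // 'Петров' is built from code points so the source file's encoding
      // does not matter
      Writeln('name=' + PercentEncodeUTF8(#$041F#$0435#$0442#$0440#$043E#$0432));
    end.
    ```

    It prints name=%D0%9F%D0%B5%D1%82%D1%80%D0%BE%D0%B2, the same query shown above. Each Cyrillic letter takes two bytes in UTF-8; for example, П (U+041F) becomes D0 9F.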

    A URL has no way of specifying the charset used for such encoding. It is up to the server to decide. UTF-8 is the most common charset encoding used, though.

    TIdHTTPServer automatically parses and decodes the URL query string before triggering the OnCommandGet event if the ParseParams property is True (which it is by default). So don't call UTF8Decode() on the parameter strings. By that point they have already been decoded into UnicodeString values, so decoding them again as UTF-8 cannot produce the right result.

    Unfortunately, TIdHTTPServer does not currently let you specify which charset to use for decoding the query string (that is on the TODO list). Instead, it checks whether the request includes a charset attribute in its Content-Type header and, if so, uses it (this is not standard HTTP server behavior, though). Otherwise it falls back to Indy's built-in 8-bit encoding.

    The latter case is what usually happens with GET requests, since they do not carry a Content-Type header. That actually works to your advantage (see further below). The string value:

    'Ð'#$009F'еÑ'#$0082'Ñ'#$0080'ов'
    

    is actually the raw UTF-8 bytes of Петров, each interpreted as an 8-bit "character" when decoded to a UnicodeString:

    #$00D0 #$009F #$00D0 #$00B5 #$00D1 #$0082 #$00D1 #$0080 #$00D0 #$00BE #$00D0 #$00B2 
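    The mismatch can be reproduced without a server. The sketch below is a standalone console program using only the Delphi RTL. BytesAs8BitChars and CharsAs8BitBytes are hypothetical helpers. They assume that Indy's 8-bit encoding maps each byte value N directly to character #N, which is consistent with the dump above. The program emulates what the server does and then shows how to undo it:

    ```pascal
    program EightBitRoundTrip;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    // Hypothetical helper: emulates an 8-bit decode, byte N -> character #N
    function BytesAs8BitChars(const ABytes: TBytes): string;
    var
      I: Integer;
    begin
      SetLength(Result, Length(ABytes));
      for I := 0 to High(ABytes) do
        Result[I + 1] := Char(ABytes[I]);
    end;

    // Hypothetical helper: reverses the emulation, character #N -> byte N
    // (assumes every character is <= #$FF, true for 8-bit-decoded text)
    function CharsAs8BitBytes(const AValue: string): TBytes;
    var
      I: Integer;
    begin
      SetLength(Result, Length(AValue));
      for I := 1 to Length(AValue) do
        Result[I - 1] := Byte(Ord(AValue[I]));
    end;

    var
      Original, Garbled, Repaired: string;
      I: Integer;
    begin
      Original := #$041F#$0435#$0442#$0440#$043E#$0432; // 'Петров'
      // What the handler receives: UTF-8 bytes misread as 8-bit characters
      Garbled := BytesAs8BitChars(TEncoding.UTF8.GetBytes(Original));
      for I := 1 to Length(Garbled) do
        Write('#$', IntToHex(Ord(Garbled[I]), 4), ' ');
      Writeln;
      // The repair: recover the raw bytes, then decode them as UTF-8
      Repaired := TEncoding.UTF8.GetString(CharsAs8BitBytes(Garbled));
      Writeln(BoolToStr(Repaired = Original, True));
    end.
    ```

    The first line it prints matches the #$00D0 #$009F ... dump above, and the second line prints True. This shows that turning the characters back into bytes and decoding those bytes as UTF-8 restores the original name. It is the same idea as the ToBytes()/GetString() fix that follows, just without Indy.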
    

    So, you can "fix" this decoding mismatch by manually converting the decoded parameter string back into its raw bytes and then decoding those bytes as UTF-8, e.g.:

    procedure TMain.IdHTTPServer1CommandGet(AContext: TIdContext;
      ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
    var
      Aux_S1: String;
    begin
      // ToBytes() and the IndyTextEncoding_* functions are declared in
      // IdGlobal, so that unit must be in the uses clause.
      //
      // if you are not using Indy 10.6+, you can replace
      // IndyTextEncoding_UTF8 with TIdTextEncoding.UTF8,
      // and IndyTextEncoding_8bit with Indy8BitEncoding...
      //
      //Aux_S1 := TIdTextEncoding.UTF8.GetString(ToBytes(ARequestInfo.Params[0], Indy8BitEncoding));
      Aux_S1 := IndyTextEncoding_UTF8.GetString(ToBytes(ARequestInfo.Params[0], IndyTextEncoding_8bit));
    end;
    

    Alternatively, set ParseParams to false and manually decode the ARequestInfo.QueryParams string (the original percent-encoded data from the URL) instead:

    // Requires Classes, SysUtils, IdGlobal and IdURI in the unit's uses clause.
    procedure DecodeParams(const AValue: String; Params: TStrings);
    var
      i, j : Integer;
      s: string;
    
      // if you are not using Indy 10.6+, you can replace
      // IIdTextEncoding with TIdTextEncoding...
      //
      //LEncoding: TIdTextEncoding;
      LEncoding: IIdTextEncoding;
    begin
      // Convert special characters
      // ampersand '&' separates values    {Do not Localize}
      Params.BeginUpdate;
      try
        Params.Clear;
    
        // if you are not using Indy 10.6+, you can replace
        // IndyTextEncoding_UTF8 with TIdTextEncoding.UTF8...
        //
        //LEncoding := TIdTextEncoding.UTF8;
        LEncoding := IndyTextEncoding_UTF8;
    
        i := 1;
        while i <= Length(AValue) do
        begin
          j := i;
          while (j <= Length(AValue)) and (AValue[j] <> '&') do {do not localize}
          begin
            Inc(j);
          end;
          s := Copy(AValue, i, j-i);
          // See RFC 1866 section 8.2.1. TP
          s := StringReplace(s, '+', ' ', [rfReplaceAll]);  {do not localize}
          Params.Add(TIdURI.URLDecode(s, LEncoding));
          i := j + 1;
        end;
      finally
        Params.EndUpdate;
      end;
    end;
    
    procedure TMain.IdHTTPServer1CommandGet(AContext: TIdContext;
      ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
    var
      Aux_S1: String;
    begin
      // assumes IdHTTPServer1.ParseParams is False, so Params has not
      // already been filled in by Indy
      DecodeParams(ARequestInfo.QueryParams, ARequestInfo.Params);
      Aux_S1 := ARequestInfo.Params[0];
    end;