I'm using TIdHTTPServer in Delphi XE2 as a basic HTTP server that receives requests from the web, processes them, and returns a response.
The problem is that when someone opens a page like localhost:5678/book?name=Петров, I cannot receive the name "Петров" correctly.
The handler is simple at this point:
procedure TMain.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Aux_S1: String;
  Aux_S2: String;
begin
  Aux_S1 := ARequestInfo.Params[0];
  Aux_S2 := System.UTF8Decode(ARequestInfo.Params[0]);
end;
Aux_S1 is 'name=Ð'#$009F'еÑ'#$0082'Ñ'#$0080'ов'
Aux_S2 is 'name=�?е�?�?ов'
Some letters are shown correctly, but others are not.
What am I doing wrong, or how should I process these requests?
A URL is not allowed to include non-ASCII characters. Such characters must be charset-encoded into bytes, which are then encoded in %HH format when put into the URL. So what your client is actually using as the URL is something more like this:
http://localhost:5678/book?name=%D0%9F%D0%B5%D1%82%D1%80%D0%BE%D0%B2
%D0%9F%D0%B5%D1%82%D1%80%D0%BE%D0%B2 is Петров in UTF-8 percent-encoded format.
A URL has no way of specifying the charset used for such encoding. It is up to the server to decide. UTF-8 is the most common charset encoding used, though.
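To make the encoding step concrete, here is a minimal sketch that reproduces the percent-encoded form of Петров using only the Delphi RTL. The function name PercentEncodeUTF8 is made up for this illustration, and unlike a real URL encoder it escapes every byte, including ASCII ones. The name is written as character codes so the result does not depend on the encoding of the source file.

```delphi
program PercentEncodeDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: UTF-8 encode S, then write every byte as %HH.
// A real encoder would leave unreserved ASCII characters unescaped.
function PercentEncodeUTF8(const S: string): string;
var
  Bytes: TBytes;
  B: Byte;
begin
  Result := '';
  Bytes := TEncoding.UTF8.GetBytes(S);
  for B in Bytes do
    Result := Result + '%' + IntToHex(B, 2);
end;

begin
  // U+041F U+0435 U+0442 U+0440 U+043E U+0432 spells the name from the question
  Writeln(PercentEncodeUTF8(#$041F#$0435#$0442#$0440#$043E#$0432));
  // prints %D0%9F%D0%B5%D1%82%D1%80%D0%BE%D0%B2
end.
```

Each Cyrillic letter becomes two UTF-8 bytes, which is why six letters turn into twelve %HH groups.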
TIdHTTPServer automatically parses and decodes the URL query string before triggering the OnCommandGet event, if the ParseParams property is true (which it is by default). So don't call UTF8Decode() directly on the parameter strings. They have already been decoded by that point, so it will not work.
Unfortunately, TIdHTTPServer does not currently allow you to specify which charset to use for decoding the query string (that is on the TODO list). What it does is check whether the request includes a charset attribute in the Content-Type header. If so, it uses that charset (this is not standard HTTP server behavior, though); otherwise it uses Indy's built-in 8bit encoding instead.
The latter case is what usually happens in GET requests, as they do not carry a Content-Type header. This will work to your advantage, though (see further below). The string value:
'Ð'#$009F'еÑ'#$0082'Ñ'#$0080'ов'
is actually the raw UTF-8 bytes of Петров being interpreted as 8bit "characters" when decoded to a UnicodeString:
#$00D0 #$009F #$00D0 #$00B5 #$00D1 #$0082 #$00D1 #$0080 #$00D0 #$00BE #$00D0 #$00B2
So, you can "fix" this decoding mismatch by manually converting the decoded parameter string back into raw bytes and then decoding them as UTF-8 back into a string, e.g.:
procedure TMain.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Aux_S1: String;
begin
  // ToBytes() and the encoding functions are declared in the IdGlobal unit.
  //
  // if you are not using Indy 10.6+, you can replace
  // IndyTextEncoding_UTF8 with TIdTextEncoding.UTF8,
  // and IndyTextEncoding_8bit with Indy8BitEncoding...
  //
  //Aux_S1 := TIdTextEncoding.UTF8.GetString(ToBytes(ARequestInfo.Params[0], Indy8BitEncoding));
  Aux_S1 := IndyTextEncoding_UTF8.GetString(ToBytes(ARequestInfo.Params[0], IndyTextEncoding_8bit));
end;
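If you want to convince yourself of what the 8bit round trip does, the mismatch can be reproduced without Indy. The sketch below is only an illustration: BytesTo8Bit and EightBitToBytes are hypothetical helpers that mimic an 8bit decode and encode by mapping each byte to the character with the same code and back. They are not Indy functions.

```delphi
program RoundTripDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: treat each byte as one character (8bit decode).
function BytesTo8Bit(const B: TBytes): string;
var
  I: Integer;
begin
  SetLength(Result, Length(B));
  for I := 0 to High(B) do
    Result[I + 1] := Char(B[I]);
end;

// Hypothetical helper: turn each character back into one byte (8bit encode).
function EightBitToBytes(const S: string): TBytes;
var
  I: Integer;
begin
  SetLength(Result, Length(S));
  for I := 1 to Length(S) do
    Result[I - 1] := Byte(Ord(S[I]));
end;

var
  Original, Mangled, Fixed: string;
begin
  // The name from the question, written as character codes
  Original := #$041F#$0435#$0442#$0440#$043E#$0432;
  // UTF-8 bytes misread as 8bit characters, as described above
  Mangled := BytesTo8Bit(TEncoding.UTF8.GetBytes(Original));
  // Back to raw bytes, then decoded properly as UTF-8
  Fixed := TEncoding.UTF8.GetString(EightBitToBytes(Mangled));
  Writeln(Length(Mangled));   // prints 12
  Writeln(Fixed = Original);  // prints TRUE
end.
```

The 8bit step is lossless because every byte maps to exactly one character and back, which is what makes the repair possible.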
Alternatively, set ParseParams to false and manually decode the ARequestInfo.QueryParams string (the original percent-encoded data from the URL) instead:
procedure DecodeParams(const AValue: String; Params: TStrings);
var
  i, j: Integer;
  s: string;
  // if you are not using Indy 10.6+, you can replace
  // IIdTextEncoding with TIdTextEncoding...
  //
  //LEncoding: TIdTextEncoding;
  LEncoding: IIdTextEncoding;
begin
  // Convert special characters
  // ampersand '&' separates values {Do not Localize}
  Params.BeginUpdate;
  try
    Params.Clear;
    // if you are not using Indy 10.6+, you can replace
    // IndyTextEncoding_UTF8 with TIdTextEncoding.UTF8...
    //
    //LEncoding := TIdTextEncoding.UTF8;
    LEncoding := IndyTextEncoding_UTF8;
    i := 1;
    while i <= Length(AValue) do
    begin
      j := i;
      while (j <= Length(AValue)) and (AValue[j] <> '&') do {do not localize}
      begin
        Inc(j);
      end;
      s := Copy(AValue, i, j-i);
      // See RFC 1866 section 8.2.1. TP
      s := ReplaceAll(s, '+', ' '); {do not localize}
      Params.Add(TIdURI.URLDecode(s, LEncoding));
      i := j + 1;
    end;
  finally
    Params.EndUpdate;
  end;
end;
procedure TMain.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Aux_S1: String;
begin
  DecodeParams(ARequestInfo.QueryParams, ARequestInfo.Params);
  Aux_S1 := ARequestInfo.Params[0];
end;
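Note that Params[0] holds the whole 'name=value' pair, as the values in the question show. Since Params is a TStrings, the standard Values property can pick out just the value by name. A minimal sketch using a plain TStringList as a stand-in for ARequestInfo.Params (the stand-in is an assumption made so the example runs on its own):

```delphi
program ParamsValuesDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  Params: TStringList;
  Name: string;
begin
  // The decoded name, written as character codes
  Name := #$041F#$0435#$0442#$0440#$043E#$0432;
  Params := TStringList.Create;
  try
    // Same shape as a decoded query parameter: 'name=<value>'
    Params.Add('name=' + Name);
    Writeln(Params[0] = Name);             // prints FALSE, the pair includes 'name='
    Writeln(Params.Values['name'] = Name); // prints TRUE
  finally
    Params.Free;
  end;
end.
```

In the handler, that would amount to reading ARequestInfo.Params.Values['name'] instead of Params[0].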