I have a web server based on TIdHTTPServer, built in Delphi Sydney. From a web page I receive the following multipart/form-data post stream:
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="d"

83AAAFUaVVs4Q07z
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="dir"

Upload
-----------------------------16857441221270830881532229640
Content-Disposition: form-data; name="file_name"; filename="česká tečka.png"
Content-Type: image/png

PNG_DATA
-----------------------------16857441221270830881532229640--
The problem is that the text parts are not received correctly. I read "Indy MIME decoding of Multipart/Form-Data Requests returns trailing CR/LF" and changed the transfer encoding to 8bit, which helps to receive the file correctly, but the received dir value and file name are still wrong (dir should be Upload and filename should be česká tečka.png). This is what I get:
d=83AAAFUaVVs4Q07z
dir=UploadW
??esk?? te??ka.png 75
To demonstrate the issue, I simplified my code to a console app (please note that the MIME.txt file contains the same data as the post stream above):
program MIMEMultiPartTest;

{$APPTYPE CONSOLE}
{$R *.res}

uses
  System.Classes, System.SysUtils,
  IdGlobal, IdCoder, IdMessage, IdMessageCoder, IdGlobalProtocols, IdCoderMIME, IdMessageCoderMIME,
  IdCoderQuotedPrintable, IdCoderBinHex4;

procedure ProcessAttachmentPart(var Decoder: TIdMessageDecoder; var MsgEnd: Boolean);
var
  MS: TMemoryStream;
  Name: string;
  Value: string;
  NewDecoder: TIdMessageDecoder;
begin
  MS := TMemoryStream.Create;
  try
    // http://stackoverflow.com/questions/27257577/indy-mime-decoding-of-multipart-form-data-requests-returns-trailing-cr-lf
    TIdMessageDecoderMIME(Decoder).Headers.Values['Content-Transfer-Encoding'] := '8bit';
    TIdMessageDecoderMIME(Decoder).BodyEncoded := False;
    NewDecoder := Decoder.ReadBody(MS, MsgEnd);
    MS.Position := 0; // necessary?
    if Decoder.Filename <> EmptyStr then // it is an attachment
    begin
      try
        Writeln(Decoder.Filename + ' ' + IntToStr(MS.Size));
      except
        FreeAndNil(NewDecoder);
        Writeln('Error processing MIME');
      end;
    end
    else // it is a parameter
    begin
      Name := ExtractHeaderSubItem(Decoder.Headers.Text, 'name', QuoteHTTP);
      if Name <> EmptyStr then
      begin
        Value := string(PAnsiChar(MS.Memory));
        try
          Writeln(Name + '=' + Value);
        except
          FreeAndNil(NewDecoder);
          Writeln('Error processing MIME');
        end;
      end;
    end;
    Decoder.Free;
    Decoder := NewDecoder;
  finally
    MS.Free;
  end;
end;

function ProcessMultiPart(const ContentType: string; Stream: TStream): Boolean;
var
  Boundary: string;
  BoundaryStart: string;
  BoundaryEnd: string;
  Decoder: TIdMessageDecoder;
  Line: string;
  BoundaryFound: Boolean;
  IsStartBoundary: Boolean;
  MsgEnd: Boolean;
begin
  Result := False;
  Boundary := ExtractHeaderSubItem('multipart/form-data; boundary=---------------------------16857441221270830881532229640', 'boundary', QuoteHTTP);
  if Boundary <> EmptyStr then
  begin
    BoundaryStart := '--' + Boundary;
    BoundaryEnd := BoundaryStart + '--';
    Decoder := TIdMessageDecoderMIME.Create(nil);
    try
      TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
      Decoder.SourceStream := Stream;
      Decoder.FreeSourceStream := False;
      BoundaryFound := False;
      IsStartBoundary := False;
      repeat
        Line := ReadLnFromStream(Stream, -1, True);
        if Line = BoundaryStart then
        begin
          BoundaryFound := True;
          IsStartBoundary := True;
        end
        else
        begin
          if Line = BoundaryEnd then
            BoundaryFound := True;
        end;
      until BoundaryFound;
      if BoundaryFound and IsStartBoundary then
      begin
        MsgEnd := False;
        repeat
          TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
          Decoder.SourceStream := Stream;
          Decoder.FreeSourceStream := False;
          Decoder.ReadHeader;
          case Decoder.PartType of
            mcptText,
            mcptAttachment:
              begin
                ProcessAttachmentPart(Decoder, MsgEnd);
              end;
            mcptIgnore:
              begin
                Decoder.Free;
                Decoder := TIdMessageDecoderMIME.Create(nil);
              end;
            mcptEOF:
              begin
                Decoder.Free;
                MsgEnd := True;
              end;
          end;
        until (Decoder = nil) or MsgEnd;
        Result := True;
      end
    finally
      Decoder.Free;
    end;
  end;
end;

var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.LoadFromFile('MIME.txt');
    ProcessMultiPart('multipart/form-data; boundary=---------------------------16857441221270830881532229640', Stream);
  finally
    Stream.Free;
  end;
  Readln;
end.
Could someone help me find what is wrong with my code? Thank you.
Your call to ExtractHeaderSubItem() in ProcessMultiPart() is wrong. It needs to pass in the ContentType string parameter, not a hard-coded string literal.
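As a minimal sketch of that correction, the fragment below wraps the call in a small function. BoundaryFromContentType is a hypothetical helper name used only for illustration; ExtractHeaderSubItem() and QuoteHTTP are used with the same arguments as in your own code.

```delphi
uses
  IdGlobalProtocols;

// Hypothetical helper (not part of Indy): extracts the MIME boundary from
// the Content-Type value that the caller actually received.
function BoundaryFromContentType(const ContentType: string): string;
begin
  // Pass the parameter through instead of a hard-coded string literal.
  Result := ExtractHeaderSubItem(ContentType, 'boundary', QuoteHTTP);
end;
```

Inside ProcessMultiPart() the same change amounts to replacing the string literal in the existing call with ContentType.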
Your call to ExtractHeaderSubItem() in ProcessAttachmentPart() is also wrong. It needs to pass in only the content of the Content-Disposition header, not the entire Headers.Text. ExtractHeaderSubItem() is designed to operate on only 1 header at a time.
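A sketch of what that could look like, assuming the decoder's Headers list can be indexed by header name through Values[] (your own code already does this for Content-Transfer-Encoding). PartFieldName is a hypothetical helper name, not an Indy function.

```delphi
uses
  IdGlobalProtocols, IdMessageCoder;

// Hypothetical helper (not part of Indy): returns the form field name of
// the MIME part whose headers the decoder has just read.
function PartFieldName(Decoder: TIdMessageDecoder): string;
begin
  // Hand over the value of the single Content-Disposition header only,
  // rather than the text of all headers.
  Result := ExtractHeaderSubItem(
    Decoder.Headers.Values['Content-Disposition'], 'name', QuoteHTTP);
end;
```

In ProcessAttachmentPart() this would replace the call that currently passes Decoder.Headers.Text.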
Regarding the dir MIME part, the reason the body data ends up as 'UploadW' instead of 'Upload' is that you are not taking MS.Size into account when assigning MS.Memory to your Value string. The TMemoryStream data is NOT null-terminated! So, you will need to use SetString() instead of the := operator, e.g.:
var
Value: AnsiString;
...
SetString(Value, PAnsiChar(MS.Memory), MS.Size);
Regarding the Decoder.FileName, that value is not affected by the Content-Transfer-Encoding header at all. MIME headers simply do not allow unencoded Unicode characters. Currently, Indy's MIME decoder supports RFC 2047-style encodings for Unicode characters in headers, per RFC 7578 Section 5.1.3, but your stream data is not using that format. It looks like your data is using raw UTF-8 octets [1] (which 5.1.3 also mentions as a possible encoding, but the decoder does not currently look for). So, you may have to manually extract and decode the original filename yourself as needed.

If you know the filename will always be encoded as UTF-8, you could try setting Indy's global IdGlobal.GIdDefaultTextEncoding variable to encUTF8 (it defaults to encASCII), and then the Decoder.FileName should be accurate. But that is a global setting, so it may have unwanted side effects elsewhere in Indy, depending on context and data.

So, I would suggest setting GIdDefaultTextEncoding to enc8Bit instead, so that unwanted side effects are minimized, and the Decoder.FileName will contain the original raw bytes as-is (just extended to 16-bit chars). That way, you can recover the original filename bytes by simply passing the Decoder.FileName as-is to IndyTextEncoding_8Bit.GetBytes(), and then decode them as needed (such as with IndyTextEncoding_UTF8.GetString(), after validating that the bytes are valid UTF-8).
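The enc8Bit approach could be sketched roughly as follows. DecodeRawUtf8FileName is a hypothetical helper name, and the sketch assumes the filename bytes really are valid UTF-8, so the validation step mentioned above is left out.

```delphi
uses
  IdGlobal;

// Hypothetical helper (not part of Indy): recovers a UTF-8 filename from a
// string in which every 16-bit char holds one raw byte, as expected when
// the header was read with GIdDefaultTextEncoding set to enc8Bit.
// Assumption: the bytes are valid UTF-8; no validation is done here.
function DecodeRawUtf8FileName(const RawFileName: string): string;
var
  Bytes: TIdBytes;
begin
  // Narrow each char back to the original octet.
  Bytes := IndyTextEncoding_8Bit.GetBytes(RawFileName);
  // Reinterpret the octets as UTF-8 text.
  Result := IndyTextEncoding_UTF8.GetString(Bytes);
end;
```

It would be used by setting GIdDefaultTextEncoding := enc8Bit before Decoder.ReadHeader is called, and then passing Decoder.FileName to the helper. If the data was double-encoded, a single decoding pass like this one would not yet yield the original name.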
[1] However, the filename in your post stream (česká tečka.png) is not the correct UTF-8 form of česká tečka.png. It looks like that data may have been double-encoded, i.e. česká tečka.png was UTF-8 encoded, and then the resulting bytes were UTF-8 encoded again.