Encoding problem while processing a multipart request on Indy HTTP server


I have a web server based on TIdHTTPServer, built in Delphi Sydney. From a webpage I'm receiving the following multipart/form-data post stream:

-----------------------------16857441221270830881532229640 
Content-Disposition: form-data; name="d"

83AAAFUaVVs4Q07z
-----------------------------16857441221270830881532229640 
Content-Disposition: form-data; name="dir"

Upload
-----------------------------16857441221270830881532229640 
Content-Disposition: form-data; name="file_name"; filename="česká tečka.png"
Content-Type: image/png

PNG_DATA    
-----------------------------16857441221270830881532229640--

The problem is that the text parts are not received correctly. I read "Indy MIME decoding of Multipart/Form-Data Requests returns trailing CR/LF" and changed the transfer encoding to 8bit, which helps to receive the file correctly, but the dir value and the received file name are still wrong (dir should be Upload and the file name should be česká tečka.png).

d=83AAAFUaVVs4Q07z
dir=UploadW
??esk?? te??ka.png 75

To demonstrate the issue, I simplified my code to a console app (please note that the MIME.txt file contains the same data as the post stream above):

program MIMEMultiPartTest;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.Classes, System.SysUtils,
  IdGlobal, IdCoder, IdMessage, IdMessageCoder, IdGlobalProtocols, IdCoderMIME, IdMessageCoderMIME,
  IdCoderQuotedPrintable, IdCoderBinHex4;


procedure ProcessAttachmentPart(var Decoder: TIdMessageDecoder; var MsgEnd: Boolean);
var
  MS: TMemoryStream;
  Name: string;
  Value: string;
  NewDecoder: TIdMessageDecoder;
begin
  MS := TMemoryStream.Create;
  try
    // http://stackoverflow.com/questions/27257577/indy-mime-decoding-of-multipart-form-data-requests-returns-trailing-cr-lf
    TIdMessageDecoderMIME(Decoder).Headers.Values['Content-Transfer-Encoding'] := '8bit';
    TIdMessageDecoderMIME(Decoder).BodyEncoded := False;
    NewDecoder := Decoder.ReadBody(MS, MsgEnd);
    MS.Position := 0; // necessary?
    if Decoder.Filename <> EmptyStr then // it is an attachment
    begin
      try
        Writeln(Decoder.Filename + ' ' + IntToStr(MS.Size));
      except
        FreeAndNil(NewDecoder);
        Writeln('Error processing MIME');
      end;
    end
    else // it is a parameter
    begin
      Name := ExtractHeaderSubItem(Decoder.Headers.Text, 'name', QuoteHTTP);
      if Name <> EmptyStr then
      begin
        Value := string(PAnsiChar(MS.Memory));
        try
          Writeln(Name + '=' + Value);
        except
          FreeAndNil(NewDecoder);
          Writeln('Error processing MIME');
        end;
      end;
    end;
    Decoder.Free;
    Decoder := NewDecoder;
  finally
    MS.Free;
  end;
end;

function ProcessMultiPart(const ContentType: string; Stream: TStream): Boolean;
var
  Boundary: string;
  BoundaryStart: string;
  BoundaryEnd: string;
  Decoder: TIdMessageDecoder;
  Line: string;
  BoundaryFound: Boolean;
  IsStartBoundary: Boolean;
  MsgEnd: Boolean;
begin
  Result := False;
  Boundary := ExtractHeaderSubItem('multipart/form-data; boundary=---------------------------16857441221270830881532229640', 'boundary', QuoteHTTP);
  if Boundary <> EmptyStr then
  begin
    BoundaryStart := '--' + Boundary;
    BoundaryEnd := BoundaryStart + '--';
    Decoder := TIdMessageDecoderMIME.Create(nil);
    try
      TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
      Decoder.SourceStream := Stream;
      Decoder.FreeSourceStream := False;
      BoundaryFound := False;
      IsStartBoundary := False;
      repeat
        Line := ReadLnFromStream(Stream, -1, True);
        if Line = BoundaryStart then
        begin
          BoundaryFound := True;
          IsStartBoundary := True;
        end
        else
        begin
          if Line = BoundaryEnd then
            BoundaryFound := True;
        end;
      until BoundaryFound;
      if BoundaryFound and IsStartBoundary then
      begin
        MsgEnd := False;
        repeat
          TIdMessageDecoderMIME(Decoder).MIMEBoundary := Boundary;
          Decoder.SourceStream := Stream;
          Decoder.FreeSourceStream := False;
          Decoder.ReadHeader;
          case Decoder.PartType of
            mcptText,
            mcptAttachment:
              begin
                ProcessAttachmentPart(Decoder, MsgEnd);
              end;
            mcptIgnore:
              begin
                Decoder.Free;
                Decoder := TIdMessageDecoderMIME.Create(nil);
              end;
            mcptEOF:
              begin
                Decoder.Free;
                MsgEnd := True;
              end;
          end;
        until (Decoder = nil) or MsgEnd;
        Result := True;
      end
    finally
      Decoder.Free;
    end;
  end;
end;

var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.LoadFromFile('MIME.txt');
    ProcessMultiPart('multipart/form-data; boundary=---------------------------16857441221270830881532229640', Stream);
  finally
    Stream.Free;
  end;
  Readln;
end.

Could someone help me find what is wrong with my code? Thank you.


Solution

  • Your call to ExtractHeaderSubItem() in ProcessMultiPart() is wrong: it needs to pass in the ContentType string parameter, not a hard-coded string literal.

    Your call to ExtractHeaderSubItem() in ProcessAttachmentPart() is also wrong: it needs to pass in only the content of the Content-Disposition header, not the entire Headers.Text. ExtractHeaderSubItem() is designed to operate on only one header at a time.
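
    As a rough illustration of the two points above, here is a minimal sketch that calls ExtractHeaderSubItem() on one header value at a time. The two constants are sample values copied from the question's request; in the real code they would be the ContentType parameter and Decoder.Headers.Values['Content-Disposition'] respectively. The program name and constant names are made up for this example.

```pascal
program HeaderSubItemSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, IdGlobalProtocols;

const
  // Sample header values taken from the request shown in the question.
  // In ProcessMultiPart() the first one would be the ContentType parameter.
  SampleContentType =
    'multipart/form-data; boundary=---------------------------16857441221270830881532229640';
  // In ProcessAttachmentPart() this would be
  // Decoder.Headers.Values['Content-Disposition'], not Decoder.Headers.Text.
  SampleDisposition = 'form-data; name="dir"';

begin
  // Each call receives the value of exactly one header.
  Writeln(ExtractHeaderSubItem(SampleContentType, 'boundary', QuoteHTTP));
  Writeln(ExtractHeaderSubItem(SampleDisposition, 'name', QuoteHTTP));
  Readln;
end.
```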

    Regarding the dir MIME part, the reason the body data ends up as 'UploadW' instead of 'Upload' is that you are not taking MS.Size into account when assigning MS.Memory to your Value string. The TMemoryStream data is NOT null-terminated! The PAnsiChar cast therefore keeps reading past the end of the data until it happens to hit a zero byte. So, you will need to use SetString() instead of the := operator, e.g.:

    var
      Value: AnsiString;
    ...
    SetString(Value, PAnsiChar(MS.Memory), MS.Size);
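
    To see the length-bounded copy in isolation, here is a small sketch that does not depend on Indy. The stream content is a sample standing in for the body of the dir part, and the final conversion to string assumes the body is plain ASCII text, which is an assumption of this example rather than something the request guarantees.

```pascal
program SetStringSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes, System.SysUtils;

const
  // Sample body standing in for the "dir" MIME part.
  Sample: AnsiString = 'Upload';

var
  MS: TMemoryStream;
  Raw: AnsiString;
  Value: string;
begin
  MS := TMemoryStream.Create;
  try
    // Write only the payload bytes; no terminating #0 is stored in the stream.
    MS.WriteBuffer(PAnsiChar(Sample)^, Length(Sample));

    // Copy exactly MS.Size bytes instead of scanning for a null terminator.
    SetString(Raw, PAnsiChar(MS.Memory), Integer(MS.Size));

    // Assumes ASCII content; other encodings need an explicit decode step.
    Value := string(Raw);
    Writeln(Value, ' (', Length(Value), ' chars)');
  finally
    MS.Free;
  end;
  Readln;
end.
```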
    

    Regarding Decoder.FileName, that value is not affected by the Content-Transfer-Encoding header at all. MIME headers simply do not allow unencoded Unicode characters. Currently, Indy's MIME decoder supports RFC 2047-style encodings for Unicode characters in headers, per RFC 7578 Section 5.1.3, but your stream data is not using that format. It looks like your data is using raw UTF-8 octets [1] (which 5.1.3 also mentions as a possible encoding, but which the decoder does not currently look for). So, you may have to manually extract and decode the original filename yourself as needed.

    If you know the filename will always be encoded as UTF-8, you could try setting Indy's global IdGlobal.GIdDefaultTextEncoding variable to encUTF8 (it defaults to encASCII), and then Decoder.FileName should be accurate. But that is a global setting, so it may have unwanted side effects elsewhere in Indy, depending on context and data.

    So, I would suggest setting GIdDefaultTextEncoding to enc8Bit instead, so that unwanted side effects are minimized and Decoder.FileName will contain the original raw bytes as-is (just extended to 16-bit chars). That way, you can recover the original filename bytes by simply passing Decoder.FileName as-is to IndyTextEncoding_8Bit.GetBytes(), and then decode them as needed (such as with IndyTextEncoding_UTF8.GetString(), after validating that the bytes are valid UTF-8).

    [1]: However, ÄŤeská teÄŤka.png is not the correct UTF-8 form of česká tečka.png. It looks like that data may have been double-encoded, i.e. česká tečka.png was UTF-8 encoded, and then the resulting bytes were UTF-8 encoded again.
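
    The enc8Bit round trip suggested above could look roughly like the sketch below. DecodeUtf8FileName is a hypothetical helper, not part of Indy, and the RawName value is a hand-built stand-in for what Decoder.FileName is expected to contain under enc8Bit (one original octet per Char). Validation of the bytes as well-formed UTF-8 is deliberately left out, so treat this as a starting point only.

```pascal
program FileNameRecoverySketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, IdGlobal;

// Hypothetical helper (not part of Indy). RawName is expected to hold one
// original byte per Char, as Decoder.FileName would under enc8Bit.
function DecodeUtf8FileName(const RawName: string): string;
var
  Bytes: TIdBytes;
begin
  // Recover the original octets from the 8-bit-extended string.
  Bytes := IndyTextEncoding_8Bit.GetBytes(RawName);
  // Interpret those octets as UTF-8. No validation is performed here.
  Result := IndyTextEncoding_UTF8.GetString(Bytes);
end;

var
  RawName, Expected: string;
begin
  // Global setting; it has to be in place before the decoder reads headers.
  GIdDefaultTextEncoding := enc8Bit;

  // Stand-in for Decoder.FileName: the UTF-8 octets of the file name from the
  // question, one octet per Char (C4 8D encodes U+010D, C3 A1 encodes U+00E1).
  RawName := #$00C4#$008D'esk'#$00C3#$00A1' te'#$00C4#$008D'ka.png';
  // The same name written with real Unicode code points, for comparison.
  Expected := #$010D'esk'#$00E1' te'#$010D'ka.png';

  // Prints whether the round trip reproduced the expected name.
  Writeln(DecodeUtf8FileName(RawName) = Expected);
  Readln;
end.
```

    The comparison against escaped code points avoids depending on the console's code page or on how the source file itself is saved.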