Is it possible to extract files from TZipFile using multiple threads?


I was wondering if it was possible to extract multiple files simultaneously from a TZipFile. I'm using Delphi 11.

I've experimented a bit with no luck. I was thinking of something along the lines of the following, which doesn't work:

var
  z : TZipFile;
begin
  z := TZipFile.Create;
  z.Open('e:\temp\temp.zip', TZipMode.zmRead);
  TParallel.For(1, 0, z.FileCount-1,
    procedure(i : integer)
    begin
      z.Extract(i, 'e:\temp\Unzip');
    end
  );
  z.Free;
end;

Update: I made this short video on multi-threaded extraction https://youtu.be/wa7i1bmYgq4. Here is some demo code for testing purposes.

const
  ZipTo = 'E:\Temp\ZipTest';
  ZipFile = 'e:\temp\temp.zip';

procedure TForm32.Button3Click(Sender: TObject);
begin
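  // Open the archive once, only to read the number of entries.
  var FileCount: Integer;
  var z := TZipFile.Create;
  try
    z.Open(ZipFile, TZipMode.zmRead);
    FileCount := z.FileCount;
  finally
    z.Free;
  end;

  var sw := TStopwatch.StartNew;
  // The upper bound of TParallel.For is inclusive, so stop at FileCount - 1.
  TParallel.For(1, 0, FileCount - 1,
    procedure (i : integer)
    begin
      // Each iteration opens its own TZipFile, so no stream is shared between threads.
      var z := TZipFile.Create;
      try
        z.Open(ZipFile, TZipMode.zmRead);
        z.Extract(i, ZipTo);
      finally
        z.Free;
      end;
    end
  );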
  ShowMessage(sw.ElapsedMilliseconds.ToString);
end;

procedure TForm32.Button4Click(Sender: TObject);
begin
  var sw := TStopwatch.StartNew;
  TZipFile.ExtractZipFile(ZipFile, ZipTo);
  ShowMessage(sw.ElapsedMilliseconds.ToString);
end;

Single-threaded: 3,783ms

Multithreaded: 1,591ms (approx 2.4x improvement)

You can clear the disk cache using CacheSet from SysInternals. It seems to make a difference of about 100ms on my machine. The zip file that I used was about 1GB in size and contained 6 video files, so I was using six threads (it tries to extract every file at once). It looks like about four threads would be optimal for my machine, but that will depend greatly on disk speed and on the type of content in the zip file.
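
As a rough sketch of how the thread count could be capped (for example at the four that looked optimal above), a fixed number of tasks can each open their own TZipFile and pull the next entry index from a shared counter. ExtractWithWorkers and its parameters are hypothetical names for illustration, not part of System.Zip, and the sketch is untested against real archives.

```delphi
uses
  System.SysUtils, System.Classes, System.Threading, System.SyncObjs, System.Zip;

// Hypothetical helper: extracts all entries using WorkerCount tasks.
// Every task owns its own TZipFile, so no stream is shared between threads.
procedure ExtractWithWorkers(const AZipFile, ADestDir: string; WorkerCount: Integer);
var
  FileCount, NextIndex, W: Integer;
  Tasks: TArray<ITask>;
  Z: TZipFile;
begin
  // Open once up front, only to read the number of entries.
  Z := TZipFile.Create;
  try
    Z.Open(AZipFile, TZipMode.zmRead);
    FileCount := Z.FileCount;
  finally
    Z.Free;
  end;

  NextIndex := -1; // the first Increment yields index 0
  SetLength(Tasks, WorkerCount);
  for W := 0 to WorkerCount - 1 do
    Tasks[W] := TTask.Run(
      procedure
      var
        LocalZip: TZipFile;
        I: Integer;
      begin
        LocalZip := TZipFile.Create;
        try
          LocalZip.Open(AZipFile, TZipMode.zmRead);
          // Atomically claim the next entry until none are left.
          I := TInterlocked.Increment(NextIndex);
          while I < FileCount do
          begin
            LocalZip.Extract(I, ADestDir);
            I := TInterlocked.Increment(NextIndex);
          end;
        finally
          LocalZip.Free;
        end;
      end);

  // Blocks the calling thread until every worker has finished.
  TTask.WaitForAll(Tasks);
end;
```

With the constants from the demo, a call might look like ExtractWithWorkers(ZipFile, ZipTo, 4). Compared with the TParallel.For version, each worker opens the archive once rather than once per file, and the worker count is an explicit parameter that can be tuned per machine.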


Solution

  • I was wondering if it was possible to extract multiple files simultaneously from a TZipFile.

    No. TZipFile does not support parallel extraction of the compressed files.

    The most notable issue preventing parallel extraction comes from the FStream and FFileStream fields. Delphi streams don't support parallel access, and even if they did, you wouldn't get any speedup in extraction, because any operation in progress on a stream would have to block other operations until it completes.

    Streams are stateful instances, and every operation on a stream changes its internal state (the current stream position). Even read operations change that state, which is what makes streams thread-unsafe, and that unsafety affects the TZipFile class as well.

    Of course, there may be other sources of thread unsafety in the TZipFile class, but once you have one thing that is not safe and cannot be easily changed (fixed), there is no point in searching for other potential issues.
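
The point about stream state can be illustrated with a minimal single-threaded sketch. It uses a TBytesStream as a stand-in (an assumption made for illustration; TZipFile's internal streams are not accessed here) and shows that each read moves the shared position, so two readers sharing one stream would interfere with each other.

```delphi
program StreamPositionDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  S: TBytesStream;
  A, B: Byte;
begin
  // Stand-in stream holding four bytes.
  S := TBytesStream.Create(TBytes.Create(10, 20, 30, 40));
  try
    S.ReadBuffer(A, 1); // reads byte 0 and moves Position from 0 to 1
    S.ReadBuffer(B, 1); // a "second reader" now gets byte 1, not byte 0
    Writeln(A, ' ', B, ' ', S.Position); // prints: 10 20 2
  finally
    S.Free;
  end;
end.
```

If the two reads came from different threads, each would also have to seek before reading, and another thread could move the position between the seek and the read. Avoiding that requires either a lock around every seek-and-read pair or a separate stream per thread.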