c#, parallel-processing, zip, dotnetzip

Is it possible to download and unzip in parallel?


I have some large ZIP files that I'm downloading and then unzipping in my program. Performance is important, and one direction I started thinking about was whether it was possible to start the download and then begin unzipping the data as it arrives, instead of waiting for the download to complete and then start unzipping. Is this possible? From what I understand of DEFLATE, it should be theoretically possible, right?

I'm currently using DotNetZip as my zip library, but it refuses to act on a non-seekable stream.

Code would be something like this:

using System.IO;
using System.Net;

// HTTP GET the application archive from the server
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";

Directory.CreateDirectory(localPath);
using (var response = (HttpWebResponse)request.GetResponse())
using (Stream input = response.GetResponseStream())
{
    // Unzip is a placeholder for some function that starts unzipping
    // from the response stream and returns when unzipping is done
    return Unzip(input, localPath);
}

Solution

  • One direction I started thinking about was whether it was possible to start the download and then begin unzipping the data as it arrives, instead of waiting for the download to complete and then start unzipping. Is this possible?

    Not reliably, at least not for ZIP files in general: you can't start unzipping while the response body is still downloading.

    In a ZIP file, the central directory, which lists the files in the archive and the offset of each one, is located at the very end of the file. It will be the last thing you download. Without it, you can't reliably determine where the individual file records are located in your ZIP file.

    This would also explain why DotNetZip needs a seekable stream. It needs to be able to read the central directory at the end of the file first, then jump back to earlier sections to read the individual ZIP entries and extract them.

    If you are dealing with very specific ZIP files, you could make certain assumptions about the layout of the individual file records and extract them by hand, reading forward only without seeking backwards, but this would not be broadly compatible with ZIP files in general.
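
To illustrate the "by hand" approach, here is a minimal sketch of forward-only extraction using only System.IO.Compression, not DotNetZip. The class and method names (ForwardOnlyUnzip, Extract) are made up for this example. It assumes every entry carries its sizes in the local file header (no data descriptor), is not encrypted, is not ZIP64, uses the "stored" or "deflate" method, has a UTF-8 or ASCII file name, and is small enough to buffer in memory. It trusts the local headers instead of the central directory and does not verify CRCs, so treat it as a starting point rather than a general ZIP reader.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ForwardOnlyUnzip
{
    private const uint LocalFileHeaderSignature = 0x04034b50; // bytes "PK" 0x03 0x04

    // Reads local file records one after another from a forward-only stream and
    // stops at the first record that is not a local file header (normally the
    // start of the central directory). Returns the number of files written.
    public static int Extract(Stream input, string targetDir)
    {
        string root = Path.GetFullPath(targetDir).TrimEnd(Path.DirectorySeparatorChar)
                      + Path.DirectorySeparatorChar;
        Directory.CreateDirectory(root);
        var reader = new BinaryReader(input); // little-endian, like the ZIP format
        int extracted = 0;

        while (reader.ReadUInt32() == LocalFileHeaderSignature)
        {
            reader.ReadUInt16();                       // version needed to extract
            ushort flags = reader.ReadUInt16();        // general purpose bit flags
            ushort method = reader.ReadUInt16();       // 0 = stored, 8 = deflate
            reader.ReadUInt32();                       // last-modified time and date
            reader.ReadUInt32();                       // CRC-32 (not verified in this sketch)
            uint compressedSize = reader.ReadUInt32();
            reader.ReadUInt32();                       // uncompressed size (unused)
            ushort nameLength = reader.ReadUInt16();
            ushort extraLength = reader.ReadUInt16();
            string name = Encoding.UTF8.GetString(reader.ReadBytes(nameLength));
            reader.ReadBytes(extraLength);             // extra field, ignored

            if ((flags & 0x0001) != 0)
                throw new NotSupportedException("Encrypted entry: " + name);
            if ((flags & 0x0008) != 0)
                throw new NotSupportedException("Sizes are stored after the data: " + name);
            if (compressedSize > int.MaxValue)
                throw new NotSupportedException("ZIP64 or oversized entry: " + name);

            // Assumption: the entry fits in memory. ReadBytes only returns fewer
            // bytes than requested when the stream ends early.
            byte[] data = reader.ReadBytes((int)compressedSize);
            if (data.Length != compressedSize)
                throw new EndOfStreamException("Truncated entry: " + name);

            // Refuse entry names that would resolve outside the target directory.
            string path = Path.GetFullPath(Path.Combine(root, name));
            if (!path.StartsWith(root, StringComparison.Ordinal))
                throw new InvalidDataException("Entry escapes target directory: " + name);

            if (name.EndsWith("/"))
            {
                Directory.CreateDirectory(path);       // directory entry, no data
                continue;
            }

            Directory.CreateDirectory(Path.GetDirectoryName(path));
            using (var output = File.Create(path))
            using (var compressed = new MemoryStream(data))
            {
                if (method == 0)
                {
                    compressed.CopyTo(output);
                }
                else if (method == 8)
                {
                    using (var inflate = new DeflateStream(compressed, CompressionMode.Decompress))
                    {
                        inflate.CopyTo(output);
                    }
                }
                else
                {
                    throw new NotSupportedException("Compression method " + method + ": " + name);
                }
            }
            extracted++;
        }
        return extracted;
    }
}
```

In the code from the question, this would stand in for the placeholder call, for example `ForwardOnlyUnzip.Extract(input, localPath)`, so each entry is written to disk as soon as its bytes have arrived. Archives written by tools that stream their output often set the data descriptor flag, and the sketch rejects those, which is exactly the kind of incompatibility described above.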