c# zip compression .net-4.5

System.IO.Compression.ZipArchive memory management


In .NET 4.5, System.IO.Compression received some updates, including the ZipArchive class.

According to this article (http://msdn.microsoft.com/en-us/magazine/jj133817.aspx), "typical operations don’t require reading the entire archive into memory".

To test this, I tried compressing 10 files of 200 MB each.

This works well when creating new zip archives with the following code (memory usage stays low for the whole process):

for (int directoryGroupIndex = 0; directoryGroupIndex < directoryGroups.Count; directoryGroupIndex++)
{
  String directoryGroupKey = directoryGroups.Keys.ElementAt(directoryGroupIndex);
  FileInfo[] directoryGroup = directoryGroups[directoryGroupKey];

  String archiveFileName = String.Format("Readed Logfiles{0}", archiveFileExtension);
  String archiveFileFullName = Path.Combine(directoryGroupKey, archiveFileName);
  FileInfo archiveFile = new FileInfo(archiveFileFullName);


  using (FileStream archiveFileStream = new FileStream(archiveFile.FullName, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read))
  using (ZipArchive archive = new ZipArchive(archiveFileStream, ZipArchiveMode.Create, false))
  {
    for (int directoryGroupFileIndex = 0; directoryGroupFileIndex < directoryGroup.Length; directoryGroupFileIndex++)
    {
      FileInfo file = directoryGroup[directoryGroupFileIndex];
      String archiveEntryName = file.Name;
      String archiveEntryPath = DateTime.Now.ToString("yyyy-MM-dd");
      String archiveEntryFullName = Path.Combine(archiveEntryPath, archiveEntryName);

      ZipArchiveEntry archiveEntry = archive.CreateEntryFromFile(file.FullName, archiveEntryFullName, CompressionLevel.Optimal);
    }
  }              
}

Now I want to add new entries to this archive. I left my code as it is and ran it again (with new files in the root directory). The documentation says "Only creating new archive entries is permitted", which is all I want, so my code should be fine.

The result is that:

  1. The file table inside the archive is overwritten (only the new files are listed).

  2. The archive file size has grown (as if the old files were still in there).

  3. The archive is corrupted. You can open it, but you can't decompress its content.

If I change the mode to ZipArchiveMode.Update, it works as expected, but only with small files. With files as large as mine, it throws an OutOfMemoryException, because the complete archive is loaded into memory.

My question now is: am I doing something wrong, is this a bug, or is it a design flaw?


Solution

  • The code you've written is causing the ZipArchive class to write a whole new archive at the end of your previous one, which of course corrupts the file.
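
    One way to see why stale data from the old archive stays in the file is to look at the stream rather than at ZipArchive. The following is a minimal sketch, assuming the archive is opened with FileMode.OpenOrCreate as in the question; the class and method names are hypothetical and the byte arrays merely stand in for archive contents. It shows that this FileMode opens an existing file without truncating it, so whatever ZipArchive writes ends up mixed with the old bytes.

    ```csharp
    using System;
    using System.IO;

    static class TruncationDemo
    {
        // Hypothetical helper: rewrites an existing file with fewer bytes
        // and returns the resulting file length.
        public static long RewriteWithoutTruncating(string path)
        {
            // Stands in for the old archive (10 bytes).
            File.WriteAllBytes(path, new byte[10]);

            // Same FileMode as in the question: the existing file is opened
            // as-is and its current contents are left in place.
            using (var stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
            {
                // Stands in for the new archive (3 bytes).
                stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
            }

            // The file is still 10 bytes long: 7 old bytes remain after the new data.
            return new FileInfo(path).Length;
        }
    }
    ```

    Opening with FileMode.Create instead would truncate the file, but that discards the old entries as well, which is why the existing entries have to be copied into the new archive.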

    The way to do what you want is to copy the entries of the original archive into a new archive file, add the new entries to that file, and then replace the original with the new one. For example:

    // Requires: using System.IO; using System.IO.Compression;
    // archiveFile and directoryGroup are the variables from the question's code.
    string tempFile = Path.GetTempFileName();
    
    using (ZipArchive original =
        new ZipArchive(File.Open(archiveFile.FullName, FileMode.Open), ZipArchiveMode.Read))
    using (ZipArchive newArchive =
        new ZipArchive(File.Open(tempFile, FileMode.Create), ZipArchiveMode.Create))
    {
        // Copy every existing entry into the new archive, one entry at a time.
        foreach (ZipArchiveEntry entry in original.Entries)
        {
            ZipArchiveEntry newEntry = newArchive.CreateEntry(entry.FullName);
    
            using (Stream source = entry.Open())
            using (Stream destination = newEntry.Open())
            {
                source.CopyTo(destination);
            }
        }
    
        // Add the new files after the copied entries.
        for (int directoryGroupFileIndex = 0;
                directoryGroupFileIndex < directoryGroup.Length;
                directoryGroupFileIndex++)
        {
            FileInfo file = directoryGroup[directoryGroupFileIndex];
            String archiveEntryName = file.Name;
            String archiveEntryPath = DateTime.Now.ToString("yyyy-MM-dd");
            String archiveEntryFullName = Path.Combine(archiveEntryPath, archiveEntryName);
    
            ZipArchiveEntry archiveEntry = newArchive.CreateEntryFromFile(
                file.FullName, archiveEntryFullName, CompressionLevel.Optimal);
        }
    }
    
    // Replace the original archive with the new one.
    File.Delete(archiveFile.FullName);
    File.Move(tempFile, archiveFile.FullName);
    

    Note that this isn't actually going to be slower than ZipArchiveMode.Update. When you use the update mode, the ZipArchive class reads the entire archive into memory (as you noted), and then when you close it, it recompresses and writes everything back out.

    The above does basically the exact same computations, but simply uses the disk as the intermediate storage instead of memory.
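
    The copy-and-replace approach can also be packaged as a standalone method. This is only a sketch under stated assumptions: ZipAppender and AppendFiles are hypothetical names, the new entries are stored under their plain file names, and the temporary file is placed next to the archive (rather than in the system temp directory) so that the final move stays on one volume. ZipFile.Open with ZipArchiveMode.Create fails if that temporary file already exists.

    ```csharp
    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;

    static class ZipAppender
    {
        // Hypothetical helper, not part of the framework.
        public static void AppendFiles(string archivePath, IEnumerable<string> filesToAdd)
        {
            // Assumption: a sibling temp file next to the archive.
            string tempPath = archivePath + ".tmp";

            using (ZipArchive original = ZipFile.OpenRead(archivePath))
            using (ZipArchive updated = ZipFile.Open(tempPath, ZipArchiveMode.Create))
            {
                // Stream each existing entry into the new archive.
                foreach (ZipArchiveEntry entry in original.Entries)
                {
                    ZipArchiveEntry copy = updated.CreateEntry(entry.FullName);
                    copy.LastWriteTime = entry.LastWriteTime; // set before writing any data

                    using (Stream source = entry.Open())
                    using (Stream destination = copy.Open())
                    {
                        source.CopyTo(destination);
                    }
                }

                // Add the new files under their plain file names.
                foreach (string file in filesToAdd)
                {
                    updated.CreateEntryFromFile(file, Path.GetFileName(file), CompressionLevel.Optimal);
                }
            }

            // Swap the new archive in for the old one.
            File.Delete(archivePath);
            File.Move(tempPath, archivePath);
        }
    }
    ```

    A call such as ZipAppender.AppendFiles(archiveFile.FullName, directoryGroup.Select(f => f.FullName)) would then cover the loop from the question, apart from the date-based entry path.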