PowerShell unzip stream


Is there a built-in cmdlet, or some composition of cmdlets, that would let me start unzipping a file stream as each chunk is downloaded? I have a PowerShell script that needs to download a large (10 GB) file, and right now I have to wait until the download finishes before the archive starts expanding:

$wc = New-Object net.webclient
$wc.Downloadfile($appDataSnapshotUri, "%DataSnapshotFileName%.zip") # this can take some time

Expand-Archive -Path "%DataSnapshotFileName%.zip" -DestinationPath Run # so can this

Solution

  • It turns out a zip file doesn't need to be fully downloaded before it can be decompressed; you can compress and decompress streams. .NET has some built-in capabilities for stream compression, but they will not work with zip archives. You can use the SharpZipLib library for that:

    Download the .nupkg from https://www.nuget.org/packages/SharpZipLib/ and extract its files to any folder. You'll need ICSharpCode.SharpZipLib.dll from lib/net45.

    Below is my simplified translation of their example: https://github.com/icsharpcode/SharpZipLib/wiki/Zip-Samples#unpack-a-zip-using-zipinputstream-eg-for-unseekable-input-streams

    # Load the SharpZipLib assembly extracted from the NuGet package
    Add-Type -Path ".\ICSharpCode.SharpZipLib.dll"

    # The output folder must already exist
    $outFolder = ".\unzip"

    # Open the download as a forward-only stream instead of saving it to disk first
    $wc = [System.Net.WebClient]::new()
    $zipStream = $wc.OpenRead("http://gitlab/test/test1/raw/master/sample.zip")
    $zipInputStream = [ICSharpCode.SharpZipLib.Zip.ZipInputStream]::new($zipStream)

    # Read the first entry header and copy that entry's data to a file
    $zipEntry = $zipInputStream.GetNextEntry()
    $fileName = $zipEntry.Name
    $buffer = New-Object byte[] 4096
    $sw = [System.IO.File]::Create("$outFolder\$fileName")
    [ICSharpCode.SharpZipLib.Core.StreamUtils]::Copy($zipInputStream, $sw, $buffer)
    $sw.Close()

    This only extracts the first entry; once this sample works, you can add a while loop.

    Here is a snippet with a while loop that extracts multiple files (put it after $zipEntry = $zipInputStream.GetNextEntry() in the example above, replacing the lines that follow it):

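    while ($zipEntry) {
        $fileName = $zipEntry.Name
        Write-Host $fileName

        # Copy the current entry's data to a file in the output folder
        $buffer = New-Object byte[] 4096
        $sw = [System.IO.File]::Create("$outFolder\$fileName")
        [ICSharpCode.SharpZipLib.Core.StreamUtils]::Copy($zipInputStream, $sw, $buffer)
        $sw.Close()

        # Advance to the next entry; GetNextEntry() returns $null at the end of the archive
        $zipEntry = $zipInputStream.GetNextEntry()
    }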
    

    Edit

    Here is what I found to work. It skips directory entries and creates parent folders as needed:

    Add-Type -Path ".\ICSharpCode.SharpZipLib.dll"

    $outFolder = "unzip"

    $wc = [System.Net.WebClient]::new()
    $zipStream = $wc.OpenRead("https://github.com/Esri/file-geodatabase-api/raw/master/FileGDB_API_1.5/FileGDB_API_1_5_VS2015.zip")
    $zipInputStream = [ICSharpCode.SharpZipLib.Zip.ZipInputStream]::new($zipStream)

    # Allocate the copy buffer once and reuse it for every entry
    $buffer = New-Object byte[] 4096

    $zipEntry = $zipInputStream.GetNextEntry()
    while ($zipEntry) {
        # Directory entries carry no data; their folders are created on demand below
        if (-not $zipEntry.IsDirectory) {
            $fileName = $zipEntry.Name

            # Build an absolute path, because .NET resolves relative paths against the
            # process working directory, which can differ from $pwd
            $filePath = "$pwd\$outFolder\$fileName"
            $parentPath = [System.IO.Path]::GetDirectoryName($filePath)
            Write-Host $parentPath

            if (-not (Test-Path $parentPath)) {
                New-Item -ItemType Directory $parentPath
            }

            $sw = [System.IO.File]::Create($filePath)
            [ICSharpCode.SharpZipLib.Core.StreamUtils]::Copy($zipInputStream, $sw, $buffer)
            $sw.Close()
        }

        $zipEntry = $zipInputStream.GetNextEntry()
    }

    # Closing the zip stream also closes the underlying download stream by default
    $zipInputStream.Close()
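
    The path handling inside the loop (building the target path and creating its parent folder) could be pulled into a small helper. The following is only a minimal sketch: the function name Get-EntryTargetPath and its parameters are hypothetical and not part of SharpZipLib. It also rejects entry names that would resolve outside the output folder, which is an extra precaution the original answer does not include.

    ```powershell
    # Hypothetical helper (not part of SharpZipLib): maps a zip entry name to a
    # full path under $OutFolder and creates the parent directory if it is missing.
    function Get-EntryTargetPath {
        param(
            [Parameter(Mandatory)] [string] $OutFolder,
            [Parameter(Mandatory)] [string] $EntryName
        )

        # Resolve against PowerShell's current location, because .NET would
        # otherwise use the process working directory
        $root   = [System.IO.Path]::GetFullPath([System.IO.Path]::Combine($PWD.ProviderPath, $OutFolder))
        $target = [System.IO.Path]::GetFullPath([System.IO.Path]::Combine($root, $EntryName))

        # Reject names such as "../evil.txt" that resolve outside the output folder
        $sep    = [System.IO.Path]::DirectorySeparatorChar
        $prefix = $root.TrimEnd($sep) + $sep
        if (-not $target.StartsWith($prefix, [System.StringComparison]::Ordinal)) {
            throw "Entry '$EntryName' resolves outside '$root'"
        }

        # Create the parent folder on demand
        $parent = [System.IO.Path]::GetDirectoryName($target)
        if (-not (Test-Path -LiteralPath $parent)) {
            New-Item -ItemType Directory -Path $parent -Force | Out-Null
        }

        return $target
    }
    ```

    Under those assumptions, the loop body would reduce to something like $sw = [System.IO.File]::Create((Get-EntryTargetPath -OutFolder $outFolder -EntryName $zipEntry.Name)) followed by the same StreamUtils copy and $sw.Close().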