
Archive all files of a certain type recursively from PowerShell


Is there a way to use Compress-Archive in a script that, when run from a path:

  1. archives all files matching a wildcard filter (*.doc, for example)
  2. archives such files in the current folder and all children folders
  3. preserves the relative folder structure (an option to choose between relative and absolute paths would be good, though)

I am having trouble getting it to accomplish all three of these at once.

Edit:

The following filters and recurses, but does not maintain the folder structure:

Get-ChildItem -Path ".\" -Filter "*.docx" -Recurse |
Compress-Archive -CompressionLevel Optimal -DestinationPath "$pwd\doc.archive-$(Get-Date -f yyyyMMdd.hhmmss).zip"

This one filters, but does not recurse:

Compress-Archive -Path "$pwd\*.docx" -CompressionLevel Optimal -DestinationPath "$pwd\doc.archive-$(Get-Date -f yyyyMMdd.hhmmss).zip"

At some point I had a command that would recurse but not filter, but I can't get back to it now.


Solution

  • Unfortunately, Compress-Archive is quite limited as of Windows PowerShell v5.1 / PowerShell Core 6.1.0:

    • The only way to preserve a subdirectory tree is to pass a directory path to Compress-Archive.

      • Unfortunately, doing so provides no inclusion/exclusion mechanism to only select a subset of files.

      • Additionally, the resulting archive will internally contain a single root directory named for the input directory (e.g., if you pass C:\temp\foo to Compress-Archive, the resulting archive will contain a single foo directory containing the input directory's subtree - as opposed to containing C:\temp\foo's content at the top level).

      • There is no option to preserve absolute paths.

    • A cumbersome workaround is to create a temporary copy of your directory tree with only the files of interest (Copy-Item -Recurse -Filter *.docx . $env:TEMP\tmpDir; Compress-Archive $env:TEMP\tmpDir out.zip - note that empty directories will be included)

      • Given that you'll still invariably end up with a single root directory named for the input directory inside the archive, even that may not work for you - see the alternatives at the bottom.
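
The temporary-copy workaround can be sketched as a small function. This is a variant of the one-liner above, not the same command: it copies matching files one by one instead of using Copy-Item -Recurse -Filter, so empty directories are not staged. The function name Compress-FilteredTree and its parameters are hypothetical, chosen for illustration only. The archive still ends up with a single root directory, named after the staging directory.

```powershell
# Hypothetical helper (not a built-in cmdlet): stages matching files in a
# temporary directory tree, then archives that tree with Compress-Archive.
function Compress-FilteredTree {
    param(
        [Parameter(Mandatory)][string] $SourceDir,
        [Parameter(Mandatory)][string] $DestinationPath,
        [string] $Filter = '*.docx'
    )

    $sourceRoot = (Resolve-Path -LiteralPath $SourceDir).ProviderPath
    # Uniquely named staging directory; its name becomes the archive's root folder.
    $stagingDir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
    $null = New-Item -ItemType Directory -Path $stagingDir
    try {
        Get-ChildItem -LiteralPath $sourceRoot -Recurse -File -Filter $Filter | ForEach-Object {
            # Path relative to the source root, e.g. 'sub\a.docx'.
            $relativePath = $_.FullName.Substring($sourceRoot.Length).TrimStart('\', '/')
            $targetPath = Join-Path $stagingDir $relativePath
            # Recreate the parent directory in the staging tree, then copy the file.
            $null = New-Item -ItemType Directory -Force -Path (Split-Path -Parent $targetPath)
            Copy-Item -LiteralPath $_.FullName -Destination $targetPath
        }
        Compress-Archive -Path $stagingDir -DestinationPath $DestinationPath
    }
    finally {
        # Remove the staging copy whether or not archiving succeeded.
        Remove-Item -LiteralPath $stagingDir -Recurse -Force
    }
}
```

A call such as Compress-FilteredTree -SourceDir . -DestinationPath "$pwd\out.zip" would then archive only the *.docx files below the current directory.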

    You may be better off with alternatives:


    Solving the problem with direct use of the .NET v4.5+ [System.IO.Compression.ZipFile] class:

    Note:

    • In Windows PowerShell, unlike in PowerShell Core, you must load the relevant assembly manually with Add-Type -AssemblyName System.IO.Compression.FileSystem.

    • Because PowerShell doesn't support implicit use of extension methods as of Windows PowerShell v5.1 / PowerShell Core 6.1.0, you must make explicit use of the [System.IO.Compression.ZipFileExtensions] class as well.

    # Windows PowerShell: must load assembly System.IO.Compression.FileSystem manually.
    Add-Type -AssemblyName System.IO.Compression.FileSystem
    
    # Create the target archive via .NET to provide more control over how files
    # are added.
    # Make sure that the target file doesn't already exist.
    $archive = [System.IO.Compression.ZipFile]::Open(
      "$pwd\doc.archive-$(Get-Date -f yyyyMMdd.hhmmss).zip",
      'Create'
    )
    
    # Get the list of files to archive with their relative paths and
    # add them to the target archive one by one.
    $useAbsolutePaths = $False # Set this to true to use absolute paths instead.
    Get-ChildItem -Recurse -Filter *.docx | ForEach-Object {
        # Determine the entry path, i.e., the archive-internal path.
        $entryPath = (
              ($_.FullName -replace ('^' + [regex]::Escape($PWD.ProviderPath) + '[/\\]'), ''),
              $_.FullName
            )[$useAbsolutePaths]
        $null = [System.IO.Compression.ZipFileExtensions]::CreateEntryFromFile(
          $archive, 
          $_.FullName, 
          $entryPath
        )
      }
    
    # Close the archive.
    $archive.Dispose()
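
To check which entry paths ended up in the archive, the same .NET class can be used to read it back. The following is a minimal sketch; the helper name Get-ZipEntryPath is hypothetical and not part of PowerShell or .NET.

```powershell
# Windows PowerShell: load the assembly that provides [System.IO.Compression.ZipFile].
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: outputs the archive-internal path of every entry in a ZIP file.
function Get-ZipEntryPath {
    param([Parameter(Mandatory)][string] $ArchivePath)

    # Resolve to a full path, because .NET's working directory can differ from PowerShell's.
    $fullPath = (Resolve-Path -LiteralPath $ArchivePath).ProviderPath
    $zip = [System.IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        $zip.Entries | ForEach-Object { $_.FullName }
    }
    finally {
        # Release the file handle even if enumeration fails.
        $zip.Dispose()
    }
}
```

Running Get-ZipEntryPath against the archive created above should list one relative (or absolute) path per archived file, which makes it easy to confirm that the folder structure was stored as intended.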