How to exclude files and folders from Get-ChildItem in PowerShell?


I've made a PowerShell script that runs robocopy with MD5 checks.

It works fine, but when I try to exclude some directories or files, robocopy honors the exclusions, whereas the MD5 part of the script, which compares the hashes, fails with errors because the source list contains more files/hashes than the destination.

I think I've tried every method I could find here and elsewhere on the Internet, but I still can't exclude directories and/or files from a path!

Below is what I've done so far. In this mode, the md5-copy works (without exclusions):

$Source = "F:\"

$IgnoreDir = @(
    $Source + '$RECYCLE.BIN'
    $Source + "System Volume Information"
    $Source + "VMs"
)   
$IgnoreFile = @(
    $Source + "SHDrive.vmdk"
    $Source + "SHDrive-flat.vmdk"
)
$Ignored = $IgnoreDir + $IgnoreFile

Robocopy:

Robocopy.exe /R:1 /W:0 $Source $Dest /E /V /TEE /XD $IgnoreDir /XF $IgnoreFile /LOG:$LogDir\RBCY_MD5_F.txt

MD5:

$SourceHash = Get-ChildItem "$Source\*.*" -Recurse -Force -Exclude $Ignored | Where-Object {!$_.psiscontainer } | Get-FileHash
$SourceHash | Select-Object "Hash", "path" | ft -HideTableHeaders -AutoSize | Out-File -Width "300" $LogDir\SRC_MD5_REF.txt
$SourceHash.Hash | Out-File $LogDir\SRC_MD5.txt 

Comparing:

$Diff = Compare-Object -ReferenceObject $(get-content "$LogDir\SRC_MD5.txt") -DifferenceObject $(get-content "$LogDir\DST_MD5.txt")

Content of F:\ drive:

PS C:\Users\Robbi> Get-ChildItem F:\ -force


    Directory: F:\


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d--hs-       19/03/2019     06:40                $RECYCLE.BIN
d-----       16/05/2020     04:41                DATA
d-----       19/01/2020     06:34                Drivers
d-----       16/05/2020     04:55                Gumball
d-----       16/05/2020     04:58                SW
d--hs-       19/03/2019     06:36                System Volume Information
d-----       13/03/2020     16:08                Tools
d-----       12/12/2019     00:02                VMs
d-----       16/05/2020     04:55                _Pre-Cestino
-a----       08/02/2020     03:02    21474836480 SHDrive-flat.vmdk
-a----       08/02/2020     03:02            466 SHDrive.vmdk

How can I exclude the data I don't want to copy from the Get-ChildItem output? I'd like a solution for this specific case and, if possible, for all cases in which Get-ChildItem has to exclude an explicit list of items (a string variable and/or an array) anywhere in a file system.


Solution

  • As of PowerShell 7.3.x, the -Exclude and -Include provider parameters of cmdlets such as Get-ChildItem only operate on item names (file / directory names, in the case of the file-system provider), not full paths or directory subtrees.

    Given that all paths that you want to exclude are direct children of the target directory, I suggest a two-step approach:

    # Get all files and directories in $Source, except those to be excluded.
    # Note the use of \* instead of \*.*, so as to also include the
    # directories (whose names don't have an extension).
    $items = Get-Item $Source\* -Force | Where-Object FullName -NotIn $Ignored
    
    # Recursively process all resulting files and directories and
    # calculate their hashes.
    # Note the use of -File to limit output to files.
    $SourceHash = $items | Get-ChildItem -Recurse -Force -File | Get-FileHash
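
The two steps above can be tried out safely on a throwaway directory tree. The following is only a sketch: the temporary folder and the file names (DATA, VMs, a.txt, disk.vmdk) are made up for illustration. Note that Get-FileHash uses SHA256 by default, so -Algorithm MD5 is passed here on the assumption that MD5 hashes are really what is wanted.

```powershell
# Create a temporary root directory (hypothetical names throughout).
$root = (New-Item -ItemType Directory -Path (
            Join-Path ([IO.Path]::GetTempPath()) "gci-demo-$([guid]::NewGuid())"
        )).FullName
$null = New-Item -ItemType Directory -Path (Join-Path $root 'DATA'), (Join-Path $root 'VMs')
'keep' | Set-Content (Join-Path (Join-Path $root 'DATA') 'a.txt')
'skip' | Set-Content (Join-Path (Join-Path $root 'VMs') 'disk.vmdk')
'skip' | Set-Content (Join-Path $root 'SHDrive.vmdk')

# Full paths of the direct children to ignore.
$ignored = (Join-Path $root 'VMs'), (Join-Path $root 'SHDrive.vmdk')

# Step 1: filter the direct children of $root by full path.
$items = Get-Item (Join-Path $root '*') -Force |
           Where-Object FullName -NotIn $ignored

# Step 2: recurse only into what is left and hash the files found there.
$hashes = $items | Get-ChildItem -Recurse -Force -File |
            Get-FileHash -Algorithm MD5

# List the hashed files; the excluded directory and file should be absent.
$hashes | Select-Object Hash, Path

# Clean up the temporary tree.
Remove-Item $root -Recurse -Force
```

Running the same two steps against the destination root, with the destination's own ignore list, should keep both hash lists restricted to the same set of files.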
    

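    Of course, if you define your $Ignored array in terms of file/directory names only, you could use -Exclude in the first step. Because -Exclude matches names only and doesn't prune the subtrees of excluded directories during recursion, it is applied to the top-level items before recursing:

    # Convert the ignore list to file/directory names only.
    $Ignored = $Ignored | Split-Path -Leaf
    
    # Exclude by name among the direct children, then recurse into the rest.
    $SourceHash = Get-Item $Source\* -Force -Exclude $Ignored |
                    Get-ChildItem -Recurse -Force -File |
                    Get-FileHash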
    

    If the names to exclude can occur at any level of the subdirectory hierarchy, more work is needed ($Ignored is again assumed to contain file/directory names only):

    $ignoredRegex = '(?<=^|\{0})({1})(?=\{0}|$)' -f
                      [IO.Path]::DirectorySeparatorChar,
                      ($Ignored.ForEach({ [regex]::Escape($_) }) -join '|')
    
    
    $SourceHash = Get-ChildItem $Source -Recurse -File -Force |
                    Where-Object FullName -notmatch $ignoredRegex |
                    Get-FileHash
    

    The above uses a regular expression with the (negated form of the) -match operator to exclude all specified items and their children, recursively, anywhere in the subdirectory tree.
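
To see what the regular expression does without touching the file system, it can be applied to a few sample path strings. This is a minimal sketch; the names and paths below are invented for illustration, and the paths are assembled with the platform's directory separator so that the example behaves the same on any OS.

```powershell
# Names to exclude at any depth (hypothetical sample values).
$names = 'VMs', '$RECYCLE.BIN', 'SHDrive.vmdk'

# Same construction as above: a name only matches as a complete path
# component, i.e. bounded by a separator or the start/end of the string.
$sep = [IO.Path]::DirectorySeparatorChar
$ignoredRegex = '(?<=^|\{0})({1})(?=\{0}|$)' -f
                  $sep,
                  ($names.ForEach({ [regex]::Escape($_) }) -join '|')

# Sample paths (plain strings; nothing needs to exist on disk).
$samples = @(
    ('F:', 'DATA', 'report.txt') -join $sep          # kept
    ('F:', 'VMs', 'win10', 'disk.vmdk') -join $sep   # dropped: inside VMs
    ('F:', 'DATA', 'VMs-notes.txt') -join $sep       # kept: VMs is only part of the name
    ('F:', 'Tools', 'SHDrive.vmdk') -join $sep       # dropped: file name matches
)

# Keep only the paths in which no component equals an excluded name.
$kept = $samples | Where-Object { $_ -notmatch $ignoredRegex }
$kept
```

The lookbehind and lookahead are what prevent partial matches: a name such as VMs-notes.txt is kept because the character after VMs is neither a separator nor the end of the string.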