I've made a PowerShell script that runs robocopy with md5 checks.
It works fine, but when I try to exclude some directories or files, robocopy honors the exclusions, whereas the MD5 part of the script, which compares the hashes, fails: it reports errors because the source has more files/hashes than the destination.
I've tried just about every method I've found here and elsewhere on the Internet, but I still can't exclude directories and/or files from a path.
Below is what I've done so far. In this mode, the md5-copy works (without exclusions):
$Source = "F:\"
$IgnoreDir = @(
$Source + '$RECYCLE.BIN'
$Source + "System Volume Information"
$Source + "VMs"
)
$IgnoreFile = @(
$Source + "SHDrive.vmdk"
$Source + "SHDrive-flat.vmdk"
)
$Ignored = $IgnoreDir + $IgnoreFile
Robocopy:
Robocopy.exe /R:1 /W:0 $Source $Dest /E /V /TEE /XD $IgnoreDir /XF $IgnoreFile /LOG:$LogDir\RBCY_MD5_F.txt
MD5:
$SourceHash = Get-ChildItem "$Source\*.*" -Recurse -Force -Exclude $Ignored | Where-Object {!$_.psiscontainer } | Get-FileHash
$SourceHash | Select-Object "Hash", "path" | ft -HideTableHeaders -AutoSize | Out-File -Width "300" $LogDir\SRC_MD5_REF.txt
$SourceHash.Hash | Out-File $LogDir\SRC_MD5.txt
Comparing:
$Diff = Compare-Object -ReferenceObject $(get-content "$LogDir\SRC_MD5.txt") -DifferenceObject $(get-content "$LogDir\DST_MD5.txt")
Content of F:\ drive:
PS C:\Users\Robbi> Get-ChildItem F:\ -force
Directory: F:\
Mode LastWriteTime Length Name
---- ------------- ------ ----
d--hs- 19/03/2019 06:40 $RECYCLE.BIN
d----- 16/05/2020 04:41 DATA
d----- 19/01/2020 06:34 Drivers
d----- 16/05/2020 04:55 Gumball
d----- 16/05/2020 04:58 SW
d--hs- 19/03/2019 06:36 System Volume Information
d----- 13/03/2020 16:08 Tools
d----- 12/12/2019 00:02 VMs
d----- 16/05/2020 04:55 _Pre-Cestino
-a---- 08/02/2020 03:02 21474836480 SHDrive-flat.vmdk
-a---- 08/02/2020 03:02 466 SHDrive.vmdk
How can I exclude the data I don't want to copy from the Get-ChildItem output? I'm asking about this specific case and, if possible, about any case in which Get-ChildItem has to exclude an explicit list of items (a string variable and/or an array) anywhere in a file system.
As of PowerShell 7.3.x, the -Exclude and -Include provider parameters of cmdlets such as Get-ChildItem only operate on item names (file / directory names, in the case of the file-system provider), not full paths or directory subtrees.
-ExcludeRecursive.
Given that all paths that you want to exclude are direct children of the target directory, I suggest a two-step approach:
# Get all files and directories in $Source, except those to be excluded.
# Note the use of \* instead of \*.*, so as to also include the
# directories (whose names don't have an extension).
$items = Get-Item $Source\* -Force | Where-Object FullName -NotIn $Ignored
# Recursively process all resulting files and directories and
# calculate their hashes.
# Note the use of -File to limit output to files.
$SourceHash = $items | Get-ChildItem -Recurse -Force -File | Get-FileHash
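To carry the two-step approach through to the comparison, here is a minimal sketch, assuming you want to compare both trees by relative path and hash rather than by hash alone. Get-RelativeHash and its parameters are hypothetical names introduced for illustration; they are not part of the original script. Note that Get-FileHash defaults to SHA256, so the sketch passes -Algorithm MD5 explicitly.

```powershell
# Hypothetical helper (not from the original script): hashes every file under
# $Root, skipping top-level items whose full path is listed in $Ignored, and
# reports each file's path relative to $Root so that two trees can be compared.
function Get-RelativeHash {
  param(
    [Parameter(Mandatory)] [string] $Root,
    [string[]] $Ignored = @(),
    [string] $Algorithm = 'MD5'   # Get-FileHash itself defaults to SHA256
  )
  $rootFull = (Get-Item -LiteralPath $Root -Force).FullName
  # Number of leading characters to strip in order to get a relative path.
  $prefixLength = $rootFull.TrimEnd('\', '/').Length + 1

  # Step 1: top-level items, minus the ignored ones.
  Get-ChildItem -LiteralPath $rootFull -Force |
    Where-Object FullName -NotIn $Ignored |
    ForEach-Object {
      # Step 2: recurse into kept directories; pass kept files through as-is.
      if ($_.PSIsContainer) {
        Get-ChildItem -LiteralPath $_.FullName -Recurse -Force -File
      } else {
        $_
      }
    } |
    Get-FileHash -Algorithm $Algorithm |
    Select-Object Hash,
      @{ Name = 'RelativePath'; Expression = { $_.Path.Substring($prefixLength) } }
}

# Possible usage, assuming $Source, $Dest and $Ignored are defined as in the question:
#   $src  = Get-RelativeHash -Root $Source -Ignored $Ignored
#   $dst  = Get-RelativeHash -Root $Dest
#   $Diff = Compare-Object $src $dst -Property RelativePath, Hash
```

Comparing on both properties means a missing or renamed file shows up as a difference even if another file happens to have the same hash.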
Of course, if you define your $Ignored array in terms of file/directory names only, you could use -Exclude:
# Convert the ignore list to file/directory names only.
$Ignored = $Ignored | Split-Path -Leaf
$SourceHash = Get-ChildItem -File $Source -Recurse -Force -Exclude $Ignored |
Get-FileHash
If the names to exclude can occur at any level of the subdirectory hierarchy, more work is needed:
$ignoredRegex = '(?<=^|\{0})({1})(?=\{0}|$)' -f
[IO.Path]::DirectorySeparatorChar,
($Ignored.ForEach({ [regex]::Escape($_) }) -join '|')
$SourceHash = Get-ChildItem $Source -Recurse -File -Force |
Where-Object FullName -notmatch $ignoredRegex |
Get-FileHash
The above uses a regular expression with the (negated form of the) -match operator to exclude all specified items and their children, recursively, anywhere in the subdirectory tree.
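To see what such a regex does and does not filter out, here is a small sketch that applies it to a few made-up path strings instead of a real drive. The names in $Ignored and the sample paths are assumptions chosen for illustration; the paths are built with the platform's directory separator so the check does not depend on a particular OS.

```powershell
# Hypothetical leaf names to ignore (illustration only).
$Ignored = '$RECYCLE.BIN', 'System Volume Information', 'VMs', 'SHDrive.vmdk'

# Same construction as above: a name only counts if it is a complete path
# component, i.e. delimited by a separator or by the start/end of the string.
$ignoredRegex = '(?<=^|\{0})({1})(?=\{0}|$)' -f
  [IO.Path]::DirectorySeparatorChar,
  ($Ignored.ForEach({ [regex]::Escape($_) }) -join '|')

# Made-up sample paths, joined with the platform's separator.
$sep = [IO.Path]::DirectorySeparatorChar
$samples = @(
  ('F:', 'DATA', 'report.txt') -join $sep           # kept
  ('F:', 'VMs', 'disk.vmdk') -join $sep             # dropped: inside VMs
  ('F:', 'DATA', 'VMs', 'old.txt') -join $sep       # dropped: VMs deeper in the tree
  ('F:', 'DATA', 'MyVMs', 'notes.txt') -join $sep   # kept: MyVMs is a different name
  ('F:', 'SHDrive.vmdk') -join $sep                 # dropped: ignored file name
)

# With an array on the left-hand side, -notmatch acts as a filter.
$kept = @($samples -notmatch $ignoredRegex)
$kept
```

Only the report.txt and MyVMs paths should remain, because the lookarounds require each ignored name to be a whole path component rather than a substring of one.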