Good Afternoon Everyone,
I am working with a Storage Area Network (SAN) that holds approximately 10 TB of data. I need to perform a recursive directory listing to find specific types of files (e.g., PST files). I'm currently using PowerShell's Get-ChildItem -Include, but it's extremely slow: the scan takes days to complete.
I found a compiled code resource here that seems relevant. Could someone explain how to use it in my scenario?
Any other suggestions for speeding up this process would also be greatly appreciated!
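Thanks to the wonderful @not2qubit for finding the GetFiles method of the [System.IO.Directory] class, we have a significantly faster way to locate files in large directory trees, with fine-grained filtering through [System.IO.EnumerationOptions]. Note that this overload requires a PowerShell version running on .NET Core 2.1 or later, such as PowerShell 7; it is not available in Windows PowerShell 5.1, which runs on .NET Framework.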
[System.IO.Directory]::GetFiles(
'C:\', # [Str] Root Search Directory
'cmd.exe', # [Str] File Name Pattern
[System.IO.EnumerationOptions] @{
AttributesToSkip = @(
'Hidden'
'Device'
# 'Temporary'
'SparseFile'
'ReparsePoint'
# 'Compressed'
'Offline'
'Encrypted'
'IntegrityStream'
# 'NoScrubData'
)
BufferSize = 4096 # [Int] Suggested buffer size in bytes; Default=0 (use the runtime's default)
IgnoreInaccessible = $True # [Bool] True=Ignore Inaccessible Directories
MatchCasing = 0 # [Int] 0=PlatformDefault; 1=CaseSensitive; 2=CaseInsensitive
MatchType = 0 # [Int] 0=Simple; 1=Win32 (DOS-style wildcards)
MaxRecursionDepth = 2147483647 # [Int] Default=2147483647
RecurseSubdirectories = $True # [Bool]
ReturnSpecialDirectories = $False # [Bool] $True=Return the special directory entries "." and "..";
}
)
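For the original question (PST files on a large share), a minimal sketch that streams matches instead of collecting them into one big array might look like the following. Find-FilesFast, the \\server\share placeholder, and the choice to skip only reparse points are illustrative assumptions, not part of the answer above. The fallback branch for Windows PowerShell 5.1 walks the tree manually so that a single access-denied folder does not abort the whole search.

```powershell
# Find-FilesFast is a hypothetical helper name, not a built-in cmdlet.
function Find-FilesFast {
    param(
        [Parameter(Mandatory)][string]$Path,  # e.g. '\\server\share' (placeholder)
        [string]$Pattern = '*.pst'
    )
    if ('System.IO.EnumerationOptions' -as [type]) {
        # .NET Core 2.1+ (e.g. PowerShell 7): EnumerateFiles yields paths lazily,
        # so results reach the pipeline while the scan is still running
        $options = [System.IO.EnumerationOptions]@{
            RecurseSubdirectories = $true
            IgnoreInaccessible    = $true
            # Assumption: skip junctions/symlinks to avoid loops; hidden files are NOT skipped here
            AttributesToSkip      = [System.IO.FileAttributes]::ReparsePoint
        }
        [System.IO.Directory]::EnumerateFiles($Path, $Pattern, $options)
    }
    else {
        # Windows PowerShell 5.1 fallback: depth-first walk with an explicit stack.
        # On .NET Framework a 3-character extension such as '*.pst' also matches '*.pstx'.
        $pending = [System.Collections.Generic.Stack[string]]::new()
        $pending.Push($Path)
        while ($pending.Count -gt 0) {
            $dir = $pending.Pop()
            try {
                [System.IO.Directory]::GetFiles($dir, $Pattern)
                foreach ($sub in [System.IO.DirectoryInfo]::new($dir).GetDirectories()) {
                    # Do not descend into junctions/symlinks (possible cycles)
                    if (-not ($sub.Attributes -band [System.IO.FileAttributes]::ReparsePoint)) {
                        $pending.Push($sub.FullName)
                    }
                }
            }
            catch [System.UnauthorizedAccessException], [System.IO.IOException] {
                # Skip folders that cannot be read and keep going
            }
        }
    }
}
```

Usage: `Find-FilesFast -Path '\\server\share' | Set-Content -Path .\pst-files.txt` writes one full path per line as results arrive, which matters on a 10 TB share where a single GetFiles call would only return once the whole scan has finished.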
Method                            Maximum   Minimum   Average
--------------------------------  -------   -------   -------
[System.IO.Directory]::GetFiles   5.782s    5.082s    5.385s
Get-ChildItem                     21.647s   17.556s   19.907s
Function Start-PerformanceTest {
<#
.SYNOPSIS
Test the execution time of script blocks.
.DESCRIPTION
Measure a block of code over a number of iterations so that informed decisions can be made about code efficiency.
.PARAMETER ScriptBlock
[ScriptBlock] Code to run and measure. Pass a ScriptBlock object or wrap the code in {}.
.PARAMETER Measurement
[String] Time unit in which to display measurements. (Options: Milliseconds, Seconds, Minutes, Hours, Days)
.PARAMETER Iterations
[Int] Number of times to run the code.
.INPUTS
None
.OUTPUTS
[PSCustomObject] Maximum, Minimum and Average execution time.
.NOTES
VERSION DATE NAME DESCRIPTION
___________________________________________________________________________________________________________
1.0 20 August 2020 Warilia, Nicholas R. Initial version
Credits:
(1) Script Template: https://gist.github.com/9to5IT/9620683
#>
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [ScriptBlock]$ScriptBlock,
        [ValidateSet('Milliseconds', 'Seconds', 'Minutes', 'Hours', 'Days')]
        [string]$Measurement = 'Seconds',
        [ValidateRange(1, [int]::MaxValue)]
        [int]$Iterations = 100
    )
    # Keep one TimeSpan per run; -lt (not -le) so the code runs exactly $Iterations times
    $Results = [System.Collections.Generic.List[TimeSpan]]::new()
    For ($I = 0; $I -lt $Iterations; $I++) {
        $Results.Add((Measure-Command -Expression $ScriptBlock))
    }
    # Determine correct timestamp label
    Switch ($Measurement) {
        'Milliseconds' { $LengthType = 'ms' }
        default { $LengthType = $Measurement.Substring(0, 1).ToLower() }
    }
    # Measure every run directly (e.g. TotalSeconds); grouping first would collapse identical timings
    # and the TimeSpan keeps TotalHours, which the previous Select-Object dropped
    $Stats = $Results | Measure-Object -Property "Total$Measurement" -Average -Maximum -Minimum
    [PSCustomObject]@{
        Maximum = "$([Math]::Round($Stats.Maximum, 3))$LengthType"
        Minimum = "$([Math]::Round($Stats.Minimum, 3))$LengthType"
        Average = "$([Math]::Round($Stats.Average, 3))$LengthType"
    }
}
Write-Host "Testing: System.IO.Directory.GetFiles"
Start-PerformanceTest -Iterations:10 -ScriptBlock:{
[System.IO.Directory]::GetFiles(
'C:\', # [Str] Root Search Directory
'cmd.exe', # [Str] File Name Pattern
[System.IO.EnumerationOptions] @{
AttributesToSkip = @(
'Hidden'
'Device'
# 'Temporary'
'SparseFile'
'ReparsePoint'
# 'Compressed'
'Offline'
'Encrypted'
'IntegrityStream'
# 'NoScrubData'
)
BufferSize = 4096 # [Int] Suggested buffer size in bytes; Default=0 (use the runtime's default)
IgnoreInaccessible = $True # [Bool] True=Ignore Inaccessible Directories
MatchCasing = 0 # [Int] 0=PlatformDefault; 1=CaseSensitive; 2=CaseInsensitive
MatchType = 0 # [Int] 0=Simple; 1=Win32 (DOS-style wildcards)
MaxRecursionDepth = 2147483647 # [Int] Default=2147483647
RecurseSubdirectories = $True # [Bool]
ReturnSpecialDirectories = $False # [Bool] $True=Return the special directory entries "." and "..";
}
)
}
Write-Host 'Testing: Get-ChildItem'
Start-PerformanceTest -Iterations:10 -ScriptBlock:{
Get-ChildItem -Path:'C:\' -Filter:'cmd.exe' -Recurse -File -ErrorAction SilentlyContinue |
Where-Object {
# Get-ChildItem has no AttributesToSkip equivalent, so the same attributes are filtered after enumeration
# (hidden items are already excluded because -Force is not used)
!($_.Attributes -band [System.IO.FileAttributes]::Hidden) -and
!($_.Attributes -band [System.IO.FileAttributes]::Device) -and
!($_.Attributes -band [System.IO.FileAttributes]::SparseFile) -and
!($_.Attributes -band [System.IO.FileAttributes]::ReparsePoint) -and
!($_.Attributes -band [System.IO.FileAttributes]::Offline) -and
!($_.Attributes -band [System.IO.FileAttributes]::Encrypted) -and
!($_.Attributes -band [System.IO.FileAttributes]::IntegrityStream)
}
}
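If it's just one extension that you're after, use the -Filter parameter; it's much faster than -Include because the filter is applied by the file system provider during enumeration, whereas -Include filters items after PowerShell has already retrieved them. I'd also suggest using PowerShell 3 or later if you can (Get-ChildItem gained the -File switch in that version); as far as I remember, performance when listing UNC paths was also improved there (with the underlying .NET 4 support).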
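A minimal sketch of the -Filter approach for a single extension; Get-PstFile and the \\server\share placeholder are hypothetical names used only for illustration.

```powershell
# Get-PstFile is a hypothetical helper name used for illustration.
function Get-PstFile {
    param([Parameter(Mandatory)][string]$Path)  # e.g. '\\server\share' (placeholder)
    # -Filter is passed to the FileSystem provider, so non-matching files are
    # discarded during enumeration rather than filtered afterwards in PowerShell.
    # -File (PowerShell 3+) returns files only; unreadable folders are skipped silently.
    Get-ChildItem -LiteralPath $Path -Filter '*.pst' -Recurse -File -ErrorAction SilentlyContinue |
        Select-Object -ExpandProperty FullName
}

# Slower equivalent for comparison: -Include is applied after items are retrieved.
# Get-ChildItem -Path $Path -Include '*.pst' -Recurse -File
```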
Another option is to run the dir command from a cmd window (for example, dir /s /b), which should be very fast.
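If you want the dir output inside a PowerShell script, one way is to call cmd.exe and capture the bare paths it prints. This is a sketch: Get-PstFileViaDir is a hypothetical wrapper name, while the /s, /b and /a-d switches of dir and the /d and /c switches of cmd are standard.

```powershell
# Get-PstFileViaDir is a hypothetical wrapper name.
#   dir switches: /s = include subdirectories, /b = bare format (full paths), /a-d = files only
function Get-PstFileViaDir {
    param([Parameter(Mandatory)][string]$Path)  # e.g. '\\server\share' (placeholder)
    $target = $Path.TrimEnd('\') + '\*.pst'
    # /d skips cmd AutoRun commands; "File Not Found" goes to stderr and is discarded.
    # cmd output uses the console code page, so unusual characters in names may be mangled.
    cmd.exe /d /c dir $target /s /b /a-d 2>$null
}
```

Each returned line is a plain string (a full path), so it can be piped straight to Set-Content or Out-File.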