powershell, powershell-2.0

PowerShell fastest directory list


Good Afternoon Everyone,

I am working with a Storage Area Network (SAN) that has approximately 10TB of data. I need to perform a recursive directory listing to identify specific types of files (e.g., PST files). Currently, I'm using PowerShell's Get-ChildItem -Include command, but it's exceedingly slow—taking days to complete the task.

Requirement(s):

  1. The solution should significantly reduce the time required for the directory listing, ideally cutting it down to a few hours.
  2. I only require the list of file paths; extracting file properties is not necessary.

Questions:

  1. Are there any more efficient methods or tools (preferably PowerShell or Windows CMD based) that could expedite this recursive directory listing on a large-scale data set?

Side Note(s):

I found a compiled code resource here that seems relevant. Could someone provide guidance on how to use it in my scenario? Any suggestions or insights on speeding up this process would be greatly appreciated!

Final Result


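Thanks to the wonderful @not2qubit for finding the GetFiles method of the [System.IO.Directory] class, we have a significantly faster way to locate files in large directory trees, with plenty of options for narrowing the search. Note that the [System.IO.EnumerationOptions] overload used below requires .NET Core 2.1 or later, so it is available in PowerShell 7 but not in Windows PowerShell 5.1 or earlier, which run on .NET Framework.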

    [System.IO.Directory]::GetFiles(
        'C:\',                                        # [Str] Root Search Directory
        'cmd.exe',                                    # [Str] File Name Pattern
        [System.IO.EnumerationOptions] @{
            AttributesToSkip         = @(
                'Hidden'
                'Device'
                # 'Temporary'
                'SparseFile'
                'ReparsePoint'
                # 'Compressed'
                'Offline'
                'Encrypted'
                'IntegrityStream' 
                # 'NoScrubData'
            )
            BufferSize               = 4096           # [Int]  Suggested buffer size in bytes; property default is 0 (platform decides)
            IgnoreInaccessible       = $True          # [Bool] True=Ignore Inaccessible Directories
            MatchCasing              = 0              # [Int]  0=PlatformDefault; 1=CaseSensitive; 2=CaseInsensitive
            MatchType                = 0              # [Int]  0=Simple; 1=Win32
            MaxRecursionDepth        = 2147483647     # [Int]  Default=2147483647
            RecurseSubdirectories    = $True          # [Bool] 
            ReturnSpecialDirectories = $False         # [Bool] $True=Return the special directory entries "." and "..";
        }
    )

Results

[System.IO.Directory]::GetFiles
Maximum Minimum Average
------- ------- -------
5.782s  5.082s  5.385s

Get-ChildItem
Maximum Minimum Average
------- ------- -------
21.647s 17.556s 19.907s
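
The same call can be adapted to the original requirement (paths of PST files only). The following is a minimal sketch, not a tested production script: it assumes PowerShell 7, and `$Root` and `$OutFile` are hypothetical placeholders that point at a small sample tree built in the temp folder so the snippet runs anywhere. It swaps GetFiles for EnumerateFiles, which takes the same arguments but yields paths lazily instead of building the whole array first.

```powershell
# Assumption: PowerShell 7 (System.IO.EnumerationOptions needs .NET Core 2.1+).
# $Root and $OutFile are hypothetical placeholders; replace them with your share and report file.
$Root    = Join-Path ([System.IO.Path]::GetTempPath()) 'pst-scan-demo'
$OutFile = Join-Path ([System.IO.Path]::GetTempPath()) 'pst-paths.txt'

# Build a tiny sample tree so the sketch is self-contained.
$null = New-Item -ItemType Directory -Path (Join-Path $Root 'a/b') -Force
$null = New-Item -ItemType File -Path (Join-Path $Root 'a/mail.pst') -Force
$null = New-Item -ItemType File -Path (Join-Path $Root 'a/b/archive.pst') -Force
$null = New-Item -ItemType File -Path (Join-Path $Root 'a/b/notes.txt') -Force

$Options = [System.IO.EnumerationOptions] @{
    RecurseSubdirectories = $true   # walk the whole tree
    IgnoreInaccessible    = $true   # skip folders we cannot open instead of throwing
}

# EnumerateFiles returns a lazy sequence of path strings; no FileInfo objects are created.
$Paths = [System.IO.Directory]::EnumerateFiles($Root, '*.pst', $Options)

# Stream the paths straight into a text file, one per line.
$Paths | Set-Content -LiteralPath $OutFile

$Found = @(Get-Content -LiteralPath $OutFile)
"Wrote $($Found.Count) path(s) to $OutFile"
```

Because only strings flow through the pipeline, memory use should stay flat even on a very large tree; whether that also cuts the wall-clock time on a SAN depends on the storage and network, so measure it on a subfolder first.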

Full Test Code


Function Start-PerformanceTest {
    <#
        .SYNOPSIS
            Test the execution time of script blocks.
        .DESCRIPTION
            Measure a block of code over a number of iterations so that informed decisions can be made about code efficiency.
        .PARAMETER ScriptBlock
            [ScriptBlock] Code to run and measure. Input code as either a ScriptBlock object or wrap it in {} and the script will attempt to convert it automatically.
        .PARAMETER Measurement
            [String] Time unit in which to display measurements. (Options: Milliseconds, Seconds, Minutes, Hours, Days)
        .PARAMETER Iterations
            [Int] Numbers of times to run the code.
        
        .INPUTS
            None
        .OUTPUTS
            An object with Maximum, Minimum, and Average properties, each a rounded value with a unit suffix.
        .NOTES
        VERSION     DATE            NAME                        DESCRIPTION
        ___________________________________________________________________________________________________________
        1.0         20 August 2020  Warilia, Nicholas R.        Initial version
        Credits:
            (1) Script Template: https://gist.github.com/9to5IT/9620683
    #>

    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [ScriptBlock]$ScriptBlock,
        [ValidateSet('Milliseconds', 'Seconds', 'Minutes', 'Hours', 'Days')]
        [string]$Measurement = 'Seconds',
        [int]$Iterations = 100
    )

    $Results = [System.Collections.ArrayList]::new()

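    # Run the script block exactly $Iterations times (-lt, not -le) and record each elapsed time.
    # TotalHours is selected too, so that -Measurement Hours has a property to read.
    For ($I = 0; $I -lt $Iterations; $I++) {
        [Void]$Results.Add(
            ((Measure-Command -Expression ([scriptblock]::Create($ScriptBlock)) | Select-Object TotalDays, TotalHours, TotalMinutes, TotalSeconds, TotalMilliseconds))
        )
    }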

    #Determine correct timestamp label
    Switch ($Measurement) {
        'Milliseconds' { $LengthType = 'ms' }
        default { $LengthType = $Measurement.SubString(0, 1).tolower() }
    }

    # Measure the chosen property directly; grouping first would collapse identical timings and skew the average.
    $Results | Measure-Object -Property "Total$Measurement" -Average -Maximum -Minimum | Select-Object `
    @{Name = 'Maximum'; Expression = { "$([Math]::Round($_.Maximum,3))$LengthType" } },
    @{Name = 'Minimum'; Expression = { "$([Math]::Round($_.Minimum,3))$LengthType" } },
    @{Name = 'Average'; Expression = { "$([Math]::Round($_.Average,3))$LengthType" } }
}

Write-Host "Testing: System.IO.Directory.GetFiles"
Start-PerformanceTest -Iterations:10 -ScriptBlock:{
    [System.IO.Directory]::GetFiles(
        'C:\',                                        # [Str] Root Search Directory
        'cmd.exe',                                    # [Str] File Name Pattern
        [System.IO.EnumerationOptions] @{
            AttributesToSkip         = @(
                'Hidden'
                'Device'
                # 'Temporary'
                'SparseFile'
                'ReparsePoint'
                # 'Compressed'
                'Offline'
                'Encrypted'
                'IntegrityStream' 
                # 'NoScrubData'
            )
            BufferSize               = 4096           # [Int]  Suggested buffer size in bytes; property default is 0 (platform decides)
            IgnoreInaccessible       = $True          # [Bool] True=Ignore Inaccessible Directories
            MatchCasing              = 0              # [Int]  0=PlatformDefault; 1=CaseSensitive; 2=CaseInsensitive
            MatchType                = 0              # [Int]  0=Simple; 1=Win32
            MaxRecursionDepth        = 2147483647     # [Int]  Default=2147483647
            RecurseSubdirectories    = $True          # [Bool] 
            ReturnSpecialDirectories = $False         # [Bool] $True=Return the special directory entries "." and "..";
        }
    )
}

Write-Host 'Testing: Get-ChildItem'
Start-PerformanceTest -Iterations:10 -ScriptBlock:{
    Get-ChildItem -Path:'C:\' -Filter:'cmd.exe' -Recurse -File -ErrorAction SilentlyContinue |
    Where-Object {
        # Filter out files based on specified attributes
        # Note: Some attributes might not directly correspond to EnumerationOptions and need manual filtering
        !($_.Attributes -band [System.IO.FileAttributes]::Hidden) -and
        !($_.Attributes -band [System.IO.FileAttributes]::Device) -and
        !($_.Attributes -band [System.IO.FileAttributes]::SparseFile) -and
        !($_.Attributes -band [System.IO.FileAttributes]::ReparsePoint) -and
        !($_.Attributes -band [System.IO.FileAttributes]::Offline) -and
        !($_.Attributes -band [System.IO.FileAttributes]::Encrypted) -and
        !($_.Attributes -band [System.IO.FileAttributes]::IntegrityStream)
    }
}
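
A timing comparison only means something if both approaches return the same files. The sketch below is one way to check that, assuming PowerShell 7; `$Root` is a hypothetical scratch directory created in the temp folder, not the C:\ drive used in the benchmark.

```powershell
# Assumption: PowerShell 7; $Root is a hypothetical scratch directory.
$Root = Join-Path ([System.IO.Path]::GetTempPath()) 'enum-compare-demo'
$null = New-Item -ItemType Directory -Path (Join-Path $Root 'x/y') -Force
$null = New-Item -ItemType File -Path (Join-Path $Root 'x/cmd.exe') -Force
$null = New-Item -ItemType File -Path (Join-Path $Root 'x/y/cmd.exe') -Force
$null = New-Item -ItemType File -Path (Join-Path $Root 'x/y/other.exe') -Force

$Options = [System.IO.EnumerationOptions] @{ RecurseSubdirectories = $true }

# Collect full paths from both approaches.
$FromDotNet = @([System.IO.Directory]::GetFiles($Root, 'cmd.exe', $Options))
$FromCmdlet = @(Get-ChildItem -LiteralPath $Root -Filter 'cmd.exe' -Recurse -File |
    Select-Object -ExpandProperty FullName)

# Compare-Object emits nothing when both lists contain the same paths.
$Differences = @(Compare-Object -ReferenceObject $FromDotNet -DifferenceObject $FromCmdlet)
"GetFiles: $($FromDotNet.Count), Get-ChildItem: $($FromCmdlet.Count), differences: $($Differences.Count)"
```

On a real drive the two lists may differ because of the attribute filters and inaccessible folders, which is worth knowing before reading too much into the timings.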


Solution

  • If you're after just one extension, use the -Filter parameter; it's much faster than -Include. I'd also suggest using PowerShell 3 if you can (Get-ChildItem has the new -File switch); as far as I remember, performance when listing UNC paths was improved in it (with the underlying .NET 4 support).

    Another option would be to use the dir command from a cmd window, which should be very fast.
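
Both suggestions can be sketched as follows. This is an illustration rather than a benchmark: `$Root` is a hypothetical placeholder pointing at a sample folder created in the temp directory, and the snippet avoids the -File switch so that it should also work on PowerShell 2.0. The `dir` call is guarded because cmd.exe only exists on Windows.

```powershell
# $Root is a hypothetical placeholder; replace it with the share or folder to scan.
$Root = Join-Path ([System.IO.Path]::GetTempPath()) 'filter-demo'
$null = New-Item -ItemType Directory -Path (Join-Path $Root 'sub') -Force
$null = New-Item -ItemType File -Path (Join-Path $Root 'sub/mail.pst') -Force
$null = New-Item -ItemType File -Path (Join-Path $Root 'readme.txt') -Force

# -Filter is handed to the file system provider, so non-matching entries
# are never turned into objects. -Include filters only after that step.
$ViaFilter = @(Get-ChildItem -Path $Root -Filter '*.pst' -Recurse |
    Where-Object { -not $_.PSIsContainer } |      # PowerShell 2.0 has no -File switch
    ForEach-Object { $_.FullName })               # keep only the path string

# The cmd.exe equivalent: /s recurses, /b prints bare paths, /a-d skips directories.
if ($env:OS -eq 'Windows_NT') {
    $ViaDir = @(cmd /c dir "$Root\*.pst" /s /b /a-d)
}

$ViaFilter
```

Redirecting either result to a file (for example with `Out-File`) gives the plain list of paths the question asks for.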