
In a PowerShell ForEach-Object loop is it possible to dump contents to a log file every 1000 entries?


If I am using a PowerShell ForEach-Object statement, and storing contents in a variable, is it possible to dump those contents to a log file every 1000 entries?

I ask because I am processing files with tens of thousands, if not hundreds of thousands, of lines, and a run can take hours. If the computer or program crashes, I want at least part of the output saved to a log file. Writing every line out one at a time seems like it would slow the process down too.

(EDIT: Added additional content to code, I guess I wasn't doing myself any favors by trying to simplify it)

Example:

$count = 1
Get-ChildItem -Path "$path" -Recurse -File | ForEach-Object {
    $fileinfo = $_
    $FullName = $fileinfo.FullName -replace [regex]::Escape("$hashpath"), ''
    $LastWriteTime = $fileinfo.LastWriteTime.ToString('yyyyMMdd_HHmmss')
    Write-Host "$count of $numfiles $FullName $($fileinfo.Length) $LastWriteTime"
    $count++
    $filehash = (Get-FileHash -LiteralPath $fileinfo.FullName -Algorithm SHA256).Hash
    "$FileHash $FullName $($fileInfo.Length) $LastWriteTime"
} | Out-File -Encoding UTF8 -FilePath $hashlog

I was considering using a counter, but I'm not sure how I would write the output only every 1000 entries. Thanks for any assistance.

EDIT: Output I'm trying to achieve:

[SHA256HASHOUTPUT] [RELATIVE FILE PATH] [FILE SIZE] [DATETIMESTAMP]
0123456789012345678901234567890123456789012345678901234567891234 \file1.txt 345 20231129_130623
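
For reference, the timestamp column in the sample line has a two-digit minute (`130623` for 13:06:23), which corresponds to the `mm` specifier in a .NET custom date format string; a single `m` omits the leading zero. A minimal check, using a fixed example date chosen to match the sample line and the invariant culture so the digits do not depend on regional settings:

```powershell
# Fixed example timestamp matching the sample line: 2023-11-29 13:06:23.
$stamp = [datetime]::new(2023, 11, 29, 13, 6, 23)
$inv   = [cultureinfo]::InvariantCulture

# 'mm' pads the minute to two digits; 'm' does not.
$twoDigitMinutes = $stamp.ToString('yyyyMMdd_HHmmss', $inv)   # 20231129_130623
$oneDigitMinutes = $stamp.ToString('yyyyMMdd_HHmss', $inv)    # 20231129_13623

$twoDigitMinutes
$oneDigitMinutes
```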

Solution

  • You can use a List<T> to buffer the hashes before outputting them to a file, and an anonymous function (a script block with begin, process and end blocks) to write the buffered lines out whenever the list's .Count reaches the buffer size.

    Worth noting:

    • Format-Table -AutoSize is not needed at all in your code and hurts performance by delaying the output to the file. Most likely something like this will give very similar performance, with no buffering needed:

      Get-ChildItem -Path $path -Recurse -File |
          Get-FileHash -Algorithm SHA256 |
          ForEach-Object Hash |
          Out-File $hashlog -Encoding utf8
      
    • Calling .Trim() on the .Hash value is not needed.

    • Output from Get-ChildItem can be piped directly to Get-FileHash.

    Get-ChildItem -Path $path -Recurse -File |
        Get-FileHash -Algorithm SHA256 | & {
            begin {
                # tweak the number of lines to hold before outputting to file
                $buffersize = 1000
                $list = [System.Collections.Generic.List[string]]::new($buffersize)
            }
            process {
                # add the Hash to the List
                $list.Add($_.Hash)
                # if the List size is equal to the buffer size
                if ($list.Count -eq $buffersize) {
                    # output the content to the file
                    $list.ToArray()
                    # and clear the list
                    $list.Clear()
                }
            }
            end {
                # if there is any remaining data
                if ($list.Count) {
                    # output it to the file
                    $list.ToArray()
                }
            }
        } |
        Out-File $hashlog -Encoding utf8
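
To see how the begin / process / end blocks cooperate, here is a small sketch of the same buffering pattern that runs without any files. The numbered strings and the buffer size of 10 are stand-ins chosen for illustration, and the leading comma is added only so each flush arrives as a single object that can be counted; it is not part of the code above.

```powershell
# Stand-in data: 25 numbered lines instead of real file hashes.
$buffersize = 10
$batches = 1..25 | & {
    begin {
        $list = [System.Collections.Generic.List[string]]::new($buffersize)
    }
    process {
        $list.Add("line $_")
        if ($list.Count -eq $buffersize) {
            # The leading comma emits the whole array as ONE pipeline object,
            # so each flush can be inspected separately in this demo.
            , $list.ToArray()
            $list.Clear()
        }
    }
    end {
        # Emit whatever is left over after the last full batch.
        if ($list.Count) {
            , $list.ToArray()
        }
    }
}

# Report how many lines each flush released: two full batches, then the rest.
$batches | ForEach-Object { $_.Count }
```

The last, partial batch only appears because of the end block; without it, any lines left in the list when the input runs out would be lost.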
    

    For the updated question, if you want to re-use the buffering logic you can approach it this way:

    Get-ChildItem -Path $path -Recurse -File -PipelineVariable file |
        Get-FileHash -Algorithm SHA256 | & {
            begin {
                # tweak the number of lines to hold before outputting to file
                $buffersize = 1000
                $list = [System.Collections.Generic.List[string]]::new($buffersize)
            }
            process {
                $line = '{0} {1} {2} {3}' -f
                    $_.Hash,
                    $file.FullName.Remove(0, $path.Length),
                    $file.Length,
                    $file.LastWriteTime.ToString('yyyyMMdd_HHmmss')
    
                $list.Add($line)
    
                # if the List size is equal to the buffer size
                if ($list.Count -eq $buffersize) {
                    # output the content to the file
                    $list.ToArray()
                    # and clear the list
                    $list.Clear()
                }
            }
            end {
                # if there is any remaining data
                if ($list.Count) {
                    # output it to the file
                    $list.ToArray()
                }
            }
        } | Out-File -Encoding UTF8 -FilePath $hashlog
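
If the goal is specifically to have each batch committed to the log before processing continues, one possible variation is to append every full buffer with Add-Content instead of piping everything into a single Out-File. This is only a sketch: `Write-BufferedLog` is a hypothetical helper name, not a built-in cmdlet, and the demo feeds it stand-in strings and a temporary file rather than real hashes.

```powershell
# Hypothetical helper (not a built-in cmdlet): appends buffered pipeline
# input to a log file every $BufferSize lines.
function Write-BufferedLog {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string] $Line,

        [Parameter(Mandatory)]
        [string] $Path,

        [int] $BufferSize = 1000
    )
    begin {
        $list = [System.Collections.Generic.List[string]]::new($BufferSize)
    }
    process {
        $list.Add($Line)
        if ($list.Count -ge $BufferSize) {
            # Add-Content opens the file, appends the batch and closes it
            # again, so the batch is written out before the next item.
            Add-Content -LiteralPath $Path -Value $list.ToArray() -Encoding UTF8
            $list.Clear()
        }
    }
    end {
        # Append whatever is left after the last full batch.
        if ($list.Count) {
            Add-Content -LiteralPath $Path -Value $list.ToArray() -Encoding UTF8
        }
    }
}

# Demo with stand-in data: 2500 lines appended in batches of 1000, 1000, 500.
$demoLog = Join-Path ([System.IO.Path]::GetTempPath()) 'buffered-demo.log'
Remove-Item -LiteralPath $demoLog -ErrorAction SilentlyContinue
1..2500 | ForEach-Object { "entry $_" } | Write-BufferedLog -Path $demoLog
$lineCount = @(Get-Content -LiteralPath $demoLog).Count
$lineCount
```

In the real pipeline the formatted lines would be piped to the helper in place of the final Out-File. The trade-off is one file open and close per batch, which is small next to the cost of hashing 1000 files.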