If I am using a PowerShell ForEach-Object statement and storing the results in a variable, is it possible to dump those contents to a log file every 1000 entries?
I only ask because I am processing files with tens of thousands, if not hundreds of thousands, of lines, and processing can take hours. If the computer or program crashes, I at least want a partial result saved to a log file. Writing every line to the file one at a time seems like it would slow the process down too.
(EDIT: Added additional content to code, I guess I wasn't doing myself any favors by trying to simplify it)
Example:
$count = 1
Get-ChildItem -Path "$path" -Recurse -File | ForEach-Object {
    $fileinfo = $_
    $FullName = $fileinfo.FullName -replace [regex]::Escape("$hashpath"), ''
    $LastWriteTime = $fileinfo.LastWriteTime.ToString('yyyyMMdd_HHmmss')
    Write-Host "$count of $numfiles $FullName $($fileinfo.Length) $LastWriteTime"
    $count++
    $filehash = (Get-FileHash -LiteralPath $fileinfo.FullName -Algorithm SHA256).Hash
    "$filehash $FullName $($fileinfo.Length) $LastWriteTime"
} | Out-File -Encoding UTF8 -FilePath $hashlog
I was considering using a counter, but I'm not sure how I would capture only every 1000 entries. Thanks for any assistance.
EDIT: Output I'm trying to achieve:
[SHA256HASHOUTPUT] [RELATIVE FILE PATH] [FILE SIZE] [DATETIMESTAMP]
0123456789012345678901234567890123456789012345678901234567891234 \file1.txt 345 20231129_130623
You can use a List<T> to buffer the hashes before writing them to the file, and an anonymous function to handle the logic of flushing to the file when its .Count reaches the buffer size.
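As a minimal sketch of that pattern (using a buffer size of 3 and plain numbers instead of hashes, purely for illustration), an anonymous script block invoked with & can use begin / process / end blocks to batch pipeline input:

```powershell
# illustrative only: emit pipeline input downstream in batches of 3
1..7 | & {
    begin {
        $buffersize = 3
        $list = [System.Collections.Generic.List[string]]::new($buffersize)
    }
    process {
        # accumulate each incoming item
        $list.Add($_)
        if ($list.Count -eq $buffersize) {
            # emit the full batch downstream, then reset the buffer
            $list.ToArray()
            $list.Clear()
        }
    }
    end {
        # flush whatever is left after the input ends
        if ($list.Count) { $list.ToArray() }
    }
}
```

Everything emitted by process and end still flows down the pipeline one item at a time to whatever comes next (here, the console; in the real code below, Out-File), but the downstream cmdlet is only invoked in bursts, which is what reduces the per-item overhead.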
Worth noting:
Format-Table -AutoSize is not needed at all in your code, and it hurts performance by delaying the output to the file. Most likely something like this will give very similar performance, with no buffering needed:
Get-ChildItem -Path $path -Recurse -File |
    Get-FileHash -Algorithm SHA256 |
    ForEach-Object Hash |
    Out-File $hashlog -Encoding utf8
Calling .Trim() on the .Hash value is not needed.
Output from Get-ChildItem can be piped directly to Get-FileHash.
Get-ChildItem -Path $path -Recurse -File |
    Get-FileHash -Algorithm SHA256 | & {
        begin {
            # tweak the number of lines to hold before outputting to the file
            $buffersize = 1000
            $list = [System.Collections.Generic.List[string]]::new($buffersize)
        }
        process {
            # add the Hash to the List
            $list.Add($_.Hash)
            # if the List has reached the buffer size
            if ($list.Count -eq $buffersize) {
                # emit the buffered lines downstream
                $list.ToArray()
                # and clear the list
                $list.Clear()
            }
        }
        end {
            # if there is any remaining data
            if ($list.Count) {
                # emit it
                $list.ToArray()
            }
        }
    } |
    Out-File $hashlog -Encoding utf8
For the updated question, if you want to re-use the buffering logic you can approach it this way:
Get-ChildItem -Path $path -Recurse -File -PipelineVariable file |
    Get-FileHash -Algorithm SHA256 | & {
        begin {
            # tweak the number of lines to hold before outputting to the file
            $buffersize = 1000
            $list = [System.Collections.Generic.List[string]]::new($buffersize)
        }
        process {
            # compose the output line: hash, relative path, size, timestamp
            $line = '{0} {1} {2} {3}' -f
                $_.Hash,
                $file.FullName.Remove(0, $path.Length),
                $file.Length,
                $file.LastWriteTime.ToString('yyyyMMdd_HHmmss')
            $list.Add($line)
            # if the List has reached the buffer size
            if ($list.Count -eq $buffersize) {
                # emit the buffered lines downstream
                $list.ToArray()
                # and clear the list
                $list.Clear()
            }
        }
        end {
            # if there is any remaining data
            if ($list.Count) {
                # emit it
                $list.ToArray()
            }
        }
    } | Out-File -Encoding UTF8 -FilePath $hashlog