Tags: powershell, foreach, directory-structure

How to load and process files one at a time using PowerShell


I wrote the following script to load about a hundred thousand .doc files and run a program on each of them. Based on the program's output, the files are grouped into folders. I tested the script on a local directory with a few files, and it works as expected.

But when it runs against the large corpus, the script prints "Loading files...." and stays there. It seems the script waits until it has loaded all the files from the corpus before processing any of them. If that is the case, is there a way to load and process one file at a time?

It would be great if you could comment on the efficiency aspect too.

$path = "\\Storage\100kCorpus"
$filter = "*.doc"
$count = 0
Write-Host "Loading files....";
$files = @(get-childitem -recurse -path $path -filter $filter)
Write-Host "files loaded";
foreach ($file in $files) {
    $count ++
    Write-Host $file.FullName;
    $out = & "D:\Test\doc\Verify.exe" /i:$($file.FullName)
    $failed_file_location="D:\Test\doc\2875555\$out";
    if (($out -ne "passed") -and !(Test-Path -path $failed_file_location )){
        [IO.Directory]::CreateDirectory($failed_file_location)
        Copy-Item $file $failed_file_location
    }
}

Write-Host "There are $count files with the pattern $filter in folder $path"

Solution

  • It will work the way you want if you pipe the output of Get-ChildItem instead of saving it to an array first, i.e.

    get-childitem -recurse -path $path -filter $filter | % {
        $file = $_      # % is an alias for ForEach-Object; $_ is the current file
        $count ++
        # etc ...
    }
    

    Note that $file = $_ is there just so you don't have to modify the rest of your script too much; a fully reworked version is sketched below.
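
    For completeness, here is the whole script reworked to stream through the pipeline. It is only a sketch: it reuses the paths and the Verify.exe command line from your question, but it also restructures the failure branch (the original skipped the copy whenever the output folder already existed, which may not be what you intended) and adds Out-Null to suppress the DirectoryInfo object that CreateDirectory would otherwise emit:

        $path   = "\\Storage\100kCorpus"
        $filter = "*.doc"
        $count  = 0

        Get-ChildItem -Recurse -Path $path -Filter $filter | ForEach-Object {
            $count++
            Write-Host $_.FullName
            # Each file is verified as soon as it is enumerated,
            # instead of after the whole corpus has been listed.
            $out = & "D:\Test\doc\Verify.exe" /i:$($_.FullName)
            $failed_file_location = "D:\Test\doc\2875555\$out"
            if ($out -ne "passed") {
                # Create the output folder once if missing, then always copy
                # the failing file (the original copied only on first failure).
                if (!(Test-Path -Path $failed_file_location)) {
                    [IO.Directory]::CreateDirectory($failed_file_location) | Out-Null
                }
                Copy-Item -Path $_.FullName -Destination $failed_file_location
            }
        }

        Write-Host "There are $count files with the pattern $filter in folder $path"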

    Efficiency-wise, there isn't much to add, except that this way you also avoid storing all the file objects in an array ($files), so this version skips an unnecessary operation and keeps memory usage roughly flat no matter how many files the share contains.
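
    If you want to convince yourself that the pipeline really streams, a quick check is to ask for only the first few matches; the names print as soon as they are found, and from PowerShell 3.0 onward Select-Object -First also stops the upstream enumeration early instead of scanning the whole share. Select-Object -First is used here purely for the demonstration, not as part of the real script:

        # Prints the first five matches immediately,
        # then stops enumerating the remaining files.
        Get-ChildItem -Recurse -Path $path -Filter $filter |
            Select-Object -First 5 |
            ForEach-Object { Write-Host $_.FullName }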