Search code examples
excelpowershellparallel-processingcomobject

Use PowerShell to open multiple Excel workbooks in parallel


I have a script that opens a set of Excel workbooks in a single Excel instance and updates connections and pivot caches. The end user reports that they usually open all the workbooks and refresh connections in parallel (as opposed to "in series" as the script does). It looks like parallel processing is possible with PowerShell but so far I'm unable to get Excel Workbooks to open using the script block and Start-Job method. All that happens is that the Excel instance is launched and the job runs forever without completing and no workbooks open.

Best effort so far:

$filePath = "C:\testpath"

#Array: FileName, SaveAs
$files = @()
$files += , @("tst1.xlsx", $false)
$files += , @("tst2.xlsx", $false)

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Starting loop"
$x1 = New-Object -ComObject "Excel.Application"
$x1.Visible = $true

ForEach($file in $files) {
    
    # Define what each job does
    $ScriptBlock = {

        param([Object[]]$x1,
              [string]$filePath,
              [array]$file) 

        $wb = $x1.workbooks.Open("$filePath\$($file[0])")
            
        ForEach ($cache in $wb.PivotCaches()) {
            #Refresh this cache
            $cache.Refresh()
        }
    }
        
    # Execute the jobs in parallel
    Start-Job $ScriptBlock -ArgumentList $x1, 
                                         $filePath,
                                         $file
}

Get-Job
 
# Wait for it all to complete
While (Get-Job -State "Running") {
    # Pause for 10 seconds
    Start-Sleep 10
}
 
# Getting the information back from the jobs
Get-Job | Receive-Job

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss):Loop complete"

All of this works just fine without parallel processing:

$filePath = "C:\testpath"

#Array: FileName, SaveAs
$files = @()
$files += , @("tst1.xlsx", $false)
$files += , @("tst2.xlsx", $false)

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Starting loop"
$x1 = New-Object -ComObject "Excel.Application"
$x1.Visible = $true

ForEach($file in $files) {
    
    $wb = $x1.workbooks.Open("$filePath\$($file[0])")
            
    ForEach ($cache in $wb.PivotCaches()) {
        #Refresh this cache
        $cache.Refresh()
    }
}

Clues on how to get Excel to launch the workbook within the script block? I put a code break in on the $wb = $x1.workbooks.Open("$filePath\$($file[0])") and the code doesn't stop, it just runs forever. This is the output returned:

Id Name PSJobTypeName State HasMoreData Location Command
1 Job1 BackgroundJob Running True localhost ...
3 Job3 BackgroundJob Running True localhost ...
1 Job1 BackgroundJob Running True localhost ...
3 Job3 BackgroundJob Running True localhost ...

Solution

  • I hope this is what your looking for:

    #Define the Filepath
    $filePath = "C:\testpath\"
    
    #Define the Files
    $files = @("tst1.xlsx", "tst2.xlsx")
    
    #Clear the workbootPaths Array
    $workbookPaths = @()
    
    foreach ($file in $files) {
        #Create an PSobject that hold the filepath and the state (state is alway $false)
        $wbary = New-Object -TypeName psobject -Property @{
            Filepath = $filepath + $file
            State    = $false
        }
        #Add the PSObject to the $workbookpaths array
        $workbookPaths += $wbary
    }
    
    # Define the Job
    $scriptBlock = {
        param(
            [string]$filePath,
            [string]$workbookPath
        )
    
        $x1 = New-Object -ComObject Excel.Application
        $x1.Visible = $true
    
        $wb = $x1.Workbooks.Open($workbookPath)
    
        foreach ($cache in $wb.PivotCaches()) {
            $cache.Refresh()
        }
    }
    
    Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Starting loop"
    
    #Than start the job for each file
    ForEach ($file in $workbookPaths) {        
        Start-Job -ScriptBlock $scriptBlock -ArgumentList $filePath, $file.Filepath
    }
    
    Get-Job
    
    # Wait for it all to complete
    While (Get-Job -State "Running") {
        # Pause for 1 seconds
        Start-Sleep 1
    }
     
    # Getting the information back from the jobs
    Get-Job | Receive-Job
    
    Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Loop complete"