Tags: multithreading, powershell, parallel-processing, powershell-cmdlet, json-api

PowerShell script parallelization effort: jobs do not wait for the answer from the REST API and finish prematurely


Edited with the latest version of the code:

I have a PowerShell script which performs the following steps:

  1. Connect to a SQL Server instance and download a table with three fields
  2. Connect to a REST API (passing the three values) and GET the JSON answer
  3. Save each JSON response from the API to its own file, one by one, until the end of the input array is reached
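
The original serial loop is not shown in the post, but based on the snippets below it looked roughly like this (server, URL, and column names are taken from the code further down and are placeholders):

```powershell
# Serial sketch (illustrative): one API call per input row, saved to a file
$rows = Invoke-Sqlcmd -ServerInstance $SQLServer -Database $SQLDBName `
          -Query "SELECT Input_1, Input_2, Input_3 FROM dbo.API_Input"
foreach ($row in $rows) {
    $url = "https://example.com/Stag?Seg=$($row.Input_2)&Cust=$($row.Input_1)&Prod=$($row.Input_3)"
    $response = Invoke-RestMethod -Uri $url -Method GET -ContentType "application/json"
    $response | ConvertTo-Json -Depth 100 | Out-File "D:\File_Export\$($row.Input_3).json"
}
```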

This script runs just fine when the steps are executed serially: the original script uses a while loop that takes input items one by one, connects to the API, gets the answer, and saves the JSON answer to a file.

The problem: it is enormously slow this way. Each cycle takes around 1 minute and I have around 11,000 entries to process, so it would take almost 7 days to finish.

The solution: based on an article about multi-threaded jobs, I decided to run multiple jobs in parallel.

The issue: in this multi-threaded solution it seems that each standalone job finishes prematurely. Within a few milliseconds the whole batch of ~11,000 files is saved, each correctly named from its variables, but all the files are empty. Even in a parallel run I would still expect each job to last about 1 minute on average. I am convinced the script simply does not wait for the answer from the API and immediately jumps to the next step, saving an empty file.
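
One way to see what actually happened inside the jobs is to keep their error streams when collecting results, rather than discarding them. A generic sketch, not specific to this script:

```powershell
# Inspect failed jobs instead of silently dropping their errors (illustrative)
foreach ($job in Get-Job) {
    try {
        # -ErrorAction Stop turns errors raised inside the job into
        # terminating errors here, so the catch block sees them
        $result = Receive-Job -Job $job -ErrorAction Stop
    }
    catch {
        Write-Warning "Job $($job.Id) failed: $($_.Exception.Message)"
    }
}
```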

Here is the PowerShell script:

#Connect to SQL server and download the input table
$SQLServer = "XXXXXXXXXXXXX" #use Server\Instance for named SQL instances
$SQLDBName = "XXX"

# Invoke-Sqlcmd opens and closes its own connection, so no separate
# SqlConnection/SqlCommand objects are needed
$DataSet = Invoke-Sqlcmd -ServerInstance $SQLServer -Database $SQLDBName -Query "SELECT Input_1, Input_2, Input_3 FROM dbo.API_Input"

$Command_Block = {
    Param( [string]$Variable_Input_1, [string]$Variable_Input_2, [string]$Variable_Input_3)

    # Compile URL based on the input values
    $URL = 'https://example.com/Stag?Seg='+ $Variable_Input_2 +'&Cust='+$Variable_Input_1+'&Prod='+$Variable_Input_3

    # Call the REST API GET method with the 3 variables
    $response = Invoke-RestMethod -Uri $URL -Method GET -ContentType "application/json"

    # Build the file name for the export
    $FileNamePath = 'D:\File_Export\'+$Variable_Input_3+'.json'

    # Store the JSON output from the API in a raw JSON file
    $response | ConvertTo-Json -Depth 100 | Out-File $FileNamePath
}


#Remove all leftover jobs
Get-Job | Remove-Job
$MaxThreads = 4

#Start the jobs. Max 4 jobs running simultaneously.
foreach($element in $DataSet){
    While ($(Get-Job -State Running).Count -ge $MaxThreads){
        Start-Sleep -Milliseconds 1
    }
    Start-Job -ScriptBlock $Command_Block -ArgumentList $element.Input_1, $element.Input_2, $element.Input_3
}

#Wait for all jobs to finish.
While ($(Get-Job -State Running).count -gt 0){
    start-sleep 1
}

#Collect the output from each job.
$Results = foreach($job in Get-Job){
    Receive-Job -Id $job.Id
}

#Remove all jobs created.
Get-Job | Remove-Job

Attempt to solve the issue: I have tried various wait statements to give each job time to get the answer from the API, but nothing seems to help, and that would not be the desired solution anyway. I would like to force the script to wait until the API call has finished. I am clueless. I will emphasize that the script itself runs just fine when serialized.


Solution

  • I have been able to solve the issue by actually pulling the error message into a file. The problem was not a premature ending of the job, as I had thought; the error was due to the TLS version used on the server. I just put this line into the right place in the code and now it works as expected:

    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
    

    As I mentioned, the code was working just fine when executed serially, and that is because I did have TLS 1.2 set in the code, but at the top of the script. With multi-threaded job execution this line has to be inserted inside each parallel script block, not outside, because every job runs in its own process and does not inherit the parent session's settings.
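
    A quick way to confirm this (a sketch; each `Start-Job` child is a separate PowerShell process with its own defaults):

    ```powershell
    # Set TLS 1.2 in the parent session...
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

    # ...then read the setting from inside a background job: the child
    # process reports its own default, not the value set above
    $job = Start-Job { [Net.ServicePointManager]::SecurityProtocol }
    $job | Wait-Job | Receive-Job
    $job | Remove-Job
    ```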

    Here is full working code:

    #Connect to SQL server and download the input table
    $SQLServer = "XXXXXXXXXXXXXXXXX" #use Server\Instance for named SQL instances
    $SQLDBName = "XXX"

    # Invoke-Sqlcmd opens and closes its own connection, so no separate
    # SqlConnection/SqlCommand objects are needed
    $DataSet = Invoke-Sqlcmd -ServerInstance $SQLServer -Database $SQLDBName -Query "SELECT Var_1, Var_2, Var_3 FROM dbo.Table WHERE Cntry = 'US'"
        
    $Command_Block = {
      Param(
        [string]$Var_1,
        [string]$Var_2,
        [string]$Var_3
      )
            
      # Compile URL link based on input
      $URL = "https://example.com/Prod?cntry=US&Seg=$Var_2&Drth=$Var_1&Prd=$Var_3"
        
      try {
             
        # The server only allows TLS 1.2 or higher; by default, Windows PowerShell
        # tries an older TLS version, which is disabled on the server
        #####    below line was the solution ######
        [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
        
        # Call REST API GET method, with 3 variables
        $response = Invoke-RestMethod -Uri $URL -Method GET -ContentType "application/json"
                
        # Create File name to be exported
        $FileNamePath = "D:\File_Export\$Var_3.json"
        
        # Store JSON output from API into JSON raw file
        $response | ConvertTo-Json -Depth 100 | Out-File $FileNamePath -Force
      }
      catch {
        $ErrorMessage = "Error occurred while processing URL: $URL `n$($_.Exception.Message)"
        $ErrorFileName = "D:\File_Export\Error_$Var_3.txt"
        $ErrorMessage | Out-File $ErrorFileName -Force
      }
    }
        
        
    # Start the jobs. Max 4 jobs running simultaneously.
    $MaxThreads = 4
    $Jobs = @()
    
    foreach ($element in $DataSet) {
      while ($(Get-Job -State Running).Count -ge $MaxThreads) {
        Start-Sleep -Milliseconds 50
      }
      $Jobs += Start-Job `
        -ScriptBlock $Command_Block `
        -ArgumentList  $element.Var_1, $element.Var_2, $element.Var_3
    }
        
    # Wait for all jobs to finish.
    $Jobs | Wait-Job
        
    # Collect the output from each job.
    $Results = foreach ($job in $Jobs) {
      Receive-Job -Id $job.Id
    }
        
    # Remove all jobs created.
    $Jobs | Remove-Job