Search code examples
sqlstringpowershellsqlitestringbuilder

Powershell poor StringBuilder performance?


I'm trying to build queries so I can insert large amounts of data into an sqlite3 table. I've tried several methods of doing this, including PSSQLite which should be able to take a DataTable and insert it with ease. Even with only 10,000 records it takes nearly 40 minutes to run and I have no clue why.

My other option is to build a query and execute it using another method such as Invoke-Sqlcmd. I've tried doing this with a StringBuilder except that it's taking over 2 minutes to build the string. Like I said, it's only 10,000 records so from what I've ready it should only take 10-15 seconds TOPS. Considering I have, at minimum, several million records to import I really need this to go faster.

Here is the code I'm using. Am I just missing something here?

$sb = [System.Text.StringBuilder]::new()
$sb.AppendLine("BEGIN TRANSACTION")
foreach ($document in $documents) {
   
  $null = $sb.AppendLine("INSERT or IGNORE into documents(DocId,Submid,docno,Tray,Pieceno,CreateDate,Account,AccName,AccAddr1,AccAddr2,AccAddr3,AccAddr4,AccCity,AccState,AccZip,BCDP,BarcodeID,ServTypeID,Mailerid,SerialNo,Sys_Name,Sys_Addr1,Sys_Addr2,Sys_Addr3,Sys_Addr4,Sys_City,Sys_State,Sys_Zip)");
  $null = $sb.AppendLine("VALUES('$($document.DocId)','$($document.Submid)','$($document.docno)','$($document.Tray)','$($document.Pieceno)','$($document.CreateDate)','$($document.Account)','$($document.AccName)','$($document.AccAddr1)','$($document.AccAddr2)','$($document.AccAddr3)','$($document.AccAddr4)','$($document.AccCity)','$($document.AccState)','$($document.AccZip)','$($document.BCDP)','$($document.BarcodeID)','$($document.ServTypeID)','$($document.Mailerid)','$($document.SerialNo)','$($document.Sys_Name)','$($document.Sys_Addr1)','$($document.Sys_Addr2)','$($document.Sys_Addr3)','$($document.Sys_Addr4)','$($document.Sys_City)','$($document.Sys_State)','$($document.Sys_Zip)')")
    
}
    
$sb.AppendLine("COMMIT")
$query = $sb.ToString();
#Invoke-SqliteQuery $ref_db $query #commenting this out, because I haven't even attempted the insert because StringBuilder is not optimized enough yet.

In this instance, $documents is a generic object containing each of the fields in the INSERT statement. Most of the fields are populated with a string, which some are blank.

#EDIT: I'm running this in Powershell ISE with breakpoints set, would that cause the performance issues?


Solution

  • Before getting to the StringBuilder aspect of your question, let's look at SQLite:

    I'd expect you to see a difference in processing time if you change the SQL to commit a single multi-row INSERT statement, instead of 10000 separate INSERT's, as you're doing now - in other words:

    $null = $sb.AppendLine("INSERT or IGNORE into documents(DocId,Submid,docno,Tray,Pieceno,CreateDate,Account,AccName,AccAddr1,AccAddr2,AccAddr3,AccAddr4,AccCity,AccState,AccZip,BCDP,BarcodeID,ServTypeID,Mailerid,SerialNo,Sys_Name,Sys_Addr1,Sys_Addr2,Sys_Addr3,Sys_Addr4,Sys_City,Sys_State,Sys_Zip)")
    $null = $sb.AppendLine("VALUES")
    
    foreach ($document in $documents) {
      # Add separate value tuple for each document, add trailing `,`
      $null = $sb.AppendLine("('$($document.DocId)','$($document.Submid)','$($document.docno)','$($document.Tray)','$($document.Pieceno)','$($document.CreateDate)','$($document.Account)','$($document.AccName)','$($document.AccAddr1)','$($document.AccAddr2)','$($document.AccAddr3)','$($document.AccAddr4)','$($document.AccCity)','$($document.AccState)','$($document.AccZip)','$($document.BCDP)','$($document.BarcodeID)','$($document.ServTypeID)','$($document.Mailerid)','$($document.SerialNo)','$($document.Sys_Name)','$($document.Sys_Addr1)','$($document.Sys_Addr2)','$($document.Sys_Addr3)','$($document.Sys_Addr4)','$($document.Sys_City)','$($document.Sys_State)','$($document.Sys_Zip)'),")
    }
        
    # trim trailing newline + comma on last insert value before adding COMMIT statement
    $query = $sb.ToString().TrimEnd("`r`n,") + "`r`nCOMMIT"
    Invoke-SqliteQuery $ref_db $query
    

    There's a minuscule optimization to be had by avoiding expandable strings and instead using $sb.AppendFormat(), ie.:

    $sb = [System.Text.StringBuilder]::new()
    $sb.AppendLine("BEGIN TRANSACTION")
    $null = $sb.AppendFormat('VALUES ({0}, {2}, ...)', $doc.DocId, $doc.SubmId, ...).AppendLine()
    

    ... but that's likely not the issue.

    In Windows PowerShell, string manipulation (whether through direct concatenation or with a string build) have some very funky performance characteristics as soon as the size exceeds the threshold for the Large Object Heap cache (85Kb) in .NET Framework.

    This does not seem to occur in .NET Core, so upgrading to a newer version of PowerShell (such as PowerShell 7) might prove to eliminate this problem all together.

    If you need to target Windows PowerShell, you might want to simply write your SQL script directly to disk and read it back with Invoke-SqliteQuery -InputFile:

    $scriptFile = New-Item import.sql
    
    try{
      $fileWriter = $scriptFile.CreateText()
      $fileWriter.WriteLine("INSERT or IGNORE into documents(DocId,Submid,docno,Tray,Pieceno,CreateDate,Account,AccName,AccAddr1,AccAddr2,AccAddr3,AccAddr4,AccCity,AccState,AccZip,BCDP,BarcodeID,ServTypeID,Mailerid,SerialNo,Sys_Name,Sys_Addr1,Sys_Addr2,Sys_Addr3,Sys_Addr4,Sys_City,Sys_State,Sys_Zip)")
    
      foreach ($document in $documents) {
        $fileWriter.WriteLine("VALUES")
        $fileWriter.WriteLine("('$($document.DocId)','$($document.Submid)','$($document.docno)','$($document.Tray)','$($document.Pieceno)','$($document.CreateDate)','$($document.Account)','$($document.AccName)','$($document.AccAddr1)','$($document.AccAddr2)','$($document.AccAddr3)','$($document.AccAddr4)','$($document.AccCity)','$($document.AccState)','$($document.AccZip)','$($document.BCDP)','$($document.BarcodeID)','$($document.ServTypeID)','$($document.Mailerid)','$($document.SerialNo)','$($document.Sys_Name)','$($document.Sys_Addr1)','$($document.Sys_Addr2)','$($document.Sys_Addr3)','$($document.Sys_Addr4)','$($document.Sys_City)','$($document.Sys_State)','$($document.Sys_Zip)')")
        $fileWriter.WriteLine("")
      }
      $fileWriter.WriteLine("COMMIT")
    }
    finally{
      $fileWriter.Close()
    }    
    
    Invoke-SqliteQuery $ref_db -InputFile import.sql