node.js, sql-server, batch-processing

mssql nodejs operation timed out for an unknown reason


I am trying to run several hundred thousand SQL update queries using node/mssql. I am trying to:

  1. insert each record individually (if one fails, I don't want the whole batch to fail)
  2. batch the queries so I don't overload the SQL server (I can open a new connection for every query, but the server explodes if I do that)

With my existing code (which works 99% of the time) I occasionally get "operation timed out for an unknown reason", and I'm hoping someone can suggest a fix or improvements.

This is what I have:

const sql = require("mssql");

let pool;
let errs: SqlResult[] = []; // SqlResult is my own wrapper type around the query result

try {
  pool = await sql.connect(CONFIG_OBJ);
  let batchSize = 1000;
  let queries = [
    `update xxx set [AwsCoID]='10118' where [PrimaryKey]='10118-78843' IF @@ROWCOUNT=0 insert into xxx([AwsCoID]) values('10118')`,
    `update or insert 2`,
    `update or insert 3`, ...];

  for (let i = 0; i < queries.length; i += batchSize) {
    // fire off one batch; allSettled means one failed query doesn't sink the rest
    let prom = queries
      .slice(i, i + batchSize)
      .map((qq) => pool.request().query(qq));

    let results = await (Promise as any).allSettled(prom);
    // make sure connection is still active after batch finishes
    pool = await sql.connect(CONFIG_OBJ);

    for (let p of results) {
      //console.error(`promerr:`, p);
      let status: "fulfilled" | "rejected" = p.status;
      let value = p.value as SqlResult;
      if (status != "fulfilled" || !value.isSuccess) {
        console.log(`batchRunSqlCommands() promERR:`, value);
        errs.push(value);
      }
    }
  }
} catch (e) {
  console.log(`batchSqlCommand err:`, e);
} finally {
  pool.close();
}
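
For reference, CONFIG_OBJ is just a standard mssql connection config. A plausible sketch is below (the server, credentials, and database are placeholders, and the numbers are assumptions). If the timeouts are coming from the driver's per-request timeout, requestTimeout (15000 ms by default in mssql) is the setting to raise, and pool.max bounds how many queries from each 1000-wide batch actually run against the server at once:

const CONFIG_OBJ = {
  server: "my-sql-server.example.com", // placeholder
  database: "mydb",                    // placeholder
  user: "app_user",                    // placeholder
  password: "secret",                  // placeholder
  options: { encrypt: true },
  requestTimeout: 60000, // per-query timeout; mssql defaults to 15000 ms
  pool: { max: 10, min: 0, idleTimeoutMillis: 30000 }, // mssql's default pool sizing
};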

Solution

  • For anyone else who writes something like I did: the issue is that SQL Server locks more than just the affected row when doing an upsert. The fix is to add a clustered index that ensures each record being updated sits in its own cluster, so the cluster gets locked but only one row within it is modified at a time.

    TLDR: set a "line unique" column (e.g. PrimaryKey) as the clustered index on the table (a sketch follows below).

    This is not good for DB performance, but it will quickly and simply solve the issue. You could also intelligently cluster groups of data, but then you would need to ensure your batch update only touches each cluster once and finishes before trying to access it again.
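
    A minimal sketch of that fix, run once against the table (the table name xxx and column PrimaryKey follow the question's placeholders; the index name is made up, and this assumes [PrimaryKey] is non-null and unique):

      // one-time DDL to make PrimaryKey the table's clustered index
      await pool.request().query(
        `CREATE UNIQUE CLUSTERED INDEX [CIX_xxx_PrimaryKey] ON xxx([PrimaryKey])`
      );

    A table can hold only one clustered index, so if xxx already has one (e.g. from an existing primary key constraint) that index would have to be dropped or rebuilt first.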