Search code examples
sqlsql-serverssisetloledbcommand

Does the OLEDB source task pass all the rows to the OLE DB Command task, and the OLE DB Command task processes one record at a time?


I have a data flow task with 2 components:

  1. OLEDB data source tasks that are a SELECT query:
SELECT ACCOUNTID
FROM JOBS
WHERE STATUS=3
  1. OLE DB Command task that does:
DELETE FROM ACCOUNT
WHERE ACCOUNTID=?

The logic is that any job with status id 3 has to result in deletion of the accountid from the account table.

I know that when step 1 returns multiple records, step 2 performs slow because it is an operation that happens on each record. Where as if I had staged the data from step 1 in a separate table and then in an execute sql task fired delete based on the staged table, then it would have been faster.

Since the number of rows returns will always be small (under 20), I am using the OLE DB Command task approach. My question is -

  1. Does the OLEDB source task pass each row into the OLE DB Command task, there by resulting the the OLEDB Command task processing one record at a time?

Or Does the OLEDB source task pass all the rows to the OLE DB Command task, and the OLE DB Command task processes one record at a time?

  1. Once rows are extracted by the OLEDB source task, then where are they held before passing into the OLE DB Command task?

  2. Until the completion of the OLE DB Command task, are the rows produced by the OLEDB Source task locked?


Solution

  • 1st question

    Does the OLEDB source task pass each row into the OLE DB Command task, thereby resulting in the OLEDB Command task processing one record at a time? Or Does the OLEDB source task pass all the rows to the OLE DB Command task, and the OLE DB Command task processes one record at a time?

    The OLE DB source extracts data from the SQL database and stores them within a memory buffer. There are several factors that affect the way that data is loaded into SSIS:

    1. The extracted data rows count
    2. The amount of data stored in each row
    3. The DefaultBufferMaxRows and DefaultBufferSize Data Flow Task's properties values

    enter image description here

    On the other hand, the OLE DB Command process the data row-by-row as mentioned in the official documentation:

    The OLE DB Command transformation runs an SQL statement for each row in a data flow.

    2nd question

    Until the completion of the OLE DB Command task, are the rows produced by the OLEDB Source task locked?

    As mentioned before, data is transferred within Memory buffers. Once the OLE DB command processed rows count and size meet the Data Flow Task's Buffer configuration, they are sent to the next component. For sure, this is slower than the data extraction from the OLE DB Source, since data is processed row-by-row.

    enter image description here

    Besides, while processing data, the OLE DB command does not receive any new data from the OLE DB Source. It is not like the Sort Transformation. It only receives one buffer, processes it, and sends it via output instead of storing all data in memory and processing them.

    The following screenshot shows how an OLE DB component does not receive a new data buffer before that it ends processing the current one (I am using an update statement over a huge table without indexes in this example):

    enter image description here


    Helpful resources