I am producing an SSIS package to update a database from a CSV file, the package will truncate a DB table before inserting all the rows from the CSV file into it.
As a check I want to ensure that there are more rows in the CSV file than are on the database, so when I load these in I am storing the row count for each in two variables, csvCount and dbCount.
I would like to be able to say in the data flow:
IF csvCount > dbCount then
move onto next flow task
else
exit package and prevent any other tasks / control flows running
end
but i am unsure what component i can use.
Any help is greatly appreciated
Edit: Image to illustrate tasks after sequence container are run even though condition fails
You'll have to double process your source file. Nothing in a Data Flow can change the state of an SSIS variable until the task is complete.
So, you can use an Execute SQL Task in the beginning of your package to count the rows in your target table and assign the rowcount to your variable. I favor this query for getting row counts as it's much faster than directly counting the table via SELECT COUNT(*) AS RC FROM MyTABLE;
SELECT
s.[Name] as [Schema]
, t.[name] as [Table]
, SUM(p.rows) as [RowCount]
FROM
sys.schemas s
LEFT OUTER JOIN
sys.tables t
ON s.schema_id = t.schema_id
LEFT OUTER JOIN
sys.partitions p
ON t.object_id = p.object_id
LEFT OUTER JOIN
sys.allocation_units a
ON p.partition_id = a.container_id
WHERE
p.index_id in(0,1) -- 0 heap table , 1 table with clustered index
AND p.rows is not null
AND a.type = 1 -- row-data only , not LOB
-- ADD YOUR FILTER HERE
GROUP BY
s.[Name]
, t.[name]
ORDER BY
1
, 2;
You then add a Data Flow Task, much as you have above but it will only have a Flat File Source connected to a Row Count component.
As the input of this task, connect the Count the table
task to it as usual
Here you have your actual data flow. Flat file source to any transformations to any destinations.
The magic to make the pseudocode you provided work will be done in the "Precedent Constraint"
Count the file
task to the Actual Data Flow. Constraint
to Expression and Constraint
. In the expression field, add a formula like @[User::RowCountFile] >= @[User::RowCountDB]
Done.
Only when that formula is true will the connected item execute.
The follow on question was how to I stop processing in the event of insufficient rows. I'd reorganize your existing workflow to have a Sequence Container that contains the Business Logic for whether it should run (Count the tables + Count the file) and then the second Sequence Container would contain the actual business processing logic (truncate table, import data, send notification)