Search code examples
visual-studio-2012ssisssis-2012

SSIS Only continue through flow if one source row count is higher than another


I am producing an SSIS package to update a database from a CSV file, the package will truncate a DB table before inserting all the rows from the CSV file into it.

As a check I want to ensure that there are more rows in the CSV file than are on the database, so when I load these in I am storing the row count for each in two variables, csvCount and dbCount.

enter image description here

I would like to be able to say in the data flow:

IF csvCount > dbCount then
 move onto next flow task
else
 exit package and prevent any other tasks / control flows running
end

but i am unsure what component i can use.

Any help is greatly appreciated

Edit: Image to illustrate tasks after sequence container are run even though condition fails

enter image description here


Solution

  • You'll have to double process your source file. Nothing in a Data Flow can change the state of an SSIS variable until the task is complete.

    Count the table

    So, you can use an Execute SQL Task in the beginning of your package to count the rows in your target table and assign the rowcount to your variable. I favor this query for getting row counts as it's much faster than directly counting the table via SELECT COUNT(*) AS RC FROM MyTABLE;

    SELECT
        s.[Name] as [Schema]
    ,   t.[name] as [Table]
    ,   SUM(p.rows) as [RowCount]
    FROM
        sys.schemas s
        LEFT OUTER JOIN 
            sys.tables t
            ON s.schema_id = t.schema_id
        LEFT OUTER JOIN 
            sys.partitions p
            ON t.object_id = p.object_id
        LEFT OUTER JOIN  
            sys.allocation_units a
            ON p.partition_id = a.container_id
    WHERE
        p.index_id  in(0,1) -- 0 heap table , 1 table with clustered index
        AND p.rows is not null
        AND a.type = 1  -- row-data only , not LOB
        -- ADD YOUR FILTER HERE
    GROUP BY 
        s.[Name]
    ,   t.[name]
    ORDER BY 
        1
    ,   2;
    

    Count the file

    You then add a Data Flow Task, much as you have above but it will only have a Flat File Source connected to a Row Count component.

    As the input of this task, connect the Count the table task to it as usual

    Actual data flow

    Here you have your actual data flow. Flat file source to any transformations to any destinations.

    The magic to make the pseudocode you provided work will be done in the "Precedent Constraint"

    1. Connect the Count the file task to the Actual Data Flow.
    2. Double click the connector line and change the Evaluation Operation from Constraint to Expression and Constraint. In the expression field, add a formula like @[User::RowCountFile] >= @[User::RowCountDB]

    Done.

    Only when that formula is true will the connected item execute.

    Edit

    The follow on question was how to I stop processing in the event of insufficient rows. I'd reorganize your existing workflow to have a Sequence Container that contains the Business Logic for whether it should run (Count the tables + Count the file) and then the second Sequence Container would contain the actual business processing logic (truncate table, import data, send notification)