Search code examples
sql-serverssissql-server-data-tools

SSIS SCD wizard does a select * on my data


Visual studio hangs on me when using the Slowly Changing Dimension Wizard.

I select the correct connection. Then I try to open the dropdown 'Table or view' to select a destination table. At this moment visual studio hangs on me..

I have this on all client machines and on different visual studio versions and only on this specific database. In activity monitor I noticed that the wizard does a select * on all tables in the database... I have one table that has +4billion rows (+300GB). It is the select * on this table that takes so long.

Does anybody have any idea what causes the select * on my database, or why they are doing this? And even better, how to fix this?


Solution

  • Don't use the slowly changing dimension wizard in SSIS at all. The data flow it creates performs really badly compared to what you can write with TSQL.

    A couple of assumption; you need a type 2 SCD and you are using at least SQL Server 2008 with MERGE statements available.

    Instead of SSIS use the OUTPUT clause of the MERGE statement within TSQL to perform the dimension update/insert. For example:

    INSERT INTO Customer_Master
    SELECT 
        Source_Cust_ID, 
        First_Name, 
        Last_Name, 
        Eff_Date, 
        End_Date, 
        Current_Flag
    FROM
        ( 
        MERGE 
            Customer_Master CM
        USING 
            Customer_Source CS
        ON 
            CM.Source_Cust_ID = CS.Source_Cust_ID
        WHEN NOT MATCHED 
        THEN
            INSERT VALUES 
                (
                CS.Source_Cust_ID, 
                CS.First_Name, 
                CS.Last_Name, 
                CONVERT(char(10), GETDATE()-1, 101),
                '12/31/2199',
                'y'
                )
        WHEN MATCHED 
            AND CM.Current_Flag = 'y'
            AND (CM.Last_Name <> CS.Last_Name )
        THEN
            UPDATE 
            SET 
                CM.Current_Flag = 'n',
                CM.End_date = convert(char(10), getdate()- 2, 101)
        OUTPUT 
            $Action Action_Out,
            CS.Source_Cust_ID,
            CS.First_Name,
            CS.Last_Name,
            convert(char(10), getdate()-1, 101) Eff_Date,
            '12/31/2199' End_Date,
            'y' Current_Flag
        ) AS MERGE_OUT
    WHERE 
        MERGE_OUT.Action_Out = 'UPDATE';
    

    Source: http://www.kimballgroup.com/2008/11/design-tip-107-using-the-sql-merge-statement-for-slowly-changing-dimension-processing/