Search code examples
sqlsql-serverssisssis-2012ssis-2008

How to get desired output in SSIS by grouping by and count?


I'm creating an SSIS package wherein a user needs to be notified if there is a duplicate productcode for one productlabel. We retrieve the products thru a csv flat file being sent to us in a shared location.

In this example I have Productcode = 1a for both productlabel Physio Ormix and Nixen.

So basically the productcode = 1a was a duplicate because it was also used by the productlabel=Nixen. Hence, notify users for the duplicate ProductCode and Productlabel used. I tried creating an aggregate that performs group by ProductCode and Counts it.

In this link is the image of my dataflow

Can someone give me tips on how to do this?

In this link is the desired output


Solution

  • I think you can use a script component and a conditional split to get the duplicates without all this logic:

    1. Inside the data flow task add a Script Component
    2. Add an output column of type DT_BOOL (example name is Flag)
    3. Inside the script component write a similar script:

      using System.Collections.Generic;
      
      public class ScriptMain:  
          UserComponent  
      
      {  
      
          List<string> lstKey = new List<string>;
          List<string> lstKeylabel = new List<string>;
      
          public override void Input0_ProcessInputRow(InputBuffer0 Row)  
          {  
      
              if(!lstKey.Contains(Row.ProductCode){
      
                  lstKey.Add(Row.ProductCode);
                  lstKeylabel.Add(Row.ProductCode + ";" + Row.ProductLabel);
                  Row.Flag = true;
      
              }else if(lstKeylabel.Contains(Row.ProductCode + ";" + Row.ProductLabel)) {
      
                  Row.Flag = true;
      
              }else{
      
                  Row.Flag = false;
      
              }
      
          }  
      
      }
      
    4. Add a conditional split after the script component with a similar expression:

      [Flag] == true
      
    5. All records that are passed thru the true path are unique, all rows passed in the false path are the duplicates.