Search code examples
sqlsql-servert-sqlssisssis-2012

How to load grouped data with SSIS


I have a tricky flat file data source. The data is grouped, like this:

Country    City
U.S.       New York
           Washington
           Baltimore
Canada     Toronto
           Vancouver

But I want it to be this format when it's loaded in to the database:

Country    City
U.S.       New York
U.S.       Washington
U.S.       Baltimore
Canada     Toronto
Canada     Vancouver

Anyone has met such a problem before? Got a idea to deal with it?
The only idea I got now is to use the cursor, but the it is just too slow.
Thank you!


Solution

  • The answer by cha will work, but here is another in case you need to do it in SSIS without temporary/staging tables:

    You can run your dataflow through a Script Transformation that uses a DataFlow-level variable. As each row comes in the script checks the value of the Country column.

    If it has a non-blank value, then populate the variable with that value, and pass it along in the dataflow.

    If Country has a blank value, then overwrite it with the value of the variable, which will be last non-blank Country value you got.

    EDIT: I looked up your error message and learned something new about Script Components (the Data Flow tool, as opposed to Script Tasks, the Control Flow tool):

    The collection of ReadWriteVariables is only available in the PostExecute method to maximize performance and minimize the risk of locking conflicts. Therefore you cannot directly increment the value of a package variable as you process each row of data. Increment the value of a local variable instead, and set the value of the package variable to the value of the local variable in the PostExecute method after all data has been processed. You can also use the VariableDispenser property to work around this limitation, as described later in this topic. However, writing directly to a package variable as each row is processed will negatively impact performance and increase the risk of locking conflicts.

    That comes from this MSDN article, which also has more information about the Variable Dispenser work-around, if you want to go that route, but apparently I mislead you above when I said you can set the value of the package variable in the script. You have to use a variable that is local to the script, and then change it in the Post-Execute event handler. I can't tell from the article whether that means that you will not be able to read the variable in the script, and if that's the case, then the Variable Dispenser would be the only option. Or I suppose you could create another variable that the script will have read-only access to, and set its value to an expression so that it always has the value of the read-write variable. That might work.