Search code examples
sql-serverssisssis-2012

SSIS 2012 - conditional headers' location


I need to stage a couple of thousands of txt files, using SSIS, into an SQL Server 2012 instance.

The issue is that some of the txt files have control checksums in their first two lines and some others don't.

If a file has a checksum, its first line starts with "HEADER" - in such case I have to skip the first two lines then read column headers from line #3 and load data starting from line #4.

If a file does not have a checksum, column names will exist in line #1 and data starts at line #2.

Now, my question is: what is the least invasive way of dealing with such scenario? I am lazy so I am looking for minimum effort with maximum effects.

My ideas so far:

  1. Detect existence of checksums with a C# scripting component and build two separate flows, one for each file type. Disadvantage: two almost identical flows (I am not a big fan of redundancy)

  2. Use PowerShell first to get rid of checksums from files before running SSIS flow. Disadvantage: need to re-write plenty of files (performance hit)

  3. Ask on StackOverflow. Disadvantage: can be seen as ironic by StackOverflow community.

Any hints?


Solution

  • Here's an approach.

    1. Create 2 Flat File Connection Managers, and in the design mode, browse to a sample file for each so you can setup the columns etc.
    2. For the Flat File Connection Manager that doesn't have checksum rows, leave the Header Rows to Skip as 0, and for the Flat File Connection Manager that has the Checksum rows, set it to skip the first 2 rows as shown below. enter image description here
    3. Set the DelayValidation property for both connection managers to true.
    4. In your Data Flow Task, use a Scripting Component To Detect if File has Checksum Rows.
    5. In the output of the Scripting Component, Have the full path to the file as an output column and also have a Flag that indicates whether or not the File has Checksum rows. Ex: HasCheckSumRows
    6. Connect the output of the Scripting Component to a Conditional Split Task.
    7. In the Conditional Split Task, split based on HasCheckSumRows Flag set to true, and create 2 outputs, 1 for Files With Checksum Rows, other for Files Without.
    8. Create 2 Flat File sources, 1 for the checksum row files, that uses the flat file connection manager you created for these type of files, and the other for the regular files without checksum rows.
    9. Connect the 2 outputs from the conditional split to the respective flat file source, using the full file path that is in the output column as the connectionstring for the flat file source.
    10. Finally, use a Union All component to get the rows from both types of files back into 1 input data stream. So this doesn't duplicate your workflow (i.e. won't cause redundancy) as you have all the rows in 1 stream.

    Hope this makes sense.