Search code examples
sql-serverssissql-server-data-toolsssdt-bi

Add data from other object within SSIS package to populate a field for a table


There are many aspects of what I want to do but I think learning one piece will let me derive the rest.

I have an SSIS package that uses powershell to download a publicly available zip file, an execute script to unzip with 7zip and then data flows to load the unzipped files to corresponding tables.

What I want to do is add the file name (and eventually other aspects of the file like creation date, record counts and so on) from any one of the unzipped files to a log table that keeps track of the summary level details of the files.

How do I dynamically store this type of information as part of the package? Derived columns? But what's the input? Thanks!


Solution

  • There are many options for dynamically working with files through SSIS. Below is an overview of one method. Of course this can vary, depending on your specific needs and requirements.

    • Add a Foreach Loop Container. On the Collection pane, the Folder property can either be set using the GUI as well as through a parameter or variable with the Directory expression. Searching sub folders can also be set by checking the "Traverse subfolders" checkbox or using the Recurse expression like the Folder field.
    • The Files field will indicate the files to use and wildcards can be used. * will match any number of characters. For example, *.csv will get all csv files regardless of name and Test*.txt will return all .txt files with names beginning Test, regardless of how many or which characters follow. To limit this to a single character, use ?. The FileSpec expression will allow this to be set dynamically similar to the directory by variable or parameter.
    • The Variable Mappings pane will allow for setting a variable to hold a file name from the directory. Add a variable that will hold the file name to index 0 to map these.
    • You indicated that you wanted to store the file name. The detail of this can be controlled from the "Retrieve file name" field on the Collection window. As their names imply, Fully Qualified will hold the complete file path, Name and Extension will return the file name with extension, and Name Only is just the file name.
    • As for other aspects of the file, I'd recommend a using a Script Task for this for more functionality. The C# FileInfo class provides options for finding details about the file such as the creation date, last time the file was accessed, and when the file was most recently written to. Additonal information on this can be found here.
    • For the record counts from the file, you'll need to create a Connection Manager for this and work with the data within the package. I'm assuming these are flat files? If so, creating a Flat File Connection Manager, and setting the same variable from the Variable Mappings pane of the Foreach Loop to the ConnectionString expression of the Connection Manager will allow you to dynamically loop through each file. Make sure that the Fully Qualified option is used for the "Retrieve file name" field as earlier if you decide to do this. You will also want to configure the correct columns and data types for the Connection Manager ahead of time. This same process can be followed for Excel files, however the variable with the file name will be used on the ExcelFilePath expression instead.
    • As for storing information about a file in a log table, there are a multitude of options for these. A very basic example of an Insert statement within an Execute SQL Task that's placed within the Foreach Loop is below. The 3 part table name is only necessary if you're using a table that differs from the initial catalog of the Connection Manager. The ? is the parameter marker (assuming this is an OLE DB connection). After this, map the same variable/parameter that stores the file name using the Parameter Mapping pane. Set the direction to Input, appropriate data type (likely VARCHAR/NVARCHAR), 0 in the Parameter Name field to indicate this is the first parameter in the SQL statement (additional ? can be used for subsequent parameters in the SQL statement, then increment this field in accordance), and the default Parameter Size can be left at -1. Again, this is a simple example and you'll probably want store more about the files and their contents, but this can get you started.

    Sample SQL Insert:

    INSERT INTO YourDataBase.YourSchema.YourTable (ColumnToHoldFileName)
    VALUES (?)