sql-server, ssis, data-warehouse

Can SSIS import TXT records in the exact same order as they appear in the TXT file? If not (by default), then how?


I have records in a flat file that have what you might consider to be master records, with detail records following the master they relate to until the next master record.

Here is an example:

Order123, Customer1, 1/1/2018
Item1, $1
Item2, $1
Order124, Customer2, 1/1/2018
Item1, $1
Item4, $2

The file has no line numbers or any kind of sequencing built in, nor does it use foreign keys to relate master to detail.

If I were to use SSIS to import the raw TXT data into a flexible table with columns designed to take various datatypes (e.g. nvarchar(255) or similar), I could iterate over the values after the import and relate the values on lines #2 and #3 with Order123, and likewise lines #5 and #6 with Order124.

The table holding the raw data will use a simple RecordID identity column with an integer incrementing by one.
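
Roughly, the post-import pass I have in mind would look something like this standalone sketch (the staging list here just stands in for the raw rows read back from that table in RecordID order; none of these names are an existing schema):

    using System;
    using System.Collections.Generic;

    class PostImportPass
    {
        class StagingRow { public int RecordID; public string RawLine; }

        static void Main()
        {
            // Stand-in for reading the staging table back in RecordID order
            var staging = new List<StagingRow>
            {
                new StagingRow { RecordID = 1, RawLine = "Order123, Customer1, 1/1/2018" },
                new StagingRow { RecordID = 2, RawLine = "Item1, $1" },
                new StagingRow { RecordID = 3, RawLine = "Item2, $1" },
                new StagingRow { RecordID = 4, RawLine = "Order124, Customer2, 1/1/2018" },
                new StagingRow { RecordID = 5, RawLine = "Item1, $1" },
                new StagingRow { RecordID = 6, RawLine = "Item4, $2" },
            };

            string currentOrder = null;

            foreach (var row in staging) // relies on RecordID preserving the file order
            {
                string[] cols = row.RawLine.Split(',');
                if (cols.Length == 3)
                    currentOrder = cols[0].Trim();   // master row: remember its order number
                else
                    Console.WriteLine(cols[0].Trim() + " -> " + currentOrder); // detail row, tied to the last master seen
            }
        }
    }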

It doesn't really matter, but if you're curious, the actual data I'm referring to is the Retrosheet event files. It's a collection of all Major League Baseball data. A real file can be downloaded from a link on this page: https://www.retrosheet.org/game.htm

I seem to recall that you cannot import TXT data into a table and expect the order of the rows to match the order of the TXT lines. When I do small tests of this, however, the records do appear in the same order as the source file. I suspect my small test results were too good to be true and not a fail-safe prediction of how it will turn out.

In summary:

How do I use SSIS to import data, inserting SQL records in the same order as the original flat file?


Solution

  • I just typed this out, so there might be a few syntax errors, but it should get you close.

    You will need to set up two different outputs (one for the order rows, one for the item rows).

    Order of load will not matter as you are adding a foreign key to the detail table.

    // Class-level field ("on the OUTSIDE" of the method) so the current order
    // number is remembered while the detail rows that follow it are processed.
    public string orderNo;

    public override void CreateNewOutputRows()
    {
        string[] lines = System.IO.File.ReadAllLines([filename]);

        foreach (var line in lines)
        {
            string[] cols = line.Split(',');
            if (cols.Length == 3)
            {
                // Master (order) record: remember its key for the detail rows below it
                orderNo = cols[0];
                Output0Buffer.AddRow();
                Output0Buffer.OrderNo = cols[0];
                Output0Buffer.Customer = cols[1];
                Output0Buffer.OrderDate = DateTime.Parse(cols[2].Trim());
            }
            else
            {
                // Detail (item) record: carry the most recent order number forward
                Output1Buffer.AddRow();
                Output1Buffer.OrderNo = orderNo;
                Output1Buffer.Item = cols[0];
                Output1Buffer.Amt = cols[1]; // still a string; parse it to a numeric type later
            }
        }
    }
    

    FOLLOW UP:

    I just reviewed the site you are trying to download from, and the file is more complicated than your question let on.

    Split still seems safe to use, but you will have to trim some quote-wrapped strings (the names). It looks like there are no quote-wrapped commas (at least in the examples); if there were, you would need to use a REGEX to split.
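
    If you do run into quoted commas, a quote-aware split might look something like this (the regex is a generic CSV idiom, not anything Retrosheet-specific, and the sample line is made up just to show the shape):

    using System;
    using System.Text.RegularExpressions;

    class QuoteAwareSplitDemo
    {
        // Split on commas that sit outside double quotes, then strip the quote wrappers
        static string[] SplitCsv(string line)
        {
            string[] cols = Regex.Split(line, @",(?=(?:[^""]*""[^""]*"")*[^""]*$)");
            for (int i = 0; i < cols.Length; i++)
                cols[i] = cols[i].Trim().Trim('"');
            return cols;
        }

        static void Main()
        {
            // Made-up line with a quote-wrapped name that contains a comma
            string line = "start,doej001,\"Doe, John\",0,1,7";
            foreach (string col in SplitCsv(line))
                Console.WriteLine(col);
        }
    }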

    I would change the logic to use a switch/case based on cols[0] being one of the 8 record types.

    Save the ID on the outside and write it to each of the 7 other possible datasets this creates, so they can be linked back to the parent. You will have to use the same strategy for other record types that need to be tied to a different parent (I think comment is an example).
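
    For the switch/case with the ID saved on the outside, a rough standalone sketch of the shape (the record-type names are from memory of the Retrosheet layout and should be checked against their docs; the Console output stands in for the AddRow() calls you would make on each output buffer):

    using System;
    using System.IO;

    class EventFileShapeDemo
    {
        static void Main()
        {
            string gameId = null; // kept outside the switch so child records can link back to their game

            foreach (string line in File.ReadAllLines("sample.EVN")) // hypothetical file name
            {
                string[] cols = line.Split(',');
                switch (cols[0])
                {
                    case "id":
                        gameId = cols[1]; // parent record: a new game starts here
                        break;
                    case "version":
                    case "info":
                    case "start":
                    case "sub":
                    case "play":
                    case "data":
                    case "com":
                        // child record types: each would go to its own output, carrying gameId
                        Console.WriteLine(gameId + ": " + line);
                        break;
                }
            }
        }
    }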

    GOOD LUCK with all this. Plays do not look easy to interpret!