Search code examples
joinssislookupdecodingtranscoding

SSIS optimizing "value decoding" on ETL data flow


I'm developing a series of ETL procedures for a migration project using SSIS/DataTool 2010. I have multiple cases where I have to decode source values with new target ones, i.e in Customers addresses I have the states with "full names" and I have to extract them with state iso code (i.e. from California to CA). to do this I have saved in the solution folders an excel file with a series of sheets with decoding info (source value and destination value), so in case I need to add or modify decoding values I just have to manipulate the file without touching the query or ETL. At now the best way I found to manage this in SSIS is:

  1. sort the stream with the main extraction (i.e DB source with the query extracting the customers and their addreses) on the field with the data to be decoded
  2. sort the stream pointing at the excel files that manages the decoding mapping, by the column with values to be transcoded
  3. merge join (outer) the 2 stream on the fields to be decoded so to have starting values and decoded values in the same stream
  4. handle a "derived column" step to apply the logic: "replace the starting value column with the one with decoded values, unless the starting value has no target value"

this way works but it's very intricate and hard to "replicate" (copied and adapted) for the other (many) decoding needed to handle), can anyone suggest a better and quicker way to handle the scenario I described? thanks in advance,

enter image description here

all details provided in the description


Solution

  • Cached Connection Manager + Lookup task.

    It eliminates the required, and slow, Sort transformations as well as the merge join.

    From a reusability perspective, if you have many coding decoding tasks, you could have the "Prime Cache Connection Manager" data flow as part of your pattern and then copy/paste it into all the packages that need the logic. Any time you need to look up your reference data (and assuming there's no concerns over case insensitive matches or the need to get more than 1 result), you get to use the synchronous Lookup component instead of 3 extra components.

    https://billfellows.blogspot.com/2011/11/using-excel-in-ssis-lookup.html