I need to take non-normalized data and create normalized tables.
I can't figure out how to get the detail ID to insert into the join table (StoreAddress.AddressID in the example below).
If I were doing this as a stored proc in T-SQL within a loop I would use @@IDENTITY after inserting the row in Address to get the key and then use it to insert into StoreAddress. I can't figure out how to do using transforms in SSIS 2012.
Using Store and Address as an example.
The input data looks like:
Store, Type, Address1, City, State
1, P, 123 Main, Central, PA
1, M, 123 Second, Central, PA
2, P, 123 Third, Pokono, NY
2, M, 123 Third, Pokono, NY
The destination tables are: Store (already populated in different data flow)
StoreID, StoreName, StoreNumber
9878, Main street, 1
561, Mountain View, 2
AddressType (already populated in different data flow)
AddressTypeID, Code, Description
1, P, Physical
2, M, Mailing
3, O, Other
Address
AddressID, Addr1, City, State
721, 123 Main, Central, PA
843, 123 Second, Central, PA
1098, 123 Third, Pokono, NY
StoreAddress
StoreID, AddressID, AddressTypeID
9878, 721, 1
9878, 843, 2
561, 1098, 1
561, 1098, 2
I think this should be a fairly common transformation and there is a best practice to accomplish it in SSIS.
Thanks for thinking of my question!
Tim
Start by inserting the different Addresses:
INSERT dbo.Address (Addr1, City, State)
SELECT DISTINCT Address1, City, State
FROM input;
(Maybe have WHERE NOT EXISTS if you have values already in there)
Then use lookups to fetch the values for your StoreAddress table.
INSERT dbo.StoreAddress (StoreId, AddressId, AddressTypeID)
SELECT
(SELECT s.StoreId from dbo.Store AS s
WHERE s.StoreNumber = i.Store)
, (SELECT a.AddressId FROM dbo.Address AS a
WHERE a.Addr1 = i.Address1
AND a.City = i.City
AND a.State = i.State)
, (SELECT at.AddressTypeId
FROM dbo.AddressType AS at
WHERE at.Code = i.Type)
FROM input AS i;
Using sub-queries like this is just like using joins, but safer because you won't impact the number of rows in input
.
In pure SSIS, do a Data Flow Task to get your address table sorted, using a Sort on the address columns only, with distinct turned on. Then you can do another Data Flow Task with three Lookup transformations to get the IDs like in the query that I wrote above.