Search code examples
azureazure-data-factory

Azure Data Factory importing: many to many relationship


I'm trying to import a .csv file into a Microsoft SQLServer database using Azure Data Factory. The .csv file headers look like this

VehicleVIN, VehicleMake, VehicleModel, CustomerName, CustomerAddress

I've sinked the VehicleVIN, VehicleMake and VehicleModel into a VehicleTable. I've sinked the CustomerRef, CustomerName and CustomerAddress into a CustomerTable.

Since this is a many to many relationship between those 2 entities, I need to add a third table named VehicleCustomerTable (Id, VehicleId, CustomerId) where I should save the id's of the entities that I've sinked into VehicleTable and CustomerTable.

The problem is that I have no idea of how to associate them in the same data flow. Do I need another data flow for this one? If a separate data flow is needed, how do I know what customer belongs to what vehicle? I've already built a data flow that sinks the data to the corresponding tables but the problem is the association of these 2 entities.

Azure Data Flow


Solution

  • Below is the input data.

    enter image description here

    Here, I added new column CustomerRef using surrogatekey since you not given in input. Use customerid if you have in your column.

    Next, I have done group by using Aggregate component on VehicleVIN and CustomerRef. Then for aggregate i just used tmp as column and max(CustomerRef) as value, later I unselected this column in select component. This done just to avoid error.

    Below are the settings.

    Under Groub by

    enter image description here

    Under Aggregates enter image description here

    Next, add surrogatekey component for creating Id column.

    settings: enter image description here

    Preview:

    enter image description here

    Then unselect the tmp column using select component as below.

    enter image description here

    Final data preview you will get.

    enter image description here

    At the end sink this data to sql table named VehicleCustomerTable.