Tags: azure, azure-sql-database, azure-data-factory, azure-synapse-pipeline

How do I read a table from one database into a query running on a different database in a Synapse pipeline?


There are two databases, say Database A and Database B. I have a query running on Database A, and I need to join it to a small table (~20 rows) that resides in Database B, filtering the result based on values from that Database B table.

Example: table in Database B, named 'city':

ID City
1 Dallas
2 Houston
3 Tampa

Query running on Database A:

select *
from DatabaseA.city a
inner join DatabaseB.city b on a.[ID] = b.[ID]
where b.ID = 3 and b.City = 'Tampa'

How do I replicate this scenario in Synapse pipeline?

Approach 1: I tried using a Lookup activity to read the table from Database B; in the query running on Database A, I created a temp table and attempted to insert the Lookup output into it, but was unsuccessful, since individual Lookup rows can only be accessed by an integer index.
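For example, the pipeline expression language only lets me pull individual values out of the Lookup output by position, like the following (assuming a Lookup activity named 'Lookup on B'), which does not scale to a whole table:

@{activity('Lookup on B').output.value[0].ID}
@{activity('Lookup on B').output.value[0].City}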

Approach 2: I tried using a Lookup activity to read the table from Database B and the STRING_AGG function to form a comma-separated list of values for each column. Then, in the query run on Database A, I created a temp table and attempted to insert the values using STRING_SPLIT and CROSS APPLY, but I couldn't get that working either; it ended up producing a cartesian-join result.
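For illustration, here is roughly what I tried (a sketch, with hypothetical @ids and @cities lists as STRING_AGG would build them); splitting each column's list independently loses the row pairing, which is where the cartesian result comes from:

-- Hypothetical comma-separated lists built by STRING_AGG in the Lookup query
DECLARE @ids    NVARCHAR(MAX) = '1,2,3';
DECLARE @cities NVARCHAR(MAX) = 'Dallas,Houston,Tampa';

-- Each list is split on its own, so every ID matches every City: 3 x 3 = 9 rows
SELECT i.value AS ID, c.value AS City
FROM STRING_SPLIT(@ids, ',') i
CROSS APPLY STRING_SPLIT(@cities, ',') c;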

Is there a better approach to implement this scenario?

Note: This scenario is simple to implement in a Data Flow using a join, but in my environment we only use Synapse pipelines to move the data.

Any advice is much appreciated.

Thanks.


Solution

  • To replicate the query in ADF/Synapse pipelines, follow the approach below.

    First, read the table from Database B in a Lookup activity with the query below. The Lookup activity returns at most 5,000 rows, which is fine for a ~20-row table; make sure First row only is unchecked so the whole result set comes back as an array.

    select * from cityB where ID=3 and City='Tampa'
    

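    A minimal sketch of how the Lookup activity definition might look (the dataset name AzureSqlTableB is an assumption; note firstRowOnly is false so the full result set is returned, not just the first row):

    {
        "name": "Lookup on B",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "AzureSqlSource",
                "sqlReaderQuery": "select * from cityB where ID=3 and City='Tampa'"
            },
            "dataset": {
                "referenceName": "AzureSqlTableB",
                "type": "DatasetReference"
            },
            "firstRowOnly": false
        }
    }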

    With my sample data, debugging up to this activity returns a JSON array of objects like the one below.

    [
        {
            "ID": 3,
            "City": "Tampa",
            "fname": "Virat"
        },
        {
            "ID": 3,
            "City": "Tampa",
            "fname": "Rakesh"
        },
        {
            "ID": 3,
            "City": "Tampa",
            "fname": "Starc"
        }
    ]
    

    Next, take a Copy activity with the Database A table as its source, and use the above Lookup activity output in its source query, like below.

    -- ADF resolves the @{...} expression first, inlining the Lookup output
    -- (a JSON array) as a string literal before the query runs on Database A
    DECLARE @json1 NVARCHAR(MAX) = '@{activity('Lookup on B').output.value}';
    
    -- Shred the JSON array into rows with OPENJSON and join to the Database A table
    select * from cityA a  inner join (select * from openjson(@json1,'$') with(
        ID int,
        City varchar(32),
        fname varchar(32)
    )) b on a.[ID]=b.[ID];
    


    Give your target table in the sink. The above query produces two ID columns, and the Copy activity will raise an error for duplicate input columns, so remove the extra column in the Copy activity mapping.
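    Alternatively, listing the columns explicitly in the source query avoids the duplicate ID column in the first place (a sketch; here only fname is taken from b, on the assumption that ID and City already exist in cityA, so adjust the column list to your actual schemas):

    DECLARE @json1 NVARCHAR(MAX) = '@{activity('Lookup on B').output.value}';
    
    -- b.ID is not selected, so no duplicate column reaches the sink
    select a.*, b.fname
    from cityA a
    inner join (
        select * from openjson(@json1, '$') with (
            ID int,
            City varchar(32),
            fname varchar(32)
        )
    ) b on a.[ID] = b.[ID];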

    If you keep the original select * query, first go to Mapping -> Import schemas. It will then ask for a sample of the Lookup output array.


    Paste the above array and click OK. It will show all input column mappings; remove the extra ID column there and map the target columns as per your requirement.
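    For reference, the resulting mapping in the Copy activity JSON would look roughly like this (a sketch; the sink column names are assumptions, and the duplicate ID entry has already been removed):

    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            { "source": { "name": "ID" },    "sink": { "name": "ID" } },
            { "source": { "name": "City" },  "sink": { "name": "City" } },
            { "source": { "name": "fname" }, "sink": { "name": "fname" } }
        ]
    }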


    Now, debug the entire pipeline and it will copy the expected data into the target table.
