sql optimization snowflake-cloud-data-platform query-optimization common-table-expression

optimize a SQL query flow

I'm looking to see what is the most efficient way of writing SQL in snowflake. Below is the problem statement.

Table A with events data, which has 1 column X. Table will have duplicate X values.
Table B with Mapping data, which has 2 columns X1, T
Custom function "function1" which makes API calls and generates T column value.

Ideally if all mapping values available in mapping table B, just a left join between TABLE A AND TABLE B would be sufficient like below

with cte1 as ( Select X, T from TABLE A 
LEFT JOIN TABLE B on A.X = B.X1);

But in some cases Mapping table B doesn't have new X column mapping yet so i need to apply API call over distinct column X of TABLE A which doesn't have mapping values in mapping table B, so minimal API calls can be made.

missing_values_table as (Select function1(distinct X) from cte1 where T is NULL);

After above step merge the result with MAPPING TABLE B to update with new records and finally do LEFT JOIN between TABLE A and Table B again to get the output table.

TABLE A

Column X
ab
zx
ab
cb
cc
cc

Table B

Column X1	Column T
ab	tom
zx	banks

output tables:

Column X	Column T
ab	tom
zx	banks
ab	tom
cb	silver
cc	gold
cc	gold

Table B

Column X1	Column T
ab	tom
zx	banks
cb	silver
cc	gold


    with enriched_data as (
  SELECT X,
         MIN(B.T) AS T,  -- Use B.T for existing mapping
         CASE WHEN MIN(B.T) IS NULL THEN function1(MIN(X)) END AS new_T
  FROM TABLE A 
  LEFT JOIN TABLE B  ON A.X = B.X1  -- Join to check for existing mapping
  GROUP BY A.X
)
-- Update TABLE B with newly generated values
MERGE INTO TABLE B AS target
USING (
  SELECT X, new_T FROM enriched_data WHERE new_T IS NOT NULL
) AS source
ON target.X1 = source.X
WHEN MATCHED THEN UPDATE SET target.T = source.T;  -- Update existing T

-- Final join with updated TABLE B
SELECT A.X, COALESCE(B.T, new_T) AS T1
FROM TABLE A 
LEFT JOIN enriched_data B ON A.X = B.X;

I tried above approach, not sure if this is efficient. Please let me know if there is better way so that i can avoid two LEFT JOINS, seems repetitive

Solution

Why you're not use SQL Set Difference Operation, to get the difference between Table A.x and Table B.x1

Query in MySQL:

select distinct A.x from A
where A.x NOT IN (Select distinct B.x1 from B)

Result: db<>fiddle