Search code examples
sqloptimizationsnowflake-cloud-data-platformquery-optimizationcommon-table-expression

optimize a SQL query flow


I'm looking to see what is the most efficient way of writing SQL in snowflake. Below is the problem statement.

  1. Table A with events data, which has 1 column X. Table will have duplicate X values.
  2. Table B with Mapping data, which has 2 columns X1, T
  3. Custom function "function1" which makes API calls and generates T column value.

Ideally if all mapping values available in mapping table B, just a left join between TABLE A AND TABLE B would be sufficient like below

with cte1 as ( Select X, T from TABLE A 
LEFT JOIN TABLE B on A.X = B.X1);

But in some cases Mapping table B doesn't have new X column mapping yet so i need to apply API call over distinct column X of TABLE A which doesn't have mapping values in mapping table B, so minimal API calls can be made.

missing_values_table as (Select function1(distinct X) from cte1 where T is NULL);

After above step merge the result with MAPPING TABLE B to update with new records and finally do LEFT JOIN between TABLE A and Table B again to get the output table.

TABLE A

Column X
ab
zx
ab
cb
cc
cc

Table B

Column X1 Column T
ab tom
zx banks

output tables:

Column X Column T
ab tom
zx banks
ab tom
cb silver
cc gold
cc gold

Table B

Column X1 Column T
ab tom
zx banks
cb silver
cc gold

    with enriched_data as (
  SELECT X,
         MIN(B.T) AS T,  -- Use B.T for existing mapping
         CASE WHEN MIN(B.T) IS NULL THEN function1(MIN(X)) END AS new_T
  FROM TABLE A 
  LEFT JOIN TABLE B  ON A.X = B.X1  -- Join to check for existing mapping
  GROUP BY A.X
)
-- Update TABLE B with newly generated values
MERGE INTO TABLE B AS target
USING (
  SELECT X, new_T FROM enriched_data WHERE new_T IS NOT NULL
) AS source
ON target.X1 = source.X
WHEN MATCHED THEN UPDATE SET target.T = source.T;  -- Update existing T

-- Final join with updated TABLE B
SELECT A.X, COALESCE(B.T, new_T) AS T1
FROM TABLE A 
LEFT JOIN enriched_data B ON A.X = B.X;

I tried above approach, not sure if this is efficient. Please let me know if there is better way so that i can avoid two LEFT JOINS, seems repetitive


Solution

  • Why you're not use SQL Set Difference Operation, to get the difference between Table A.x and Table B.x1

    Query in MySQL:

    select distinct A.x from A
    where A.x NOT IN (Select distinct B.x1 from B)
    

    Result: db<>fiddle