Search code examples
comparisonteradatateradata-sql-assistant

TERADATA Compare tables for matches


I have a question in relation to comparison between tables.

I want to compare data from the same table with different filter conditions:

First Version:

select * 
from PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT 
WHERE BUSINESS_DATE = DATE '2022-05-31' 
  AND GRAIN ='ACCOUNT'
  AND LAYER = 'IDL'
  AND SOURCE_CD = 'MTMB'

Second Version:

select * 
from PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT 
WHERE BUSINESS_DATE = DATE '2022-05-31'
  AND GRAIN ='ACCOUNT'
  AND LAYER = 'ACQ'
  AND SOURCE_CD = 'MTMB'

As you can see the only difference between the two is the LAYER = IDL in first version and ACQ

I wanted to see which records match betweeen the two excluding the column Layer(Because they would always be different.

I tried to do an inner join, but it keeps running for very long:

SELECT * 
FROM 
 ( select * 
   from PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT
   WHERE BUSINESS_DATE = DATE '2022-05-31'
     AND GRAIN ='ACCOUNT'
     AND LAYER = 'IDL'
     AND SOURCE_CD = 'MTMB'
 ) A
INNER JOIN
 ( select *
   from PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT
   WHERE BUSINESS_DATE = DATE '2022-05-31'
     AND GRAIN ='ACCOUNT'
     AND LAYER = 'ACQ'
     AND SOURCE_CD = 'MTMB'
 ) B
 ON A.BUSINESS_DATE = B.BUSINESS_DATE
AND A.GRAIN =B.GRAIN
AND A.SOURCE_CD = B.SOURCE_CD

Solution

  • This is because a join for your purposes would need a 1:1 relationship between the rows being joined. You don't appear to have that, and haven't given any example data for us to derive one.

    For example:

    • sample 1 has rows 1, 2, 3
    • sample 2 has rows a, b, c
    • your results give 1a,1b,1c,2a,2b,2c,3a,3b,3c

    That's effectively a CROSS JOIN, which happens because the columns you're joining on are always the same on every row.

    My advice would be to select all the rows in question and Sort them. Then visually see if there are any patterns you want to analyse with joins or aggregates...

    SELECT * 
    FROM ppt_tier4_ftsrb.auto_source_account 
    WHERE business_date = DATE '2022-05-31'
      AND grain ='ACCOUNT'
      AND layer IN ('ACQ', 'IDL')
      AND source_cd = 'MTMB'
    ORDER BY layer, and, some, other, columns