I have two tables where I want to compare multiple columns in the tables. Then, I want to output a compare result column for each column comparison.
Table 1 (t1)
id | gender | age | state |
---|---|---|---|
1 | M | 15 | CA |
2 | F | 20 | NY |
Table 2 (t2)
id | gender | age |
---|---|---|
1 | M | 15 |
2 | F | 21 |
Resulttable
id | t1.gender | t2.gender | gender_compare | t1.age | t2.age | age_compare |
---|---|---|---|---|---|---|
1 | M | M | true | 15 | 15 | true |
2 | F | F | true | 20 | 21 | false |
I am able to create a select statement to return the compared columns from each table. The part I'm uncertain about is how to create a compare column for each comparison (t1.age vs. t2. age and t1.gender vs. t2.gender).
My guess is I could write a case statement for each compare column (i.e. gender_compare, age_compare, etc). But, I have close to 30 columns to compare. This can become very messy to write 30 case statements. Is there a better way?
SELECT t1.id ,t2.id,
t1.gender, t2.gender
t1.age, t2 age
FROM table1 t1
join table2 t2 on t1.id=t2.id
For reference, I am running the query in AWS Athena.
This can become very messy to write 30 case statements. Is there a better way?
You don't need the case statements, just compare the columns (i.e. t1.gender = t2.gender as gender_compare
, etc.):
SELECT t1.id id
t1.gender, t2.gender, t1.gender = t2.gender as gender_compare
t1.age, t2.age, t1.age = t2.age as age_compare
FROM table1 t1
join table2 t2 on t1.id = t2.id
If you need some other output instead of true/false - use if
function which is more succinct approach then using case
:
The
IF
function is actually a language construct that is equivalent to the followingCASE
expression:CASE WHEN condition THEN true_value [ ELSE false_value ] END
For example - if(t1.gender = t2.gender, 'Y', 'N') as gender_compare
I can't think of better options then listing all needed columns and comparison. Potentially you can query information_schema.columns
(see this answer) to "automate" manual query generation (i.e. you write the SQL query which produces another SQL query as a result - have done this myself several times)
AFAIK Presto/Trino does not provide options for dynamic SQL generation and execution. If you want to fully automate this then you will need to use some outside scripting (for example generating and executing query via AWS Lambda).