Search code examples
pythonsqlsql-serverpysparkapache-spark-sql

querying for duplicates in 2 tables SQL server


I have a table table1 containing only the duplicate person_number and its corresponding account_number. How do I query for these duplicate person_number from main table master_table which has no person_number column but only the account_number column?

table1
person_number account_number
11 1
12 2
13 3
14 4
15 5
master_table
account_number account_balance
1 238989
2 4889895
3 68897
4 298729
5 387745
6 354655
7 553455
8 35545

I would like to filter the master_table to only contain the account_number present in table1

resultant_table_of_duplicates
account_number account_balance
1 238989
2 4889895
3 68897
4 298729
5 387745

Solution

  • As already pointed in the comments section, one idea to carry out this task is using the EXISTS operator inside the WHERE clause, by checking all records from mastertable whose "account_number" can be found inside table1.

    SELECT * 
    FROM mastertable m
    WHERE EXISTS(SELECT 1 
                 FROM table1 t 
                 WHERE m.account_number = t.account_number)
    

    Output:

    account_number account_balance
    1 238989
    2 4889895
    3 68897
    4 298729
    5 387745

    Check the demo here.