Search code examples
sqloracleduplicatesrecords

Find duplicate columns that have values that end with same last four digits - SQL


I need a query that will search for all records in a table and return only the ones that have 2 or more values that end with same 4 digits, but i dont have specific four digits i can give you, i just need it to find everything that has more than one value that ends the same. eg. table has:

person_number------------name
==============================
1234567------------------john 
00004567-----------------mark
9999999------------------mike
0009999------------------lidia
10101010-----------------alex
23232323-----------------matt
98748484-----------------steve

I want it to return only:

person_number------------name
==============================
1234567------------------john 
00004567-----------------mark
9999999------------------mike
0009999------------------lidia

You see, i want it to return duplicates that end with the same last 4 digits, but i dont have specific digits to give you so i cant use something like '%9999'. It also needs to be sorted properly so I have them next to each other. I need to find duplicates and delete one of them, but i dont know which one until i actually see all the other values in other columns and pick which one to delete manually.

Thanks!


Solution

  • I just want extend Sick answer.

    You say you will like to choose which one to eliminate. But you can also include a ORDER clausule with a CASE statment to filter the one you want eliminate.

    In this case I order by "name" so you can delete all with rn > 1 and keep the first name.

    SqlFiddleDemo

    select "person_number", "name", rn, zero_count
    from
    (
      select "person_number", 
             "name", 
             substr("person_number", 1, 1),
             count(1) over (partition by substr("person_number",-4)) as Cnt,
             SUM(case 
                 when substr("person_number", 1, 1)  = '0' then 1
                 else 0 end) over (partition by substr("person_number",-4)) as zero_count,
              row_number() over (partition by substr("person_number",-4) order by "name") as rn
      from person
    )
    Where Cnt > 1
    and zero_count > 0
    ORDER BY substr("person_number",-4)
    

    I increase the data sample

    • now include a field zero_count to calculate how many rows start with 0 in each group
    • case where both row have same 4 char at the end and also start with 0 (ZERO_COUNT = 2)
    • case when a row without match also start with 0