Search code examples
sqlstringgroup-byazure-sql-databasewindow-functions

SQL Group by but record relationship


Here is my question:

I have the following data that I am looking to de-dupe and generally clean, the fields are:

  • UniqueID
  • Name
  • PhoneNumber
  • Email

RDBMS is Azure SQL

What I'm wanting to do is to do a Group-By on Name, Phone Number and Email and then record the UniqueID in a separate column e.g.:

NewUniqueID Name PhoneNumber Email OldIDs
1 Sam Smith 091234567 [email protected] 12,54,65,70

However, I'm not sure what the correct terminology is to create the OldIDs field, so I've been unable to find out how to do it.


Solution

  • You seem to want string aggregation:

    select 
        row_number() over(order by (select null)) newUniqueID,
        name,
        phoneNumber,
        email,
        string_agg(uniqueID, ',') oldIDs
    from mytable
    group by name, phoneNumber, email
    

    This generates one row per (name, phoneNumber, email) tuple, and concatenate all uniqueIDs in column oldIDs. row_number() assigns a new id to each row in the resultset.