Search code examples
sqloracle-databaseaggregate-functions

How to select column values of a group based on another column value


I got a table like this one

CaseID NAME ADDRESS ZIP ROLE
1 Joe address_1 zip_1 role_1
1 John address_2 zip_2 role_1
1 Jane address_3 zip_3 role_1
1 Bill address_4 zip_4 role_1
1 Bill address_5 zip_5 role_2
2 Bob address_6 zip_6 role_1
2 Shawn address_7 zip_7 role_1

I would like to group by the name and CaseID, making a list of the roles in each group. That part is easy. The tricky part is that as you can see for Bill, we have two different addresses and zip. I tried to keep only one with a Max or Min aggregation function inside the group, but there might be inconsistency in the resulting address, keeping zip of one row and the address of the other raw. How can I fetch the zip and address of the same row (which ever) in a group and listing all the roles. I'd like a result like

CaseID NAME ADDRESS ZIP ROLE
1 Joe address_1 zip_1 role_1
1 John address_2 zip_2 role_1
1 Jane address_3 zip_3 role_1
1 Bill address_4 zip_4 role_1, role_2
2 Bob address_6 zip_6 role_1
2 Shawn address_7 zip_7 role_1

or

CaseID NAME ADDRESS ZIP ROLE
1 Joe address_1 zip_1 role_1
1 John address_2 zip_2 role_1
1 Jane address_3 zip_3 role_1
1 Bill address_5 zip_5 role_1, role_2
2 Bob address_6 zip_6 role_1
2 Shawn address_7 zip_7 role_1

Solution

  • In Oracle that's a good use case for the keep syntax to aggregate functions:

    select caseid, name, 
        min(address) keep(dense_rank first order by address) address,
        min(zip)     keep(dense_rank first order by address) zip,
        listagg(role, ', ') within group (order by role) role
    from mytable
    group by caseid, name
    

    The order by clause of keep lets you consistently "pick" a row in the group - that whose address comes first when sorted against others in the group ; we can repeat the expression across columns to get the zip of the same row.

    Note that the query would benefit a more stable ordering criteria (such as order by address_id maybe?). Else, from which row should the zip be kept when there are duplicate addresses?

    Demo on DB Fiddle