I got a table like this one
CaseID | NAME | ADDRESS | ZIP | ROLE |
---|---|---|---|---|
1 | Joe | address_1 | zip_1 | role_1 |
1 | John | address_2 | zip_2 | role_1 |
1 | Jane | address_3 | zip_3 | role_1 |
1 | Bill | address_4 | zip_4 | role_1 |
1 | Bill | address_5 | zip_5 | role_2 |
2 | Bob | address_6 | zip_6 | role_1 |
2 | Shawn | address_7 | zip_7 | role_1 |
I would like to group by the name and CaseID, making a list of the roles in each group. That part is easy. The tricky part is that as you can see for Bill, we have two different addresses and zip. I tried to keep only one with a Max or Min aggregation function inside the group, but there might be inconsistency in the resulting address, keeping zip of one row and the address of the other raw. How can I fetch the zip and address of the same row (which ever) in a group and listing all the roles. I'd like a result like
CaseID | NAME | ADDRESS | ZIP | ROLE |
---|---|---|---|---|
1 | Joe | address_1 | zip_1 | role_1 |
1 | John | address_2 | zip_2 | role_1 |
1 | Jane | address_3 | zip_3 | role_1 |
1 | Bill | address_4 | zip_4 | role_1, role_2 |
2 | Bob | address_6 | zip_6 | role_1 |
2 | Shawn | address_7 | zip_7 | role_1 |
or
CaseID | NAME | ADDRESS | ZIP | ROLE |
---|---|---|---|---|
1 | Joe | address_1 | zip_1 | role_1 |
1 | John | address_2 | zip_2 | role_1 |
1 | Jane | address_3 | zip_3 | role_1 |
1 | Bill | address_5 | zip_5 | role_1, role_2 |
2 | Bob | address_6 | zip_6 | role_1 |
2 | Shawn | address_7 | zip_7 | role_1 |
In Oracle that's a good use case for the keep
syntax to aggregate functions:
select caseid, name,
min(address) keep(dense_rank first order by address) address,
min(zip) keep(dense_rank first order by address) zip,
listagg(role, ', ') within group (order by role) role
from mytable
group by caseid, name
The order by
clause of keep
lets you consistently "pick" a row in the group - that whose address
comes first when sorted against others in the group ; we can repeat the expression across columns to get the zip
of the same row.
Note that the query would benefit a more stable ordering criteria (such as order by address_id
maybe?). Else, from which row should the zip be kept when there are duplicate addresses?