Search code examples
sqlleft-joingreatest-n-per-group

Join two tables and select a single record based on condition


I have 2 tables (Customers and Addresses). I need to join the two tables using customer_id column and output only one record per customer where the address is primary address, or if it is not, then I should chose the address where the address name starts with letter 'A'. If none is true then I need an empty cell. (Note that the customer may have multiple addresses and a few of them may start with 'A', and I should take only one.) I will try to visualize the task below, to be more clear.

Customers

customer_id first_name last_name
1 John Johnny
2 Mike Mikey
3 Tim Timson
4 Bart Bartson

Addresses

address_id customer_id address_line city isPrimary
1 1 "Address line" Toronto 0
2 1 "Another address" Chicago 1
3 2 "6th ave." Los Angel 0
4 2 "Just an address" Salt Lake 0
5 3 "Tim's street" Denver 1
6 3 "Another Tim's ad" Berlin 1
7 4 "Wallstreet" New York 0
8 4 "Skystreet" London 0
9 4 "St. Angel str." Rome 0
10 2 "Another Mike's" London 0

And the final output should be the following:

Output

customer_id first_name last_name address_line city
1 John Johnny "Another address" Chicago
2 Mike Mikey "Another Mike's" London
3 Tim Timson "Tim's street" Denver
4 Bart Bartson "" ""
  • Since John has two addresses starting with 'A', but there is a primary one, the primary one is chosen.
  • Mike has 3 addresses, none of them is primary, so chosing the one starting with 'A'.
  • Tim has two primary addresses, so we chose only the first encounter.
  • Bart has two addresses, but non is primary and none starts with 'A', so its left blank.

Solution

  • If I follow you correctly, we can approach this using row_number() for prioritization and a left join:

    select c.*, a.address_line, a.city
    from customers c
    left join (
        select a.*, 
            row_number() over(partition by customer_id order by isPrimary desc, address_id) rn
        from addresses a
        where isPrimary = 1 or address_line like 'A%'
    ) a on a.customer_id = c.customer_id and a.rn = 1
    

    The subquery on filters on primary addresses or addresses that start with A, and ranks them for each client, giving priority to primary ones. If a client has no primary adress and no address that starts with A, this returns an empty set; if he has several matching addresses, this prioritizes the primary (if any).

    All that is left to do then is to filter on the top match per group (if any) in the left join.