Search code examples
sqljoingoogle-bigqueryself-join

Having difficulty generating distinct rows without duplicates in a SELF JOIN (SQL)


I'm pretty new to SQL and thought I'd be able to use it to create a list for my employer's clients. Unfortunately, there the majority of clients have more than one account and the file has a distinct row for each account.

I was trying to use a self join to create one row for each client with multiple columns for the accounts.

SELECT DISTINCT A.Account_Number AS Account_1, B.Account_Number AS Account_2, A.Client_Name
FROM client_table AS A, client_table AS B
WHERE A.Account_Number <> B.Account_Number
AND A.Client_Name = B.Client_Name
ORDER BY A.Client_Name;

Unfortunately, the results were such that I would get a table that looks like:

Account_1 Account_2 Client_name
000001 000002 Joe Shmo
000001 000003 Joe Shmo
000002 000003 Joe Shmo
000002 000001 Joe Shmo

I understand that for more than two accounts, I'll need more than two joins, but I haven't figured out how to do it.

Is there a way to prevent double entry?

I'm using BigQuery btw.


Solution

  • In the first step number the accounts per client. Then use conditional aggregation to get one column per account.

    select 
      client_name,
      max(case when rn = 1 then account_number end) as account1,
      max(case when rn = 2 then account_number end) as account2,
      max(case when rn = 3 then account_number end) as account3,
      max(case when rn = 4 then account_number end) as account4,
      max(case when rn = 5 then account_number end) as account5,
      max(case when rn = 6 then account_number end) as account6
    from
    (
      select
        client_name,
        account_number,
        row_number() over (partition by client_name order by account_number) as rn
      from client_table
    ) numbered
    group by client_name
    order by client_name;