Search code examples
sqlsql-servergroup-bypivot

Combine two rows into one based on identifier


My dataset looks like this:

LBID ADDRESS FI LI
101 7283 JOHN SMITH
101 7283 JANE JONES

Since John and Jane have the same LBID and the same Address, I want the output to look like this:

LBID ADDRESS FI LI FI2 LI2
101 7283 JOHN SMITH JANE JONES

I checked on Stack Overflow for similar issues but they all deal with columns that have NULLs. None of mine have NULLs.

I tried STUFF, STRING_AGG and JOIN operations, but none can get it how I want it. My latest attempt was:

SELECT LBID, ADDRESS, STRING_AGG(FI, '& ') AS FI, LI
FROM TABLENAME
GROUP BY LBID, ADDRESS, LI

but since the last names are not the same it still puts it on two different rows.

Any thought on how this can be done?


Solution

  • If you assume a fixed schema with two record duplicates at best, you can count your duplicate records with a ROW_NUMBER window function. Since it can assume value 2 at best (two duplicates), you can solve the problem with a typical pivot approach and extract the corresponding rownum for each row.

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY LBID, ADDRESS ORDER BY FI) AS rn
        FROM tab
    )
    SELECT LBID, ADDRESS,
           MAX(CASE WHEN rn = 1 THEN FI END) AS FI1,
           MAX(CASE WHEN rn = 1 THEN LI END) AS LI1,
           MAX(CASE WHEN rn = 2 THEN FI END) AS FI2,
           MAX(CASE WHEN rn = 2 THEN LI END) AS LI2
    FROM cte
    GROUP BY LBID, ADDRESS
    

    Check the demo here.

    Output:

    LBID ADDRESS FI LI FI2 LI2
    101 7283 JOHN SMITH JANE JONES

    If you have more than 2 duplicate records, it's sufficient to add more pivotal elements to your final query, and extracting coupled "rn=", as per following pseudocode:

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY LBID, ADDRESS ORDER BY FI) AS rn
        FROM tab
    )
    SELECT LBID, ADDRESS,
           MAX(CASE WHEN rn = 1 THEN FI END) AS FI1,
           MAX(CASE WHEN rn = 1 THEN LI END) AS LI1,
           MAX(CASE WHEN rn = 2 THEN FI END) AS FI2,
           MAX(CASE WHEN rn = 2 THEN LI END) AS LI2,
           ...
           ...
           MAX(CASE WHEN rn = <n> THEN FI END) AS FI<n>
           MAX(CASE WHEN rn = <n> THEN LI END) AS LI<n>
    FROM cte
    GROUP BY LBID, ADDRESS
    

    If you don't know how many duplicate records per (LBID, ADDRESS) you can have at most, you can run a query to know that value:

    SELECT TOP(1) COUNT(1) AS cnt
    FROM tab
    GROUP BY LBID, ADDRESS
    ORDER BY cnt DESC
    

    This value will be the amount of couple of pivotal elements to extract:

           MAX(CASE WHEN rn = <n> THEN FI END) AS FI<n>
           MAX(CASE WHEN rn = <n> THEN LI END) AS LI<n>
    

    for the specific case of your sample table, 2 (as per first query).


    And if have so many duplicate records per single address, and want a generalized approach, you would go towards dynamic querying. Yet that approach could be less performant and prone to SQL injection.