Search code examples
azure-databricksdatabricks-sql

Combine multiple rows that share the same ID


I'm trying to combine comment rows that have the same id in the sequence order separated by '/ ' The data I need is in different databases & tables. I've tried GROUP BY, UNION, and AGGERGATE with no success.

Here is example sql code:

SELECT
F.ID
F.FRUIT
C.SEQUNENCE
C.COMMENT

FROM FRUIT F
LEFT JOIN COMMENT C 
ON F.ID = C.ID

Current result:

ID FRUIT SEQUENCE COMMENT
1 APPLE 1 COMMENT1
1 APPLE 2 COMMENT2
1 APPLE 3 COMMENT3
2 BANANA 1 COMMENT1
2 BANANA 2 COMMENT2
3 KIWI 1 COMMENT1

Desired result:

ID FRUIT COMMENT
1 APPLE COMMENT1/ COMMENT2/ COMMENT3
2 BANANA COMMENT1/ COMMENT2
3 KIWI COMMENT1

Thanks for the help - J


Solution

  • The rough equivalent would be using collect_set and array_join but note you have lost the order:

    SELECT
    F.ID
    ,F.FRUIT
    ,array_join(array_sort(collect_set(str(sequence) + '-' + comment)), '/') 
    FROM FRUIT F
    LEFT JOIN COMMENT C 
    ON F.ID = C.ID
    GROUP BY F.ID, F.FRUIT
    

    test

    http://sqlfiddle.com/#!18/9757e/3

    STRING_AGG()

    https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16