Search code examples
sqlimpala

Multiplied rows in impala


I am fetching some data from a view with some joined tables through sqoop into an external table in impala. However I saw that the columns from one table multiply the rows. For example

id first_name surname step name       value
1  ted         kast    1   museum     visitor
1  ted         kast    1   shop       buyer
1  ted         kast    2   museum     visitor
1  ted         kast    2   shop       buyer

But I want to be something like that

id first_name surname step name_value
1  ted        kast    1    [(museum visitor), (shop buyer)]
1  ted        kast    2    [(museum visitor), (shop buyer)]

How can I achieve that in impala?


Solution

  • We can use aggregation here along with GROUP_CONCAT:

    SELECT
        id,
        first_name,
        surname,
        step,
        CONCAT('[', GROUP_CONCAT(CONCAT('(', CONCAT_WS(' ', name, value), ')'), ', '), ']') AS name_value
    FROM yourTable
    GROUP BY
        id,
        first_name,
        surname,
        step
    ORDER BY id;
    

    Here is a demo for MySQL, where the syntax is almost the same as for Impala.