Search code examples
sqlhivehiveql

bind column values by group using hiveql


My dummy data:

| adv_id | dpa_id | json                                         |
| -------| -------|----------------------------------------------|
| 1      | 21     |{'adv_name':'Tesla', 'product_name':'Model X'}|
| 1      | 21     |{'adv_name':'Tesla', 'product_name':'Model Y'}|
| 2      | 22     |{'adv_name':'BMW', 'product_name':'X5'}       |

The problem is our hive system can only retrive 1 million rows of data at most. I need to bind the json together into a bigger one for every adv_id, dpa_id combo. Below is the output that I want:

adv_id dpa_id json
1 21 [{'adv_name':'Tesla', 'product_name':'Model X'}, {'adv_name':'Tesla', 'product_name':'Model Y'}]
2 22 [{'adv_name':'BMW', 'product_name':'X5'}]

How should bind the values together? Thanks in advance.

Danny


Solution

  • You can do a group by and collect_list:

    select adv_id, dpa_id, collect_list(json) as json
    from table
    group by adv_id, dpa_id;