My dummy data:
| adv_id | dpa_id | json |
| -------| -------|----------------------------------------------|
| 1 | 21 |{'adv_name':'Tesla', 'product_name':'Model X'}|
| 1 | 21 |{'adv_name':'Tesla', 'product_name':'Model Y'}|
| 2 | 22 |{'adv_name':'BMW', 'product_name':'X5'} |
The problem is our hive system can only retrive 1 million rows of data at most. I need to bind the json together into a bigger one for every adv_id, dpa_id combo. Below is the output that I want:
adv_id | dpa_id | json |
---|---|---|
1 | 21 | [{'adv_name':'Tesla', 'product_name':'Model X'}, {'adv_name':'Tesla', 'product_name':'Model Y'}] |
2 | 22 | [{'adv_name':'BMW', 'product_name':'X5'}] |
How should bind the values together? Thanks in advance.
Danny
You can do a group by and collect_list
:
select adv_id, dpa_id, collect_list(json) as json
from table
group by adv_id, dpa_id;