Search code examples
arrayshadoophivehiveqlparquet

create nested data from from un-nested data in hive


Is there any way that I can create nested data in hive from un-nested data

Example: source table

id zip_code
123 12345
123 23456
123 56789
234 12345
234 99999

to look like this

id zipcode
123 12345,23456,56789
234 12345,99999

Do note that the number of zipcode's for an id can be varying


Solution

  • select id
         , concat_ws(',',collect_set(zip_code)) as zipcode 
      from your_table 
     group by id
    

    collect_set() will remove duplicates in zip_code collection. If you need duplicates, use collect_list instead