Search code examples
sqlarrayshivehiveqlimpala

Aggregations in complex structure in hive


I have created a table in hive with complex structure in one column. example records:

+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| id        |                                                                                                                                                               order                                                                                                                                                                |  |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
|         1 | [{"books":"prince","order_timestamp":"2022-01-19 00:45:22","check_out_timestamp":"2022-01-19 00:45:22"},{"books":"venceremos","order_timestamp":"2022-01-19 00:47:13","check_out_timestamp":null},{"books":"rich dad poor dad","order_timestamp":null,"check_out_timestamp":"2022-01-19 00:47:13"}]                                |  |
|         2 | [{"books":"lord of flies","order_timestamp":"2022-01-11 12:47:14","check_out_timestamp":"2022-01-11 13:08:20"},{"books":"test","order_timestamp":"2022-01-11 12:47:14","check_out_timestamp":"2022-01-11 12:47:14"},{"books":"physics","order_timestamp":"2022-01-11 12:47:14","check_out_timestamp":"2022-01-11 12:47:14"}]       |  |
|         3 | [{"books":"test","order_timestamp":"2022-01-14 18:21:03","check_out_timestamp":"2022-01-14 18:21:03"},{"books":"up and down","order_timestamp":"2022-01-14 18:23:21","check_out_timestamp":"2022-01-14 18:23:21.018"},{"books":"mathematics","order_timestamp":"2022-01-14 18:23:21","check_out_timestamp":"2022-01-14 18:23:21"}] |  |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+

I want to ask, how can I perform some aggregations in this column? For example how many books ordered for 14/1/22 or the top selling book?


Solution

  • Explode array of struct using lateral view inline and calculate. For example, how many books ordered per date:

    select date(e.order_timestamp) order_date 
           count(*)  as book_cnt
     from table_name
          lateral view inline(order) e as books, order_timestamp, check_out_timestamp
    where date(e.order_timestamp) = date('2022-01-14')
    group by date(e.order_timestamp)