Search code examples
sqlhivehiveql

How to perform aggregation(sum) group by month field using HiveQL?


Below is my data where am looking to generate sum of revenues per month basis using columns event_time and price.

+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
|   oct_data.event_time    | oct_data.event_type  | oct_data.product_id  | oct_data.category_id  | oct_data.category_code  | oct_data.brand  | oct_data.price  | oct_data.user_id  |         oct_data.user_session         |
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
| 2019-10-01 00:00:00 UTC  | cart                 | 5773203              | 1487580005134238553   |                         | runail          | 2.62            | 463240011         | 26dd6e6e-4dac-4778-8d2c-92e149dab885  |
| 2019-10-01 00:00:03 UTC  | cart                 | 5773353              | 1487580005134238553   |                         | runail          | 2.62            | 463240011         | 26dd6e6e-4dac-4778-8d2c-92e149dab885  |
| 2019-10-01 00:00:07 UTC  | cart                 | 5881589              | 2151191071051219817   |                         | lovely          | 13.48           | 429681830         | 49e8d843-adf3-428b-a2c3-fe8bc6a307c9  |
| 2019-10-01 00:00:07 UTC  | cart                 | 5723490              | 1487580005134238553   |                         | runail          | 2.62            | 463240011         | 26dd6e6e-4dac-4778-8d2c-92e149dab885  |
| 2019-10-01 00:00:15 UTC  | cart                 | 5881449              | 1487580013522845895   |                         | lovely          | 0.56            | 429681830         | 49e8d843-adf3-428b-a2c3-fe8bc6a307c9  |
| 2019-10-01 00:00:16 UTC  | cart                 | 5857269              | 1487580005134238553   |                         | runail          | 2.62            | 430174032         | 73dea1e7-664e-43f4-8b30-d32b9d5af04f  |
| 2019-10-01 00:00:19 UTC  | cart                 | 5739055              | 1487580008246412266   |                         | kapous          | 4.75            | 377667011         | 81326ac6-daa4-4f0a-b488-fd0956a78733  |
| 2019-10-01 00:00:24 UTC  | cart                 | 5825598              | 1487580009445982239   |                         |                 | 0.56            | 467916806         | 2f5b5546-b8cb-9ee7-7ecd-84276f8ef486  |
| 2019-10-01 00:00:25 UTC  | cart                 | 5698989              | 1487580006317032337   |                         |                 | 1.27            | 385985999         | d30965e8-1101-44ab-b45d-cc1bb9fae694  |
| 2019-10-01 00:00:26 UTC  | view                 | 5875317              | 2029082628195353599   |                         |                 | 1.59            | 474232307         | 445f2b74-5e4c-427e-b7fa-6e0a28b156fe  |
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+

I have used the below query but the sum does not seem to occur. Please suggest best approaches to generate the desired output.

select date_format(event_time,'MM') as Month, 
       sum(price) as Monthly_Revenue 
from oct_data_new 
group by date_format(event_time,'MM') 
order by Month;

Note: event_time field is in TIMESTAMP format.


Solution

  • First convert the timestamp to date and then apply date_format():

    select date_format(cast(event_time as date),'MM') as Month, 
           sum(price) as Monthly_Revenue 
    from oct_data_new 
    group by date_format(cast(event_time as date),'MM') 
    order by Month;
    

    This will work if all the dates are of the same year.
    If not then you should also group by year.