Search code examples
sqlhadoophivebigdatahiveql

UNION TWO SELECT ON HIVE


I have a database like that on HIVE.

+--------+------------------+---------+
| rating |    date_upd      | version |
+--------+------------------+---------+
| 3      | 2021-07-01 12:13 | 2.1.9   |
| 5      | 2021-07-01 10:39 | 2.2.6   |
| 4      | 2021-07-02 10:24 | 2.2.7   |
| 5      | 2021-07-02 05:37 | 3.2.4   |
| 1      | 2021-07-02 21:40 | 3.2.5   |

I need to send in another table the result of two SELECT. How can I merge these two into one with HiveQL?

SELECT substr('date_upd',1,10) as 'day',
       count(*) cnt 
FROM tbl_one 
GROUP BY
       substr(date_upd,1,10);


SELECT substr('date_upd',1,7) as 'month',
       count(*) cnt 
FROM table_name 
GROUP BY
      substr('date_upd',1,7);

When I do it this way, it only returns the 'day' value and not the 'month' value.

SELECT 
      substr('date_upd',1,7) as 'month',
      count(*) cnt_month,
      substr('date_upd',1,10) as 'day',
      count(*) cnt_day
FROM table_name 
GROUP BY
      substr('date_upd',1,7),
      substr('date_upd',1,10);

Solution

  • It depends on what merging means to you. If you are trying to append vertically (as it sounds from the title description) you could try this:

    Vertical append:

    UNION:

    If you want to print one report immediately after the other one just do a UNION statement

    SELECT substr('date_upd',1,10) as 'day',
           count(*) cnt 
    FROM tbl_one 
    GROUP BY
           substr(date_upd,1,10);
    
    UNION -- this is the operator you are looking for
    
    SELECT substr('date_upd',1,7) as 'month',
           count(*) cnt 
    FROM tbl_one 
    GROUP BY
          substr('date_upd',1,7);
    

    Horizontal append:

    JOIN

    However if you are trying to display things horizontally, you should probably go with a join based method (simpler than window functions) probably joining on month and bringing all the monht_date, and day_date columns (in case you want to actually put these columns and counts one next to another).

    day count_day month count_month
    2021-07-01 12:13 2 2021-07 45
    2021-07-02 11:07 5 2021-07 45
    2021-07-05 07:22 3 2021-07 45

    WINDOW FUNCTIONS:

    If you dont want to have repetition of columns you could directly aggregate over different levels, using window functions with partition levels. But this is clearly more advanced and probably not what requested. Just pointing directions.

    You would get something like:

    date count_day count_month
    2021-07-01 12:13 2 45
    2021-07-02 11:07 5 45
    2021-07-05 07:22 3 45