Search code examples
sqlhive

Hive Multi Select


CID    F_ID       NME
1       A          QR
1       B          QB
2       A          QR
3       B          QB
4       A          QR
4       B          QB

Result: -

CID F_ID    NME
1    A      QR
1    B      QB
4    A      QR
4    B      QB

In Hive, what's the query to get the result should only outcome the CID that fall in both F_ID - A & B, I can acheive the same using LISTAGG in oracle


Solution

  • This query will execute in single map-reduce stage:

    select CID,    F_ID,       NME  from
    (
    select s.*, 
           sum(A) over (partition by CID) A_cnt,
           sum(B) over (partition by CID) B_cnt
    from
    (
    select s.*,
           case when F_ID='A' then 1 else 0 end A,
           case when F_ID='B' then 1 else 0 end B
     from your_table
    )s
    )s where A_cnt>=1 and B_cnt >=1
    ;
    

    Demo:

    select CID,    F_ID,       NME  from
    (
    select s.*, 
           sum(A) over (partition by CID) A_cnt,
           sum(B) over (partition by CID) B_cnt
    from
    (
    select s.*,
           case when F_ID='A' then 1 else 0 end A,
           case when F_ID='B' then 1 else 0 end B
     from
    ( --replace this subquery (s) with your table  
      select stack(6,
      1,       'A',          'QR',
      1,       'B',          'QB',
      2,       'A',          'QR',
      3,       'B',          'QB',
      4,       'A',          'QR',
      4,       'B',          'QB') as (CID,    F_ID,       NME)
    ) s
    )s
    )s where A_cnt>=1 and B_cnt >=1
    ;
    

    Result:

    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.39 sec   HDFS Read: 13549 HDFS Write: 28 SUCCESS
    Total MapReduce CPU Time Spent: 6 seconds 390 msec
    OK
    1       B       QB
    1       A       QR
    4       B       QB
    4       A       QR
    Time taken: 108.779 seconds, Fetched: 4 row(s)