Hive Multi Select

CID    F_ID       NME
1       A          QR
1       B          QB
2       A          QR
3       B          QB
4       A          QR
4       B          QB

Result: -

CID F_ID    NME
1    A      QR
1    B      QB
4    A      QR
4    B      QB

In Hive, what's the query to get the result should only outcome the CID that fall in both F_ID - A & B, I can acheive the same using LISTAGG in oracle

Solution

This query will execute in single map-reduce stage:

select CID,    F_ID,       NME  from
(
select s.*, 
       sum(A) over (partition by CID) A_cnt,
       sum(B) over (partition by CID) B_cnt
from
(
select s.*,
       case when F_ID='A' then 1 else 0 end A,
       case when F_ID='B' then 1 else 0 end B
 from your_table
)s
)s where A_cnt>=1 and B_cnt >=1
;

Demo:

select CID,    F_ID,       NME  from
(
select s.*, 
       sum(A) over (partition by CID) A_cnt,
       sum(B) over (partition by CID) B_cnt
from
(
select s.*,
       case when F_ID='A' then 1 else 0 end A,
       case when F_ID='B' then 1 else 0 end B
 from
( --replace this subquery (s) with your table  
  select stack(6,
  1,       'A',          'QR',
  1,       'B',          'QB',
  2,       'A',          'QR',
  3,       'B',          'QB',
  4,       'A',          'QR',
  4,       'B',          'QB') as (CID,    F_ID,       NME)
) s
)s
)s where A_cnt>=1 and B_cnt >=1
;

Result:

MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.39 sec   HDFS Read: 13549 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 390 msec
OK
1       B       QB
1       A       QR
4       B       QB
4       A       QR
Time taken: 108.779 seconds, Fetched: 4 row(s)