Search code examples
hivehiveqlhadoop2

Left Join in Hive is not returning expected results


I have 3 tables, one is datetable which contains only date, other 2 has data as follows.

datetable:

enter image description here

Table1:

enter image description here

Table2:

enter image description here

I am doing left join with date table as follows:

select * from 
(select distinct t.d,
coalesce(tab1.name,tab2.name,"") as name,
coalesce(tab1.id,tab2.id,"") as id,
coalesce(tab1.tgt_cnt,0) as tgt_cnt,
coalesce(tab2.a_cnt,0) as a_cnt,
coalesce(tab2.b_cnt,0) as b_cnt,
coalesce(tab2.c_cnt,0) as v_cnt
from datetable t
LEFT JOIN (select * from table1) tab1 on t.d = tab1.dt
LEFT JOIN (select * from table2) tab2 on t.d = tab2.dt) a
where (tgt_cnt <> 0 or a_cnt <> 0 or b_cnt <> 0 or c_cnt <> 0);

And I am getting following results.

enter image description here

My question is what happened to the record TOM. I am not sure why CG and Bob repeating twice. Is something wrong in my query.

Could you please suggest why TOM record not coming in the Left Join and why CG and BOB are repeating.

I am expecting following results. enter image description here

Really appreciate your help.

Thanks,Babu


Solution

  • coalesce(tab1.name,tab2.name,"") as name will replace the names in table2 with those in table1, so TOM never appeared because it got replaced by CG or BOB.

    I'm guessing what you're trying to achieve here... seems like you want to combine table1 and table2. I think a full join is appropriate.

    select * from (
    select distinct
        coalesce(t1.dt, t2.dt) as dt,
        coalesce(t1.desc, t2.desc) as desc, 
        coalesce(t1.name, t2.name) as name, 
        coalesce(t1.id, t2.id) as id,
        coalesce(t1.tgt_cnt, 0) as tgt_cnt,
        coalesce(t2.a_cnt, 0) as a_cnt,
        coalesce(t2.b_cnt, 0) as b_cnt,
        coalesce(t2.c_cnt, 0) as c_cnt
    from table1 t1
    full join table2 t2
    on t1.name = t2.name and t1.dt = t2.dt
    ) a
    where (tgt_cnt <> 0 or a_cnt <> 0 or b_cnt <> 0 or c_cnt <> 0);
    

    which will give

    dt              desc    name    id      tgt_cnt a_cnt   b_cnt   c_cnt
    6/29/2020       NULL    Tom     3       0       0       0       1
    6/29/2020       AA      CG      1       3       1       1       0
    6/29/2020       AA      Bob     2       3       0       0       0