Search code examples
group-byhivein-subquery

Hive SubQuery and Group BY


I have two tables

table1:

id
1
2
3

table 2:

id date
1   x1
4   x2
1   x3
3   x4
3   x5
1   x6
3   x5
6   x6
6   x5
3   x6

I want the count of each ids for table 2 that is present in table 1.

Result

id  count
1   3
2   0
3   4

I am using this query, but its giving me error:

SELECT tab2.id, count(tab2.id)
 FROM <mytable2> tab2
 GROUP BY tab2.id
 WHERE tab2.id IN (select id from <mytable1>)
 ;

Error is:

missing EOF at 'WHERE' near 'di_device_id'


Solution

  • There are two possible issues. Sub queries in the WHERE clause are only supported from Hive 0.13 and up. If you are using such a version, then your problem is just that you have WHERE and GROUP BY the wrong way round:

    SELECT tab2.id, count(tab2.id)
     FROM <mytable2> tab2
     WHERE tab2.id IN (select id from <mytable1>)
     GROUP BY tab2.id
     ;
    

    If you are using an older version of Hive then you need to use a JOIN:

    SELECT tab2.id, count(tab2.id)
     FROM <mytable2> tab2 INNER JOIN <mytable1> tab1 ON (tab2.id = tab1.id)
     GROUP BY tab2.id
     ;