Search code examples
hadoophivehiveqlapache-hive

Hive Joins query


I have two tables in hive:

Table 1:

1,Nail,maher,24,6.2
2,finn,egan,23,5.9
3,Hadm,Sha,28,6.0
4,bob,hope,55,7.2

Table 2 :

1,Nail,maher,24,6.2
2,finn,egan,23,5.9
3,Hadm,Sha,28,6.0
4,bob,hope,55,7.2
5,john,hill,22,5.5
6,todger,hommy,11,2.2
7,jim,cnt,99,9.9
8,will,hats,43,11.2

Is there any way in Hive to retrieve the new data in table 2 that doesn't exist in table 1??

In other Databases tools, you would use a inner left/right. But inner left/right doesn't exist in Hive and suggestions how this could be achieved?


Solution

  • If you are using Hive version >= 0.13 you can use this query:

    SELECT * FROM A WHERE A.firstname, A.lastname ... IN (SELECT B.firstname, B.lastname ... FROM B);
    

    But I'm not sure if Hive supports multiple coloumns in the IN clause. If not something like this could work:

    SELECT * FROM A WHERE A.firstname IN (SELECT B.firstname FROM B) AND A.lastname IN (SELECT b.lastname FROM B) ...;