Search code examples
sqljoinhivehiveql

Using three or more joins within a single hive query


Good afternoon, please tell me whether the output will be the same for these two situations? What are the alternatives if you need to use three or more different joins inside a single hive query.

from a
  join b on a.key=b.key
  join c on a.key=c.key
  left join u on a.key=u.key


_______
from a
  join b on a.key=b.key
  left join u on a.key=u.key
  join c on a.key=c.key

Solution

  • Yes, they are equivalent, although the results may not be in the same order in the result set. And if you used select *, then the columns would be in a different order.

    The reason is a little subtle -- the outer joined table is not used anywhere else in the FROM clause. So, you don't have to worry about NULL values from non-matching rows.

    As a general rule, I order joins in the FROM clause starting with inner joins and followed by outer joins. The clause becomes quite difficult to accurately follow when you start mixing join types. So, I recommend:

    from a join
         b
         on a.key = b.key join
         c
         on a.key = c.key left join
         u
         on a.key = u.key