Search code examples
sqljoinhive

Join evaluation order in HIVE


I was trying to run a query that would make use of multiple joins inside HIVE.

example:

SELECT * 
FROM table1
LEFT JOIN table2          -- the table resulted from the inner join should be left joined to table1
INNER JOIN table3         -- this inner join should happen first between table2 and table3
ON table3.id = table2.id
ON table2.id = table1.id

I think this is perfectly valid on other SQL DBMS's, but HIVE gives me an error. Are this kind of joins ( I really don't know what to call them so I can't google them) illegal in HIVE?

Workarounds would be some subquery unions, but I am more interested in getting more information on this kind of syntax.

Thanks!


Solution

  • This is valid SQL syntax and should be parsed as:

    FROM table1 LEFT JOIN
         (table2 INNER JOIN
          table3 
          ON table3.id = table2.id
         )
         ON table2.id = table1.id
    

    By convention, ON clauses are interleaved with JOINs, sot the conditions are where the JOIN is specified. However, the syntax allows for this construct as well.

    I don't use such syntax -- and I strongly discourage using it without parentheses -- but I thought pretty much all databases supported it.

    If parentheses don't work, you have two options. One is a subquery:

    This is valid SQL syntax and should be parsed as:

    FROM table1 LEFT JOIN
         (SELECT table2.id, . . . -- other columns you want
          FROM table2 INNER JOIN
               table3 
               ON table3.id = table2.id
         ) t23
         ON t23.id = table1.id
    

    Or using a RIGHT JOIN:

    SELECT table2 INNER JOIN
           table3
           ON table3.id = table2.id RIGHT JOIN
           table1
           ON table2.id = table1.id
    

    In this case, the RIGHT JOIN should be equivalent. But it can be complicated getting exactly the same semantics when multiple joins are involved (and without using parentheses).