Search code examples
sqlimpala

Join row with MAX row in another table in Impala?


I have a simple task to do a join between 'customer' (parent table) and 'order' table (child table) where the joined row of a child table has latest (max) order date value. If Impala were like any SQL engine you could write:

select * from customer c
join `order` o on o.customer_id=c.id
and o.id=(
    select o2.id 
    from `order` o2 
    where o2.customer_id=c.id 
    order by o2.order_date 
    desc limit 1
);

Obviously impala is different because I simply get the following error:

Error while compiling statement: FAILED: ParseException line 4:1 cannot recognize input near 'select' 'o2' '.' in expression specification

I tried to replace the 'and' with 'where' between the subquery but it didn't help.


Solution

  • You should be able to do this with a join and aggregation in the from clause:

    select c.*, o.*
    from customer c join
         `order` o
         on o.customer_id = c.id join
         (select customer_id, max(o2.order_date) as maxod
          from `order` o2
          group by customer_id
         ) oo
         on oo.customer_id = o.customer_id and oo.maxod = o.order_date;
    

    This assumes that the maximum order date has only one order. If this is not reasonable, then perhaps you can just use max(id) instead of max(order_date). If the ids are assigned sequentially, then this will do what you want.

    You might be able to do what you want using exists:

    select c.*, o.*
    from customer c join
         `order` o
         on o.customer_id = c.id
    where not exists (select 1
                      from `order` o2
                      where o2.customer_id = o.customer_id and
                            (o2.order_date > o.order_date or
                             (o2.order_date = o.order_date and o2.id > o.id)
                            )
                     );