I have a simple task to do a join between 'customer' (parent table) and 'order' table (child table) where the joined row of a child table has latest (max) order date value. If Impala were like any SQL engine you could write:
select * from customer c
join `order` o on o.customer_id=c.id
and o.id=(
select o2.id
from `order` o2
where o2.customer_id=c.id
order by o2.order_date
desc limit 1
);
Obviously impala is different because I simply get the following error:
Error while compiling statement: FAILED: ParseException line 4:1 cannot recognize input near 'select' 'o2' '.' in expression specification
I tried to replace the 'and' with 'where' between the subquery but it didn't help.
You should be able to do this with a join
and aggregation
in the from
clause:
select c.*, o.*
from customer c join
`order` o
on o.customer_id = c.id join
(select customer_id, max(o2.order_date) as maxod
from `order` o2
group by customer_id
) oo
on oo.customer_id = o.customer_id and oo.maxod = o.order_date;
This assumes that the maximum order date has only one order. If this is not reasonable, then perhaps you can just use max(id)
instead of max(order_date)
. If the ids are assigned sequentially, then this will do what you want.
You might be able to do what you want using exists
:
select c.*, o.*
from customer c join
`order` o
on o.customer_id = c.id
where not exists (select 1
from `order` o2
where o2.customer_id = o.customer_id and
(o2.order_date > o.order_date or
(o2.order_date = o.order_date and o2.id > o.id)
)
);