Search code examples
sqloraclegreatest-n-per-group

What is the most efficient way to get row with the max value?


Imagine we have a beautiful hotel. This hotel has a database with just one table:

room check-in    check_out   other columns...
1    2020-02-04  2020-02-05  ...
1    2020-02-06  2020-02-09  ...
1    2020-04-20  NULL        ...
2    2020-03-29  2020-04-01  ...
2    2020-04-17  2020-04-18  ...

What's the best and efficient way to select the last check-in for every room with other columns' values (otherwise I would just use room, max(check-in)?

Expected result is

room check_in   check_out  other columns... 
1    2020-04-20 NULL       ...
2    2020-04-17 2020-04-18 ...

First idea that came to my mind was to join this table with its copy:

WITH last_checkins AS (
    SELECT room, max(check_in) AS last_c
    FROM rooms
    GROUP BY room
)
SELECT *
FROM rooms r
         INNER JOIN last_chekins c
                    ON r.room = c.room
                        AND r.checkin = c.last_c;

What I dislike about this idea

  • It seems a bit inefficient. There are 30 million rooms in this hotel. So I would have to join two big tables
  • I'm a bit afraid of joining on dates. It feels like something might go wrong. Check_in column has time too by the way. Which makes it messier.

I would like to know are my concerns relevant?


Solution

  • The most convenient is probably row_number():

    select r.*
    from (select r.*,
                 row_number() over (partition by room order by checkin dec) as seqnum
          from rooms r
         ) r
    where seqnum = 1;
    

    With an index on (room, checkin), this should also have good performance.

    Sometimes a correlated subquery works better:

    select r.*
    from rooms r
    where r.checkin = (select max(r2.checkin)
                       from rooms r2
                       where r2.room = r.room
                      );
    

    Oracle has a good optimizer so I am not sure which works better in your situation.