Search code examples
oracle-databaseoracle11gtop-n

Rank () function over multiple columns in oracle


I need to rank a table with two columns transID & travel_date

here is my data

transID    travel_date
2341       2018-04-04 10:00:00
2341       2018-04-04 11:30:00
2891       2018-04-04 12:30:00
2891       2018-04-04 18:30:00
2341       2018-04-05 11:30:00
2891       2018-04-05 22:30:00

this is the query which i have tried

 select transID,travel_date,rn,
    dense_rank () over (partition by transID order by EarliestDate,transID) as rn2
    from
    (SELECT transID,travel_date,
             ROW_NUMBER() OVER (PARTITION BY transID ORDER BY travel_date) AS rn, 
             max(travel_date) OVER (partition by travel_date) as EarliestDate
      FROM travel_log_info
     ) t
     order by transID;

Current Output from the above query

transID    travel_date            rn2
2341       2018-04-04 10:00:00    1
2341       2018-04-04 11:30:00    2
2341       2018-04-05 11:30:00    3
2891       2018-04-04 12:30:00    1
2891       2018-04-04 18:30:00    2
2891       2018-04-05 22:30:00    3

Expected Output

transID    travel_date            rn2
2341       2018-04-04 10:00:00    1
2341       2018-04-04 11:30:00    2
2341       2018-04-05 11:30:00    1
2891       2018-04-04 12:30:00    1
2891       2018-04-04 18:30:00    2
2891       2018-04-05 22:30:00    1

with this output, I can get the desired output by where condition rn2 = 1 to get the output based on travel date and transId.

I am not getting the desired output as shown above. Kindly provide suggestions to achieve the correct output. Thanks for your time


Solution

  • The main problem with what you have now is:

    max(travel_date) OVER (partition by travel_date)
    

    which includes the time part of each date in the partition - so you're really getting the max of every individual date/time, which is that date/time. You seem to want maximum date/time within each day, so you could partition by each day by using trunc() in the partition-by clause:

    max(travel_date) OVER (partition by trunc(travel_date))
    

    Just that change gives you:

       TRANSID TRAVEL_DATE                 RN        RN2
    ---------- ------------------- ---------- ----------
          2341 2018-04-04 10:00:00          1          1
          2341 2018-04-04 11:30:00          2          1
          2341 2018-04-05 11:30:00          3          2
          2891 2018-04-04 12:30:00          1          1
          2891 2018-04-04 18:30:00          2          1
          2891 2018-04-05 22:30:00          3          2
    

    The partitioning in the outer query is also wrong though, you need to partition by that 'earliest' date (actually latest, but doesn't matter for this):

    select transID,travel_date,rn,
        dense_rank () over (partition by transID,EarliestDate order by travel_date) as rn2
        from
        (SELECT transID,travel_date,
                 ROW_NUMBER() OVER (PARTITION BY transID ORDER BY travel_date) AS rn, 
                 max(travel_date) OVER (partition by trunc(travel_date)) as EarliestDate
          FROM travel_log_info
         ) t
         order by transID;
    
       TRANSID TRAVEL_DATE                 RN        RN2
    ---------- ------------------- ---------- ----------
          2341 2018-04-04 10:00:00          1          1
          2341 2018-04-04 11:30:00          2          2
          2341 2018-04-05 11:30:00          3          1
          2891 2018-04-04 12:30:00          1          1
          2891 2018-04-04 18:30:00          2          2
          2891 2018-04-05 22:30:00          3          1
    

    But you don't really need that max, or the outer query you currently have; if you include that truncated day in the row_number() partition (which you currently aren't really using) you get:

    SELECT transID,travel_date,
      ROW_NUMBER() OVER (PARTITION BY transID, trunc(travel_date) ORDER BY travel_date) AS rn
    FROM travel_log_info;
    
       TRANSID TRAVEL_DATE                 RN
    ---------- ------------------- ----------
          2341 2018-04-04 10:00:00          1
          2341 2018-04-04 11:30:00          2
          2341 2018-04-05 11:30:00          1
          2891 2018-04-04 12:30:00          1
          2891 2018-04-04 18:30:00          2
          2891 2018-04-05 22:30:00          1
    

    and you can then wrap that in an outer query to filter on rn:

    SELECT transID,travel_date
    FROM (
      SELECT transID,travel_date,
        ROW_NUMBER() OVER (PARTITION BY transID, trunc(travel_date) ORDER BY travel_date) AS rn
      FROM travel_log_info
    )
    WHERE rn = 1
    ORDER BY transID,travel_date;
    
       TRANSID TRAVEL_DATE        
    ---------- -------------------
          2341 2018-04-04 10:00:00
          2341 2018-04-05 11:30:00
          2891 2018-04-04 12:30:00
          2891 2018-04-05 22:30:00
    

    You could also do this without a subquery; this gets the same result using first:

    SELECT transID,
      min(travel_date) keep (dense_rank first order by travel_date) as travel_date
    FROM travel_log_info
    GROUP BY transID, trunc(travel_date)
    ORDER BY transID, travel_date;