I need to rank a table with two columns transID & travel_date
here is my data
transID travel_date
2341 2018-04-04 10:00:00
2341 2018-04-04 11:30:00
2891 2018-04-04 12:30:00
2891 2018-04-04 18:30:00
2341 2018-04-05 11:30:00
2891 2018-04-05 22:30:00
this is the query which i have tried
select transID,travel_date,rn,
dense_rank () over (partition by transID order by EarliestDate,transID) as rn2
from
(SELECT transID,travel_date,
ROW_NUMBER() OVER (PARTITION BY transID ORDER BY travel_date) AS rn,
max(travel_date) OVER (partition by travel_date) as EarliestDate
FROM travel_log_info
) t
order by transID;
Current Output from the above query
transID travel_date rn2
2341 2018-04-04 10:00:00 1
2341 2018-04-04 11:30:00 2
2341 2018-04-05 11:30:00 3
2891 2018-04-04 12:30:00 1
2891 2018-04-04 18:30:00 2
2891 2018-04-05 22:30:00 3
Expected Output
transID travel_date rn2
2341 2018-04-04 10:00:00 1
2341 2018-04-04 11:30:00 2
2341 2018-04-05 11:30:00 1
2891 2018-04-04 12:30:00 1
2891 2018-04-04 18:30:00 2
2891 2018-04-05 22:30:00 1
with this output, I can get the desired output by where condition rn2 = 1 to get the output based on travel date and transId.
I am not getting the desired output as shown above. Kindly provide suggestions to achieve the correct output. Thanks for your time
The main problem with what you have now is:
max(travel_date) OVER (partition by travel_date)
which includes the time part of each date in the partition - so you're really getting the max of every individual date/time, which is that date/time. You seem to want maximum date/time within each day, so you could partition by each day by using trunc()
in the partition-by clause:
max(travel_date) OVER (partition by trunc(travel_date))
Just that change gives you:
TRANSID TRAVEL_DATE RN RN2
---------- ------------------- ---------- ----------
2341 2018-04-04 10:00:00 1 1
2341 2018-04-04 11:30:00 2 1
2341 2018-04-05 11:30:00 3 2
2891 2018-04-04 12:30:00 1 1
2891 2018-04-04 18:30:00 2 1
2891 2018-04-05 22:30:00 3 2
The partitioning in the outer query is also wrong though, you need to partition by that 'earliest' date (actually latest, but doesn't matter for this):
select transID,travel_date,rn,
dense_rank () over (partition by transID,EarliestDate order by travel_date) as rn2
from
(SELECT transID,travel_date,
ROW_NUMBER() OVER (PARTITION BY transID ORDER BY travel_date) AS rn,
max(travel_date) OVER (partition by trunc(travel_date)) as EarliestDate
FROM travel_log_info
) t
order by transID;
TRANSID TRAVEL_DATE RN RN2
---------- ------------------- ---------- ----------
2341 2018-04-04 10:00:00 1 1
2341 2018-04-04 11:30:00 2 2
2341 2018-04-05 11:30:00 3 1
2891 2018-04-04 12:30:00 1 1
2891 2018-04-04 18:30:00 2 2
2891 2018-04-05 22:30:00 3 1
But you don't really need that max, or the outer query you currently have; if you include that truncated day in the row_number()
partition (which you currently aren't really using) you get:
SELECT transID,travel_date,
ROW_NUMBER() OVER (PARTITION BY transID, trunc(travel_date) ORDER BY travel_date) AS rn
FROM travel_log_info;
TRANSID TRAVEL_DATE RN
---------- ------------------- ----------
2341 2018-04-04 10:00:00 1
2341 2018-04-04 11:30:00 2
2341 2018-04-05 11:30:00 1
2891 2018-04-04 12:30:00 1
2891 2018-04-04 18:30:00 2
2891 2018-04-05 22:30:00 1
and you can then wrap that in an outer query to filter on rn
:
SELECT transID,travel_date
FROM (
SELECT transID,travel_date,
ROW_NUMBER() OVER (PARTITION BY transID, trunc(travel_date) ORDER BY travel_date) AS rn
FROM travel_log_info
)
WHERE rn = 1
ORDER BY transID,travel_date;
TRANSID TRAVEL_DATE
---------- -------------------
2341 2018-04-04 10:00:00
2341 2018-04-05 11:30:00
2891 2018-04-04 12:30:00
2891 2018-04-05 22:30:00
You could also do this without a subquery; this gets the same result using first
:
SELECT transID,
min(travel_date) keep (dense_rank first order by travel_date) as travel_date
FROM travel_log_info
GROUP BY transID, trunc(travel_date)
ORDER BY transID, travel_date;