In my game dataset, I have observations for several game players for several points in time. For each observation, I want to compute a rank for this player based on the number of points compared to the number of points of other players at this point in time. Therefore, it has to compare the points of this player of this observation with the number of points of all other players at their last (or time-wise closest in the past or exactly at the same second) observation of each of the others users.
Example data including the expected rank:
da = data.frame(player = c(1,1,1,2,2,2,3,3,3), date_sec = c(1451665633,1451665693,1451665721,1451665627,1451665692,1451665738,1451665626,1451665684,1451665765), points = c(100,150,200,130,140,230,80,90,100), rank = c(2,1,1,1,1,1,1,3,3))
da
player date_sec points rank
1 1 1451665633 100 2
2 1 1451665693 150 1
3 1 1451665721 200 1
4 2 1451665627 130 1
5 2 1451665692 140 1
6 2 1451665738 230 1
7 3 1451665626 80 1
8 3 1451665684 90 3
9 3 1451665765 100 3
E.g., player 2 has in date_sec 1451665738 rank 1, because at this point in time, no other player has had more points.
I know how to rank within groups, but I don't find a way to take the exact point in time into account here. A way would be to extract the day out of the timestamp and to group by day and player, but this is not not as accurate as I would like to have, because the rank can change several times within one day.
library(dplyr)
da2 = mutate(da, day = as.Date(as.POSIXct(date_sec, origin="1970-01-01"))) %>%
group_by(player, day) %>%
mutate(my_ranks = order(order(points, day, decreasing=TRUE)))
da2
A tibble: 9 x 6
# Groups: player, day [3]
player date_sec points rank day my_ranks
<dbl> <dbl> <dbl> <dbl> <date> <int>
1 1 1451665633 100 2 2016-01-01 3
2 1 1451665693 150 1 2016-01-01 2
3 1 1451665721 200 1 2016-01-01 1
4 2 1451665627 130 1 2016-01-01 3
5 2 1451665692 140 2 2016-01-01 2
6 2 1451665738 230 2 2016-01-01 1
7 3 1451665626 80 3 2016-01-01 3
8 3 1451665684 90 3 2016-01-01 2
9 3 1451665765 100 3 2016-01-01 1
Next to the rank, I would also like to have a percentile rank based on the number of points, because there are different numbers of players active in different time periods.
Does anyone have an idea?
First, I would use complete
so that you have a row for each player
/date_sec
combination. This will allow for easier comparisons for each moment in time.
Next, I would use fill
to carry forward each player's most recent score. The scores should be sorted/arranged before this.
Then, since you have a score for each time stamp, you can group_by(date_sec)
and order the players at each time for ranking.
Finally, you can join back to original data frame to provide the desired ranks.
library(tidyverse)
da %>%
complete(player, date_sec) %>%
group_by(player) %>%
arrange(date_sec) %>%
fill(points) %>%
group_by(date_sec) %>%
mutate(my_ranks = order(order(points, decreasing = TRUE))) %>%
right_join(da)
Output
Joining, by = c("player", "date_sec", "points", "rank")
# A tibble: 9 x 5
# Groups: date_sec [9]
player date_sec points rank my_ranks
<dbl> <dbl> <dbl> <dbl> <int>
1 1 1451665633 100 2 2
2 1 1451665693 150 1 1
3 1 1451665721 200 2 1
4 2 1451665627 130 1 1
5 2 1451665692 140 2 1
6 2 1451665738 230 1 1
7 3 1451665626 80 3 1
8 3 1451665684 90 3 3
9 3 1451665765 100 3 3