I am trying in R to perform a ranking or grouping with the following characteristics:
Example data:
OBJECT | DATE |
---|---|
PRODUCT1 | 01/02/2023 |
PRODUCT1 | 02/02/2023 |
PRODUCT1 | 21/02/2023 |
PRODUCT2 | 07/02/2023 |
PRODUCT2 | 09/02/2023 |
PRODUCT2 | 10/02/2023 |
PRODUCT2 | 11/02/2023 |
PRODUCT2 | 23/02/2023 |
I am using the following code but the result is not correct:
library(plyr)
ddply(df, .(object), transform, rank = (seq_along(date)))
The expected result is similar to the following table:
OBJECT | DATE | RANK |
---|---|---|
PRODUCT1 | 01/02/2023 | 1 |
PRODUCT1 | 02/02/2023 | 1 |
PRODUCT1 | 21/02/2023 | 2 |
PRODUCT2 | 07/02/2023 | 1 |
PRODUCT2 | 09/02/2023 | 2 |
PRODUCT2 | 10/02/2023 | 2 |
PRODUCT2 | 11/02/2023 | 2 |
PRODUCT2 | 23/02/2023 | 3 |
I appreciate your help in solving this question.
The ranking groups by "object" and orders by "date". The wanted value is then based on if the gap between the previous row and the current row is greater than 1 day, then the rank is incremented by 1.
df <- data.frame(OBJECT = c("PRODUCT1", "PRODUCT1", "PRODUCT1", "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2"),
DATE = as.Date(c("2023-02-01", "2023-02-02", "2023-02-21", "2023-02-07", "2023-02-09", "2023-02-10", "2023-02-11", "2023-02-23")))
library(dplyr)
# add the wanted column
df <- df %>%
group_by(OBJECT) %>%
arrange(DATE) %>%
mutate(wanted = cumsum(c(1, diff(DATE) > 1)))
df
| OBJECT | DATE | wanted |
+----------+------------+--------+
| PRODUCT1 | 2023-02-01 | 1 |
| PRODUCT1 | 2023-02-02 | 1 |
| PRODUCT1 | 2023-02-21 | 2 |
| PRODUCT2 | 2023-02-07 | 1 |
| PRODUCT2 | 2023-02-09 | 2 |
| PRODUCT2 | 2023-02-10 | 2 |
| PRODUCT2 | 2023-02-11 | 2 |
| PRODUCT2 | 2023-02-23 | 3 |