I'm working with dates in R, and I want to convert each date into a number that represents which attempt it was for that participant (first, second, and so on). Some participants took multiple attempts, and others took just one. Furthermore, some took the test years before others, so I don't care about the date itself, only whether it was attempt one, attempt two, etc.
Here's a mock dataset:
library(dplyr)
library(lubridate)
problem <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                  score = c(1, 2, 3, 3, 3, 2, 4, 2),
                  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")))
And here's what I want it to look like in the end:
solution <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                   score = c(1, 2, 3, 3, 3, 2, 4, 2),
                   date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")),
                   order = c(3, 4, 1, 2, 1, 3, 2, 1))
solution
Thank you!
Or group_by name and assign row_number() after arranging the data by name and date:
library(dplyr)
problem %>%
  arrange(name, date) %>%
  group_by(name) %>%
  mutate(order = row_number())
# A tibble: 8 x 4
# Groups:   name [3]
#  name      score date                order
#  <chr>     <dbl> <dttm>              <int>
#1 Britney       3 2017-02-26 00:18:09     1
#2 Britney       3 2018-02-26 00:18:09     2
#3 Britney       1 2019-02-26 00:18:09     3
#4 Christina     2 2010-02-26 00:18:09     1
#5 Christina     4 2015-02-26 00:18:09     2
#6 Christina     2 2016-02-26 00:18:09     3
#7 Christina     2 2019-04-26 00:18:09     4
#8 Justin        3 2019-02-20 00:18:09     1
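Note that arrange() reorders the rows, while the desired output in the question keeps them in their original order. If that matters, one possible variant is to rank the dates within each name instead of sorting. This is only a sketch, not part of the approach above: it swaps in dplyr's min_rank(), the name ranked is made up for illustration, and it assumes no participant has two attempts with the same timestamp (tied dates would get the same attempt number).

```r
library(dplyr)
library(lubridate)

# Same mock data as in the question
problem <- tibble(name = c("Britney", "Christina", "Justin", "Britney", "Britney", "Christina", "Christina", "Christina"),
                  score = c(1, 2, 3, 3, 3, 2, 4, 2),
                  date = ymd_hms(c("2019-02-26 00:18:09", "2019-04-26 00:18:09", "2019-02-20 00:18:09", "2018-02-26 00:18:09", "2017-02-26 00:18:09", "2016-02-26 00:18:09", "2015-02-26 00:18:09", "2010-02-26 00:18:09")))

# 'ranked' is a hypothetical name; min_rank() is swapped in for
# arrange() + row_number() so the original row order is preserved
ranked <- problem %>%
  group_by(name) %>%
  mutate(order = min_rank(date)) %>%  # earliest date per name gets 1
  ungroup()                           # drop grouping for later steps

ranked$order
# [1] 3 4 1 2 1 3 2 1
```

Here order comes back as an integer column, whereas the solution tibble in the question stores it as a double, so compare with as.numeric(ranked$order) if you check one against the other.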