Search code examples
rdatesequencesummarizeautonumber

R: Using doBy with Dates


I am doing some coding in R. I am trying to use the doBy package to get a sum total score for a variable (x) by both date (date) and by id (id). The doBy command works fine and I get this output.

data
id date       x
1  01/01/2021 1
1  01/02/2021 2
1  01/03/2021 3
2  02/01/2021 2
2  02/02/2021 3
2  02/02/2021 4
3  03/11/2021 3
3  03/12/2021 3
3  03/13/2021 2

I want to recode the date so that everyone's first chronological date 1 is 1, the chronological second date is 2, the chronological 3rd date is 3, etc. I want my date to look something like this.

data2
id daycount x
1  1        1
1  2        2
1  3        3
2  1        2
2  2        3
2  3        4
3  1        3
3  2        3
3  3        2

I was able to order the days using order() but I am not sure how to get the dates to match up. I think I need some kind of sequence or autonumber. Also, some participants may have different number of days. Some participants may have 1 day and other participants may have 10 days.


Solution

  • 1) doBy Assuming that the dates are already sorted within id:

    library(doBy)
    transform_by(data, ~ id, countdays = seq_along(id))
    

    giving:

      id       date x countdays
    1  1 01/01/2021 1         1
    2  1 01/02/2021 2         2
    3  1 01/03/2021 3         3
    4  2 02/01/2021 2         1
    5  2 02/02/2021 3         2
    6  2 02/02/2021 4         3
    7  3 03/11/2021 3         1
    8  3 03/12/2021 3         2
    9  3 03/13/2021 2         3
    

    2) Base R It could also be done using transform and ave in base R.

    transform(data, daycount = ave(id, id, FUN = seq_along))
    

    giving:

      id       date x daycount
    1  1 01/01/2021 1        1
    2  1 01/02/2021 2        2
    3  1 01/03/2021 3        3
    4  2 02/01/2021 2        1
    5  2 02/02/2021 3        2
    6  2 02/02/2021 4        3
    7  3 03/11/2021 3        1
    8  3 03/12/2021 3        2
    9  3 03/13/2021 2        3
    

    Note

    data in reproducible form:

    Lines <- "id date       x
    1  01/01/2021 1
    1  01/02/2021 2
    1  01/03/2021 3
    2  02/01/2021 2
    2  02/02/2021 3
    2  02/02/2021 4
    3  03/11/2021 3
    3  03/12/2021 3
    3  03/13/2021 2"
    data <- read.table(text = Lines, header = TRUE)