Search code examples
rdplyrdata.tabletidyversetidyr

Get a running count of group over time in r


Ok so we have some pretty standard data that looks like this with a date and user id column, but the id can occur multiple times in a day:

id               Date
as7fyaisdf       2017-11-08
p98ashdfp9       2017-11-08
p98ashdfp9       2017-11-08
p98ashdfp9       2017-11-08
2984oinrv7       2017-11-08
as7fyaisdf       2017-11-09
p98ashdfp9       2017-11-09
2984oinrv7       2017-11-09
9asjenorin       2017-11-09

I want to get a running count that shows the cumulative number of times a given id has occurred over time. But I only want it to count a single day once. So it would look like this in this case:

id               Date           running_count
as7fyaisdf       2017-11-08     1
p98ashdfp9       2017-11-08     1
p98ashdfp9       2017-11-08     1
p98ashdfp9       2017-11-08     1
2984oinrv7       2017-11-08     1
as7fyaisdf       2017-11-09     2
p98ashdfp9       2017-11-09     2
2984oinrv7       2017-11-09     2
9asjenorin       2017-11-09     1

I feel like this probably involves the rle() function or data.table's rleid(), but I haven't been able to crack it. I'd like to do this within the tidyverse if possible, but open to other options in the R universe. Would like to keep this in any help is appreciated.


Solution

  • You could group by id and get the row_number:

    library(tidyverse) 
    df %>%
       left_join(distinct(.) %>%
       group_by(id) %>%
       mutate(running_count = row_number()))
    
              id       Date running_count
    1 as7fyaisdf 2017-11-08             1
    2 p98ashdfp9 2017-11-08             1
    3 p98ashdfp9 2017-11-08             1
    4 p98ashdfp9 2017-11-08             1
    5 2984oinrv7 2017-11-08             1
    6 as7fyaisdf 2017-11-09             2
    7 p98ashdfp9 2017-11-09             2
    8 2984oinrv7 2017-11-09             2
    9 9asjenorin 2017-11-09             1