Search code examples
rdata-manipulationdata-handling

How to count number of rows based on a certain value in a panel data set?


I am handling a panel data like df. So I have a unique UserID (Uid), a TimeVariable (TV) and my main variable of interest(MV, dummy coded). Each row represents a unique UiD TV combination. Now I want to create a new variable (NV), which counts basically the number of events (MV=1) that happend up to time TV.

The contect: I want to have a count variable (NV) that basically counts for each moment in time how often Event MV=1 happend up to that moment (TV).

So far I can only manage to sum it up per UserId but not having it on a increasing level.

What I managed so far:

df <- df %>% group_by(user_id) %>% mutate(NV=count_if(1,MV))

The result is given to my on a aggreate basis per user ID for the MV

So the df looks like:

UI TV MV
1  1  0
1  2  1
1  3  0
2  1  0
2  2  0
2  3  1
2  4  2
3  1  1
3  2  0
3  3  1
3  4  1

The resuls with my code so far is:

UI TV MV NV
1  1  0  1
1  2  1  1
1  3  0  1
2  1  0  2
2  2  0  2
2  3  1  2
2  4  2  2
3  1  1  3
3  2  0  3
3  3  1  3
3  4  1  3

What I actually want:

UI TV MV NV
1  1  0  0
1  2  1  1
1  3  0  1
2  1  0  0
2  2  0  0
2  3  1  1
2  4  2  2
3  1  1  1
3  2  0  1
3  3  1  2
3  4  1  3

Thanks very much in advance for you help!


Solution

  • We could group_by UI and take cumulative sum of occurrence of MV == 1

    library(dplyr)
    
    df %>%
      group_by(UI) %>%
      mutate(NV = cumsum(MV == 1))
    
    
    #      UI    TV    MV    NV
    #   <int> <int> <int> <int>
    # 1     1     1     0     0
    # 2     1     2     1     1
    # 3     1     3     0     1
    # 4     2     1     0     0
    # 5     2     2     0     0
    # 6     2     3     1     1
    # 7     2     4     2     1
    # 8     3     1     1     1
    # 9     3     2     0     1
    #10     3     3     1     2
    #11     3     4     1     3
    

    In base R, we can use ave

    with(df, ave(MV == 1, UI, FUN = cumsum))
    #[1] 0 1 1 0 0 1 1 1 1 2 3
    

    data

    df <- structure(list(UI = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
    3L), TV = c(1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), MV = c(0L, 
    1L, 0L, 0L, 0L, 1L, 2L, 1L, 0L, 1L, 1L)), class = "data.frame", 
    row.names = c(NA, 
    -11L))