Search code examples
rdplyrcounterpaneltimedelta

Create sequential counter starting with event and zeros before event for groups in panel


For a panel dataset (GSOEP), I need to create a time counter that gives me delta t after an event which is dummy coded 1 for that particular year for each individual. E.g. there are observations for an individual for a random range of years such as 1990-2006, with a seperate variable indicating 1 for a certain event in year e.g. 1996. The counter needs to start in the following year, should end with the next individual (id) and needs to be zero before the event for that individual occurs.

Currently the data looks like this:

df <- data.frame(id= rep(c("1","2","3"), each=6), year=rep(1998:2003, times=3), event=c(0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0), stringsAsFactors=FALSE)

   id year event
1   1 1998     0
2   1 1999     0
3   1 2000     1
4   1 2001     0
5   1 2002     0
6   1 2003     0
7   2 1998     0
8   2 1999     0
9   2 2000     0
10  2 2001     0
11  2 2002     1
12  2 2003     0
13  3 1998     0
14  3 1999     1
15  3 2000     0
16  3 2001     0
17  3 2002     0
18  3 2003     0

What is needed is this:

df <- data.frame(id= rep(c("1","2","3"), each=6), year=rep(1998:2003, times=3), event=c(0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0),delta=c(0,0,0,1,2,3,0,0,0,0,0,1,0,0,1,2,3,4), stringsAsFactors=FALSE)

   id year event delta
1   1 1998     0     0
2   1 1999     0     0
3   1 2000     1     0
4   1 2001     0     1
5   1 2002     0     2
6   1 2003     0     3
7   2 1998     0     0
8   2 1999     0     0
9   2 2000     0     0
10  2 2001     0     0
11  2 2002     1     0
12  2 2003     0     1
13  3 1998     0     0
14  3 1999     1     0
15  3 2000     0     1
16  3 2001     0     2
17  3 2002     0     3
18  3 2003     0     4

How can I achieve this? The closest I got was this here: Create sequential counter that restarts on a condition within panel data groups

But I do not know how to modify it in order that it only starts after the event has occured once and puts zeros before the event. Also there are some individuals for which there are no events, where the counter needs to give out zeros. The number of years (observations) for each individual is different, thus some id's range from 1984-1999 while other do so from 1995-2015.

You would help me out enormously and I want to thank you in advance for your time and effort.

Best Regards,

Julius


Solution

  • You can use group_by(id) and cumsum(cummax(event)) to get close - produces 1...N starting where event==1. I wrap it in ifelse(...) to subtract 1 from those values that are > 0.

    library(tidyverse)
    df %>%
      group_by(id) %>%
      mutate(delta = ifelse(cumsum(cummax(event)) > 0, cumsum(cummax(event)) - 1, 0)) %>%
      ungroup()
    
    # A tibble: 18 x 4
       # id     year event delta
       # <chr> <int> <dbl> <dbl>
     # 1 1      1998    0.    0.
     # 2 1      1999    0.    0.
     # 3 1      2000    1.    0.
     # 4 1      2001    0.    1.
     # 5 1      2002    0.    2.
     # 6 1      2003    0.    3.
     # 7 2      1998    0.    0.
     # 8 2      1999    0.    0.
     # 9 2      2000    0.    0.
    # 10 2      2001    0.    0.
    # 11 2      2002    1.    0.
    # 12 2      2003    0.    1.
    # 13 3      1998    0.    0.
    # 14 3      1999    1.    0.
    # 15 3      2000    0.    1.
    # 16 3      2001    0.    2.
    # 17 3      2002    0.    3.
    # 18 3      2003    0.    4.