
Using R to count a run since last event in panel data


I'm hoping you can help me with creating a variable that will count a "run" since a last event of another variable, using the R programming language. The data set with which I'm working is country-year panel data, and is unbalanced.

I'll illustrate what I'd like to do below. COUNTRY and YEAR are the cross-section identification and time unit respectively. COUNTRYYEAR is a concatenation of both variables, there to create an index for each unique observation.

Let EVENT be a binary indicator, marking whether an event of interest is present (EVENT = 1) or not (EVENT = 0). Let COUNTZERO be a discrete count variable, marking the time (here: years) since the last observed 1 on the EVENT variable. Let COUNTONE be another discrete count variable, marking a running count of consecutive ones of the EVENT variable. I'd like to have a data frame that looks like this:

COUNTRYYEAR COUNTRY YEAR EVENT COUNTZERO COUNTONE
10011950       1    1950  1       0         1
10011951       1    1951  1       0         2
10011952       1    1952  0       1         0 
10011953       1    1953  0       2         0 
10011954       1    1954  0       3         0 
10011955       1    1955  0       4         0 
10011956       1    1956  0       5         0

....

10021950       2    1950  1       0         1
10021951       2    1951  0       1         0
10021952       2    1952  1       0         1
10021953       2    1953  0       1         0
10021954       2    1954  0       2         0
10021955       2    1955  0       3         0
10021956       2    1956  0       4         0

....

10031975       3    1975  1       0         1
10031976       3    1976  1       0         2
10031977       3    1977  1       0         3
10031978       3    1978  1       0         4
10031979       3    1979  0       1         0
10031980       3    1980  0       2         0

....

The data go on. The panel is unbalanced: some countries are observed from the beginning (in my illustration: 1950) and others are not. Some countries drop out before the right-hand end of the temporal domain and others don't. Some countries have all 0s on the event and some have all 1s.

How can I go about creating those running count variables from the current EVENT variable I have? I looked at this solution, but, after running the example, it didn't quite create the vector I want to create.

Any input would be greatly appreciated.

Reproducible code of this illustration follows.

country <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3) 
year <- c(1950, 1951, 1952, 1953, 1954, 1955, 1956, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1975, 1976, 1977, 1978, 1979) 
event <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0) 
Data <- data.frame(country = country, year = year, event = event)

Solution

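  • Here's a data.table solution with sequence and rle, grouped by country so that the counts restart for each cross-section:

    require(data.table)
    DT <- data.table(Data)
    # Assumes rows are already sorted by year within each country
    DT[, c("count_zero", "count_one") := {
        # Position of each row within its run of identical event values
        rr <- sequence(rle(!event)$lengths)
        # Keep the position only where the run is of 0s (or of 1s)
        list(rr * !event, rr * event)
    }, by = country]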
    #     country year event count_zero count_one
    #  1:       1 1950     1          0         1
    #  2:       1 1951     1          0         2
    #  3:       1 1952     0          1         0
    #  4:       1 1953     0          2         0
    #  5:       1 1954     0          3         0
    #  6:       1 1955     0          4         0
    #  7:       1 1956     0          5         0
    #  8:       2 1950     1          0         1
    #  9:       2 1951     0          1         0
    # 10:       2 1952     1          0         1
    # 11:       2 1953     0          1         0
    # 12:       2 1954     0          2         0
    # 13:       2 1955     0          3         0
    # 14:       2 1956     0          4         0
    # 15:       2 1957     0          5         0
    # 16:       2 1958     0          6         0
    # 17:       3 1975     1          0         1
    # 18:       3 1976     1          0         2
    # 19:       3 1977     1          0         3
    # 20:       3 1978     1          0         4
    # 21:       3 1979     0          1         0
    #     country year event count_zero count_one
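
  • The same idea can be sketched in base R, for anyone who would rather not load data.table. This is only an illustration under a few assumptions: run_position is a hypothetical helper name introduced here, the data frame is the one from the question's reproducible example, and ave() is used to apply the helper within each country.

```r
# Toy panel from the question's reproducible example
country <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
year <- c(1950:1956, 1950:1958, 1975:1979)
event <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
Data <- data.frame(country = country, year = year, event = event)

# Row order defines the runs, so sort by country and year first
Data <- Data[order(Data$country, Data$year), ]

# Hypothetical helper: position of each element within its run of equal values
run_position <- function(x) sequence(rle(x)$lengths)

# ave() applies the helper separately to each country's event vector
rr <- ave(Data$event, Data$country, FUN = run_position)

# Keep the run position only where the run is of 0s (or of 1s)
Data$count_zero <- rr * (Data$event == 0)
Data$count_one <- rr * (Data$event == 1)
```

    Two things to keep in mind with either version. The counts are counts of rows, so if a country has gaps in its years, count_zero is the number of observed rows since the last event, not calendar years. Also, a country that begins with 0s gets count_zero starting at 1 from its first row, even though no earlier event was observed; whether that is what you want depends on your application.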