Search code examples
rrolling-sum

Conditional rolling sum of events with 'ragged' dates


Introduction

I am using R to analyze the 'momentum' of protest movements in Africa. To do so, I am analyzing individual protest events. I want to create a rolling measure of the rolling number (sum) of protests within a time period.

Most of the answers here on Stack Overflow deal with datasets where observations are at fixed intervals (one obs. per day or per month, etc.). But my data are 'ragged' in the sense that they occur in different intervals. Sometimes there is one day between observations. Other times there are two weeks.

What I want to create

A rolling sum of the number of protest events that have occurred in a given country over the past 10 days. This would be in the form of a variable that simply sums the number of events within the past ten days, inclusive of the current event.

The Data

Here is a reproducible set of data:

df1 <- data.frame(date = c("8/1/2019", "8/2/2019", "8/3/2019", "8/6/2019", "8/15/2019", "8/16/2019", "8/30/2019", "9/1/2019", "9/2/2019", "9/3/2019", "9/4/2019", "6/1/2019", "6/26/2019", "7/1/2019", "7/2/2019", "7/9/2019", "7/10/2019", "8/1/2019", "8/2/2019", "8/15/2019", "8/28/2019", "9/1/2019"),
country = c(rep("Algeria", 11), rep("Benin", 11)),
event = rep("Protest", 22))

What I want the data to look like

date       country   event     roll_sum
--------   -------   -------   --------
8/1/2019   Algeria   Protest   1
8/2/2019   Algeria   Protest   2
8/3/2019   Algeria   Protest   3
8/6/2019   Algeria   Protest   4
8/15/2019  Algeria   Protest   2
8/16/2019  Algeria   Protest   3
8/30/2019  Algeria   Protest   1
9/1/2019   Algeria   Protest   2
9/2/2019   Algeria   Protest   3
9/3/2019   Algeria   Protest   4 
9/4/2019   Algeria   Protest   5
6/1/2019   Benin     Protest   1
6/26/2019  Benin     Protest   1
7/1/2019   Benin     Protest   2
7/2/2019   Benin     Protest   3
7/9/2019   Benin     Protest   3
7/10/2019  Benin     Protest   4
8/1/2019   Benin     Protest   1
8/2/2019   Benin     Protest   2
8/15/2019  Benin     Protest   1
8/28/2019  Benin     Protest   1
9/1/2019   Benin     Protest   2

This is all probably very simple, but I can't figure out how to do it. Thank you in advance!


Solution

  • A base R approach,

    df1$date <- as.Date(df1$date,"%m/%d/%Y")
    
    vector <- vector()
    
    for( j in unique(df1$country)) {
        df2 <- df1[df1$country==j,]
        for(i in 1:nrow(df2)) {
    
         k <- nrow(df2[df2$date<= df2$date[i] & df2$date>=df2$date[i]-10 ,])
    
         vector <- c(vector, k)
    
        }
    }
    
    df1$roll_sum <- vector
    

    gives,

             date country   event roll_sum
    1  2019-08-01 Algeria Protest        1
    2  2019-08-02 Algeria Protest        2
    3  2019-08-03 Algeria Protest        3
    4  2019-08-06 Algeria Protest        4
    5  2019-08-15 Algeria Protest        2
    6  2019-08-16 Algeria Protest        3
    7  2019-08-30 Algeria Protest        1
    8  2019-09-01 Algeria Protest        2
    9  2019-09-02 Algeria Protest        3
    10 2019-09-03 Algeria Protest        4
    11 2019-09-04 Algeria Protest        5
    12 2019-06-01   Benin Protest        1
    13 2019-06-26   Benin Protest        1
    14 2019-07-01   Benin Protest        2
    15 2019-07-02   Benin Protest        3
    16 2019-07-09   Benin Protest        3
    17 2019-07-10   Benin Protest        4
    18 2019-08-01   Benin Protest        1
    19 2019-08-02   Benin Protest        2
    20 2019-08-15   Benin Protest        1
    21 2019-08-28   Benin Protest        1
    22 2019-09-01   Benin Protest        2