Search code examples
rdataframefor-loopconstraintscumsum

How to make an accumulated sum with constraints


I need to sum values correspoding to a certain range of dates, and I need to do this for many subjects.

In the next example, df1 contains starting and ending dates for three subjects, and df2 contains dates and their corresponding values:

df1<-data.frame(sub=c("a","b","c"), 
                start=as.Date(c("2015/10/13","2015/10/13","2015/10/11")), 
                end=as.Date(c("2015/10/16","2015/10/18","2015/10/15")))

df2<-data.frame(date=seq(as.Date("2015/10/11"), as.Date("2015/10/18"), "days"), 
                value=c(32,30,28,15,19,23,35,18))

> df1
  sub      start        end
1   a 2015-10-13 2015-10-16
2   b 2015-10-13 2015-10-18
3   c 2015-10-11 2015-10-15

> df2
        date value
1 2015-10-11    32
2 2015-10-12    30
3 2015-10-13    28
4 2015-10-14    15
5 2015-10-15    19
6 2015-10-16    23
7 2015-10-17    35
8 2015-10-18    18

I want to sum value in df2 from start date until enddate in df1, and I want to do it for each sub (in the real problem there are many subjects (i.e. pairs of starting and ending dates), and so I thoght maybe I should use for loop).

I expect something like this:

sub sum
a   85
b   138
c   124

where a = 28+15+19+23, b = 28+15+19+23+35+18 and c = 32+30+28+15+19

Any help is appreciated


Solution

  • An option with mapply would be to create a sequence of dates between start and end dates in df1 and subset those dates from df2 and sum the value.

    df1$sum <- mapply(function(x, y) sum(df2$value[df2$date %in% x:y]), 
               df1$start, df1$end)
    
    
    df1[c(1, 4)]
    #  sub sum
    #1   a  85
    #2   b 138
    #3   c 124