
Difference in outputs using cumsum


Why are these two operations different?

library(lubridate)
library(magrittr)

> seconds_to_period(1:1000) %>% cumsum %>% sum
[1] 14492440
> 1:1000 %>% cumsum %>% sum
[1] 167167000

I have found, however, that the issue lies in the fact that cumsum only adds the seconds component of each period and ignores the rest:

seconds_to_period(60) +  seconds_to_period(60)
[1] "2M 0S"

but

> cumsum(c(seconds_to_period(60), seconds_to_period(60)))
[1] 0 0

Why is this the default behavior? I find it rather unintuitive. Additionally, is there a way to get the same result as cumsum(1:1000) using lubridate's 'Period' class that doesn't involve doing something like:

c(seconds_to_period(60), seconds_to_period(60)) %>% as.numeric %>% cumsum


Solution

  • Since cumsum is a primitive, you can see what R is doing under the hood here: https://github.com/Microsoft/microsoft-r-open/blob/master/source/src/main/cum.c. In particular, read from line 215:

    PROTECT(t = coerceVector(CAR(args), REALSXP));
    n = XLENGTH(t);
    PROTECT(s = allocVector(REALSXP, n));
    setAttrib(s, R_NamesSymbol, getAttrib(t, R_NamesSymbol));
    UNPROTECT(2);
    

    This is where the Period is coerced to numeric and, because of how a Period is structured, only .Data (the seconds component) is kept.

    Compare

    seconds_to_period(60)@.Data  # 0: the full minute lives in the minute slot
    seconds_to_period(59)@.Data  # 59: everything still fits in the seconds
    

    Therefore, at the C level, R is not calling as.numeric but doing a faster, more direct coercion of the data. You could call it less subtle, because unlike as.numeric it does not take into account the slots other than .Data.
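
    To make the difference concrete, here is a small sketch that contrasts the raw data part with what as.numeric returns for the same Period. It only uses seconds_to_period and as.numeric from the question; the variable name p is just for illustration.

    ```r
    library(lubridate)

    # 3661 seconds is one hour, one minute and one second
    p <- seconds_to_period(3661)

    # The raw data part holds only the seconds component
    p@.Data        # 1

    # lubridate's as.numeric method adds the hours and minutes back in
    as.numeric(p)  # 3661
    ```

    The primitive cumsum works on the first of these two values, which is why the minutes and hours disappear.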

    Look at this:

     setClass("Foo", representation(.Data="numeric", number1 = "numeric", number2 = "numeric"))
    
     bar <- new("Foo",.Data=5, number1 = 12, number2 = 31)
    
     cumsum(bar) 
    

    The result is 5, because only .Data is coerced to numeric.

    Moreover:

     setClass("Foo2", representation(.Data="numeric", number1 = "numeric", number2 = "numeric"))
    
     bar2 <- new("Foo2", number1 = 12, number2 = 31)
    
     cumsum(bar2) 
    

    This gives you back numeric(0), because .Data was never set and is therefore an empty numeric vector.

    And

     setClass("Foo3", representation( number1 = "numeric", number2 = "numeric"))
    
     bar3 <- new("Foo3", number1 = 12, number2 = 31)
    
     cumsum(bar3) 
    

    This does not work at all: without .Data, R has no internal way to coerce the object to numeric when calling cumsum.
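
    As a complement to the three examples above: cumsum belongs to the S4 "Math" group generic, so a class can supply its own method, and R dispatches to it before falling back to the default coercion. The following is only a sketch; ToyPeriod is a made-up class (not part of lubridate) that keeps seconds in .Data and minutes in a slot.

    ```r
    library(methods)

    # Hypothetical toy class: seconds in .Data, minutes in a separate slot
    setClass("ToyPeriod", representation(.Data = "numeric", minute = "numeric"))

    tp <- new("ToyPeriod", .Data = c(30, 45), minute = c(1, 2))

    # Default primitive behaviour: only .Data is seen, the minutes are ignored
    default_result <- cumsum(tp)   # 30 75

    # Custom method: convert every element to total seconds before accumulating
    setMethod("cumsum", "ToyPeriod", function(x) {
      cumsum(x@.Data + 60 * x@minute)
    })

    custom_result <- cumsum(tp)    # 90 255
    ```

    Inside the method, cumsum is called on a plain numeric vector, so the primitive does the actual work and there is no recursive dispatch.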

    So it comes down to how R internally handles complex S4 objects. You could always ask the lubridate maintainers to add a new seconds slot and store in .Data the total number of seconds of the whole S4 object. I guess cumsum would work that way. But right now, they use .Data to store the seconds component. See edit(seconds_to_period):

    function (x) 
    {
      span <- as.double(x)
      remainder <- abs(span)
      newper <- period(second = rep(0, length(x)))
      slot(newper, "day") <- remainder%/%(3600 * 24)
      remainder <- remainder%%(3600 * 24)
      slot(newper, "hour") <- remainder%/%(3600)
      remainder <- remainder%%(3600)
      slot(newper, "minute") <- remainder%/%(60)
      slot(newper, ".Data") <- remainder%%(60)
      newper * sign(span)
    }
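
    The arithmetic in that function can be followed by hand in base R. This is a sketch for non-negative input only; split_seconds is a hypothetical helper written for illustration, not a lubridate function.

    ```r
    # Hypothetical helper mirroring the integer division steps shown above
    split_seconds <- function(x) {
      remainder <- x
      day <- remainder %/% (3600 * 24)
      remainder <- remainder %% (3600 * 24)
      hour <- remainder %/% 3600
      remainder <- remainder %% 3600
      minute <- remainder %/% 60
      second <- remainder %% 60  # the only part that ends up in .Data
      c(day = day, hour = hour, minute = minute, second = second)
    }

    parts_3661 <- split_seconds(3661)  # day 0, hour 1, minute 1, second 1
    parts_60 <- split_seconds(60)      # day 0, hour 0, minute 1, second 0
    ```

    For 60 seconds the second component is 0, which matches the zeros returned by cumsum in the question.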
    

    Finally, just for fun, this is my mock version of how to make cumsum work here:

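    library(lubridate)

    # Mock class: .Data holds the total seconds, period holds the lubridate Period
    setClass("Period2", representation(.Data = "numeric", period = "Period"))

    # Returns a list with one Period2 object per element of x
    seconds_to_period_2 <- function(x) {
      lapply(x, function(y) new("Period2", .Data = y, period = seconds_to_period(y)))
    }

    a <- seconds_to_period_2(1:60)

    cumsum(a)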
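
    If a Period is wanted back rather than a bare number, another option is to go through total seconds and convert the running total again. This is only a sketch that wraps the same kind of numeric conversion the question mentions; cumsum_period is a hypothetical helper, not part of lubridate, and I am assuming the periods contain no months or years (for those, period_to_seconds has to rely on average lengths).

    ```r
    library(lubridate)

    # Hypothetical helper: cumulative sum that takes all slots into account
    cumsum_period <- function(p) {
      total <- period_to_seconds(p)     # full length in seconds
      seconds_to_period(cumsum(total))  # running total, back as a Period
    }

    p <- c(seconds_to_period(60), seconds_to_period(60))
    running <- cumsum_period(p)
    running
    ```

    Here the running totals are 60 and 120 seconds, so the result should correspond to one minute and two minutes instead of the zeros returned by the plain cumsum call.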
    

    Best!