Search code examples
rrecursionmemoizationmemoise

Memoise on recursive function


Introduction

I have a function taking a date as input, doing some calculation taking a certain time - represent by Sys.sleep() - removing all '-' in the date and returning a character:

library(maggritr)

auxialiaryCompute = function(vDate)
{
    Sys.sleep(1)
    vDate %>% as.character %>% gsub("-", "", .)
}

> auxialiaryCompute(as.Date("2015-01-14"))
[1] "20150114"

Cool. The output of the above is '20150114'. Now I would like to include the previous output in this function. Or the two previous days, or .. the n previous outputs until a limited day in the past called loopBackMaxDate.

Rough recursion

Here is one possible recursive code:

compute = function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
{
    d = as.Date # short alias

    dates = Filter(function(x) x>d(loopBackMaxDate), 
                   getPreviousDates(loopBackDays, d(vDate))) 

    if(length(dates)==0)
        return(auxialiaryCompute(vDate=vDate, previousOutputs=list()))

    previousOutputs = lapply(dates, function(u) compute(u, loopBackMaxDate, loopBackDays))

    auxialiaryCompute(vDate=vDate, previousOutputs=previousOutputs)
}

auxialiaryCompute = function(vDate, previousOutputs=list())
{
    Sys.sleep(1)
    vDate %>% as.character %>% gsub("-", "", .)
}

getPreviousDates = function(loopBackDays, vDate)
{
    if(loopBackDays==0) return()
    seq.Date(from=vDate-loopBackDays, to=vDate-1, by="days")
}

With this, I have the same result as before (taking 1 sec in average):

> compute(as.Date("2015-01-14"))
[1] "20150114"

And the following takes effectively 4 seconds:

> system.time(compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1))
   user  system elapsed 
   0.00    0.00    3.99 

I want to compute the follwoing, it takes 3 seconds:

> system.time(compute("2014-05-04", loopBackMaxDate="2014-05-01", loopBackDays=1))
   user  system elapsed 
   0.02    0.00    3.01 

This is very bad, because I am computing again the results for vDate="2014-05-04", vDate="2014-05-03" and vDate="2014-05-02" whereas it has been done when calling compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1)...

Memoized recursion

Here is how I went through with memoized:

library(memoise)

compute = memoise(function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
{
    d = as.Date # short alias

    dates = Filter(function(x) x>d(loopBackMaxDate), getPreviousDates(loopBackDays, d(vDate))) 

    if(length(dates)==0)
        return(auxialiaryCompute(vDate=vDate, previousOutputs=list()))

    previousOutputs = lapply(dates, function(u) compute(u, loopBackMaxDate, loopBackDays))

    auxialiaryCompute(vDate=vDate, previousOutputs=previousOutputs)
})

auxialiaryCompute = memoise(function(vDate, previousOutputs=list())
{
    Sys.sleep(1)
    vDate %>% as.character %>% gsub("-", "", .)
})

First run (takes effectively 4 seconds):

> system.time(compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1))
  user  system elapsed 
  0.00    0.00    4.01 

Second run takes 1 seconds whereas I expected it to take 0 seconds:

> system.time(compute("2014-05-04", loopBackMaxDate="2014-05-01", loopBackDays=1))
   user  system elapsed 
   0.00    0.00    0.99 

I think I am completetly wrong somewhere ... I could store outputs in a global variable, but I really would like to make it working with memoization or continuous style passing and avoid redundant computation!

If anyone has an idea, I would be very grateful!


Solution

  • Ok, so first, I put some loginfo on the auxiliaryCompute function:

    compute = memoise(function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
    {
        d = as.Date # short alias
    
        dates = Filter(function(x) x>d(loopBackMaxDate), getPreviousDates(loopBackDays, d(vDate))) 
    
        if(length(dates)==0)
        {
            loginfo("I reached the tail!")
            return(auxiliaryCompute(vDate=vDate, previousOutputs=0))
        }
    
        previousOutputs = lapply(dates, function(u){
                        compute(vDate=u, loopBackMaxDate=loopBackMaxDate, loopBackDays)
                      })
    
        auxiliaryCompute(vDate2=vDate, previousOutputs=previousOutputs)
    })
    
    auxiliaryCompute = memoise(function(vDate2, previousOutputs)
    {
        loginfo("-------arguments in auxiliaryCompute are: vDate %s , previousOutputs %s", vDate2, unlist(previousOutputs))
    #   Sys.sleep(1)
        vDate2 %>% as.character %>% gsub("-", "", .)
    })
    
    > compute("2015-01-10", "2015-01-01", 2)
    2015-01-20 18:53:12 INFO::I reached the tail!
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-02 , previousOutputs 0
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-03 , previousOutputs 20150102
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-04 , previousOutputs 20150102,20150103
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-05 , previousOutputs 20150103,20150104
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-06 , previousOutputs 20150104,20150105
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-07 , previousOutputs 20150105,20150106
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-08 , previousOutputs 20150106,20150107
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-09 , previousOutputs 20150107,20150108
    2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-10 , previousOutputs 20150108,20150109
    [1] "20150110"
    
    > compute("2015-01-08", "2015-01-01", 2)
    2015-01-20 18:54:11 INFO::-------arguments: vDate 2015-01-08 , previousOutputs 20150106,20150107
    [1] "20150108"
    

    The first log is good, we go each time only once every each date (not repetition with memoize). However it's weird that in the second log the function auxiliaryCompute is called with arguments vDate 2015-01-08 , previousOutputs 20150106,20150107 since it has already been executed (appearedin first log).

    And other dates are correclty memorized .... only the first bugs ... which is because it's a string and other date in recursion are coerced to Date format.

    By putting just dates in the parameters it works:

    > compute(as.Date("2015-01-08"), "2015-01-01", 2)
    [1] "20150108"
    

    It's really sneaky because R is not a strongly typed langage, and mostly because I coded very badly by "confusing" dates and strings!