Tags: r, data.table, weighted-average

Calculate weighted average life in R


I would like to calculate the weighted average life (WAL) of a loan over time in R. The formula to calculate the WAL is given here.
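In words, the WAL at an observation date is the sum of each subsequent principal repayment weighted by the time until that repayment, divided by the principal currently outstanding. A minimal sketch of that formula for a single schedule (the helper name `wal` is mine, for illustration only):

```r
# WAL = sum(principal repaid at t_i * t_i) / outstanding principal,
# where t_i is measured from the observation date.
wal <- function(date, value) {
  t    <- date - date[1]   # elapsed time from the observation date
  pmts <- -diff(value)     # principal repaid in each interval
  sum(pmts * t[-1]) / value[1]
}

# Loan "b" from the sample data below, observed at 2015.00:
wal(c(2015, 2015.25, 2015.5, 2015.75, 2016, 2016.25),
    c(100, 80, 60, 40, 20, 0))
# 0.75
```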

I have the following sample data created in R.

Sample data

library(data.table)
DT <- data.table(
  date  = c(rep(seq(from = 2015, to = 2016.25, by = 0.25), 2),
            seq(from = 2015, to = 2017.5, by = 0.5)),
  value = c(rep(100, 5), 0, 100, 80, 60, 40, 20, 0, 100, 70, 40, 30, 20, 0),
  id    = rep(c("a", "b", "c"), each = 6)
)

DT

       date value id
 1: 2015.00   100  a
 2: 2015.25   100  a
 3: 2015.50   100  a
 4: 2015.75   100  a
 5: 2016.00   100  a
 6: 2016.25     0  a
 7: 2015.00   100  b
 8: 2015.25    80  b
 9: 2015.50    60  b
10: 2015.75    40  b
11: 2016.00    20  b
12: 2016.25     0  b
13: 2015.00   100  c
14: 2015.50    70  c
15: 2016.00    40  c
16: 2016.50    30  c
17: 2017.00    20  c
18: 2017.50     0  c

Thus every loan in this example is completely amortized at its final observation date (the value drops to 0). Note: the dates are not always incremented by one half-year or one quarter; the spacing may differ between loans (see the sample data).

To calculate the WAL I have created the following R code:

Counter <- unique(DT$id)

# LOOP OVER ID
for (i in 1:length(Counter)) {

  # SUBSET ONE ID
  DTSub <- DT[id == Counter[i], ]

  # LOOP OVER THE AMORTIZATION DATES
  CounterSub <- unique(DTSub$date)

  for (j in 1:length(CounterSub)) {

    # SUBSET THE DATES FROM COUNTERSUB[j] ONWARDS
    DTSub_Date <- DTSub[date >= CounterSub[j], ]
    DTSub_Date[, t := abs(min(date) - date)]

    # WAL = SUM(PRINCIPAL REPAID * ELAPSED TIME) / OUTSTANDING PRINCIPAL
    DT[id == Counter[i] & date == CounterSub[j],
       wal_calc := round(sum(abs(diff(DTSub_Date$value))
                             / max(DTSub_Date$value)
                             * DTSub_Date$t[2:nrow(DTSub_Date)]), 3)]
  }
}

The output of the code

DT

       date value id wal_calc
 1: 2015.00   100  a    1.250
 2: 2015.25   100  a    1.000
 3: 2015.50   100  a    0.750
 4: 2015.75   100  a    0.500
 5: 2016.00   100  a    0.250
 6: 2016.25     0  a    0.000
 7: 2015.00   100  b    0.750
 8: 2015.25    80  b    0.625
 9: 2015.50    60  b    0.500
10: 2015.75    40  b    0.375
11: 2016.00    20  b    0.250
12: 2016.25     0  b    0.000
13: 2015.00   100  c    1.300
14: 2015.50    70  c    1.143
15: 2016.00    40  c    1.125
16: 2016.50    30  c    0.833
17: 2017.00    20  c    0.500
18: 2017.50     0  c    0.000

The code produces the correct output (wal_calc), but it uses a double for loop and is therefore slow on relatively large datasets (mine has 77k rows and 200 columns).

The first for loop subsets one ID at a time; the second, within that ID, subsets the dates from the current date onwards.

Request

I would like to generate the WALs on this sample data in a faster, more efficient manner and avoid the double for loop. There might be a very simple solution to this problem.

If anything is unclear please let me know.


Solution

  • This will do it without for loops.

    DT[order(date), WAL := {
      # principal repayments (drops in value), recycled so that every
      # column of the n2 x n2 matrix repeats the repayment vector
      pmts <- matrix(value[-.N] - value[-1L],
                     nrow = n2 <- .N - 1L, ncol = n2)
      # interval lengths; after zeroing the upper triangle and taking
      # column-wise cumulative sums, column j holds the time elapsed
      # from date j to each later date
      ts <- matrix(date[-1L] - date[-.N], nrow = n2, ncol = n2)
      ts[upper.tri(ts)] <- 0
      ts <- apply(ts, 2, cumsum)
      # WAL at each date = sum(repayment * elapsed time) / current value,
      # with 0 at the final (fully amortized) date
      c(colSums(pmts * ts) / value[-.N], 0)}, by = id]
    DT
    #        date value id       WAL
    #  1: 2015.00   100  a 1.2500000
    #  2: 2015.25   100  a 1.0000000
    #  3: 2015.50   100  a 0.7500000
    #  4: 2015.75   100  a 0.5000000
    #  5: 2016.00   100  a 0.2500000
    #  6: 2016.25     0  a 0.0000000
    #  7: 2015.00   100  b 0.7500000
    #  8: 2015.25    80  b 0.6250000
    #  9: 2015.50    60  b 0.5000000
    # 10: 2015.75    40  b 0.3750000
    # 11: 2016.00    20  b 0.2500000
    # 12: 2016.25     0  b 0.0000000
    # 13: 2015.00   100  c 1.3000000
    # 14: 2015.50    70  c 1.1428571
    # 15: 2016.00    40  c 1.1250000
    # 16: 2016.50    30  c 0.8333333
    # 17: 2017.00    20  c 0.5000000
    # 18: 2017.50     0  c 0.0000000
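For what it's worth, here is my reading of the matrix trick (my own verification, not part of the original answer), checked standalone against loan "c": every column of pmts repeats the vector of principal repayments, and after zeroing the upper triangle and cumulative-summing, column j of ts holds the time elapsed from date j to each later date, so colSums(pmts * ts) evaluates the WAL numerator for every observation date at once.

```r
# Loan "c" from the sample data, as plain vectors.
date  <- seq(2015, 2017.5, by = 0.5)
value <- c(100, 70, 40, 30, 20, 0)
n     <- length(value)

n2   <- n - 1L
# each column repeats the vector of principal repayments
pmts <- matrix(value[-n] - value[-1L], nrow = n2, ncol = n2)
# interval lengths; zero the upper triangle, then cumulative-sum each
# column so column j gives elapsed time from date j to each later date
ts   <- matrix(date[-1L] - date[-n], nrow = n2, ncol = n2)
ts[upper.tri(ts)] <- 0
ts   <- apply(ts, 2, cumsum)

wal  <- c(colSums(pmts * ts) / value[-n], 0)
round(wal, 3)
# 1.300 1.143 1.125 0.833 0.500 0.000
```

This matches the wal_calc column from the loop-based code for id "c", confirming the two approaches agree.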