Search code examples
rdata.tablefinancequantitative-finance

How to calculate return in data.table?


I am new on stack overflow and an R beginner.

I want to calculate the returns of a big data set which looks like this:

Date        C1  C2  C3
31.01.1985  NA  47  NA
28.02.1985  NA  45  NA
29.03.1985  130 56  NA
30.04.1985  140 67  NA
31.05.1985  150 48  93
28.06.1985  160 79  96
31.07.1985  160 56  94
30.08.1985  160 77  93
30.09.1985  160 66  93
31.10.1985  160 44  93
29.11.1985  160 55  93

It's a data.table format, let's say it's called Prices, columns are the companies, values are the prices, the real data set has many more columns and rows. I want to build a new DT where I calculate the monthly returns, I know you can do this with the diff() function. but how do I build my new Data table with so many columns without for loops?

I thought of:

Returns <- diff(Prices[, names(Prices) != "Date"])

but this for some reason only gives out:

[1] 1 0 0

Thanks in advance.


Solution

  • The reason you are getting that output is because Prices[, names(Prices) != "Date"] returns a logical vector:

    > Prices[, names(Prices) != "Date"]
    [1] FALSE  TRUE  TRUE  TRUE
    

    And because you can do calculations with logicals, you can also use diff on a logical vector. FALSE is then treated as a 0 and TRUE as a 1. So basically you were doing diff(c(0,1,1,1)).


    A possible solution for what you want:

    cols <- setdiff(names(Prices),"Date")
    
    # option 1:
    Prices[, paste0(cols,"_return") := lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA)), .SDcols = cols][]
    
    # option 2:
    Prices[, paste0(cols,"_return") := lapply(.SD, function(x) c(NA,diff(x))/shift(x, fill = NA)), .SDcols = cols][]
    

    which gives:

    > Prices
              Date  C1 C2 C3  C1_return   C2_return   C3_return
     1: 1985-01-31  NA 47 NA         NA          NA          NA
     2: 1985-02-28  NA 45 NA         NA -0.04255319          NA
     3: 1985-03-29 130 56 NA         NA  0.24444444          NA
     4: 1985-04-30 140 67 NA 0.07692308  0.19642857          NA
     5: 1985-05-31 150 48 93 0.07142857 -0.28358209          NA
     6: 1985-06-28 160 79 96 0.06666667  0.64583333  0.03225806
     7: 1985-07-31 160 56 94 0.00000000 -0.29113924 -0.02083333
     8: 1985-08-30 160 77 93 0.00000000  0.37500000 -0.01063830
     9: 1985-09-30 160 66 93 0.00000000 -0.14285714  0.00000000
    10: 1985-10-31 160 44 93 0.00000000 -0.33333333  0.00000000
    11: 1985-11-29 160 55 93 0.00000000  0.25000000  0.00000000
    

    If you want to create a new data.table, you could use one of the following two options:

    # option 1:
    Returns <- Prices[, c(list(Date = Date), lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA))), .SDcols = cols]
    
    # option 2:
    Returns <- copy(Prices)
    Returns[, (cols) := lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA)), .SDcols = cols]
    

    Used data:

    Prices <- fread("Date        C1  C2  C3
    31.01.1985  NA  47  NA
    28.02.1985  NA  45  NA
    29.03.1985  130 56  NA
    30.04.1985  140 67  NA
    31.05.1985  150 48  93
    28.06.1985  160 79  96
    31.07.1985  160 56  94
    30.08.1985  160 77  93
    30.09.1985  160 66  93
    31.10.1985  160 44  93
    29.11.1985  160 55  93")[, Date := as.Date(Date, "%d.%m.%Y")]