I am new on stack overflow and an R beginner.
I want to calculate the returns of a big data set which looks like this:
Date C1 C2 C3
31.01.1985 NA 47 NA
28.02.1985 NA 45 NA
29.03.1985 130 56 NA
30.04.1985 140 67 NA
31.05.1985 150 48 93
28.06.1985 160 79 96
31.07.1985 160 56 94
30.08.1985 160 77 93
30.09.1985 160 66 93
31.10.1985 160 44 93
29.11.1985 160 55 93
It's a data.table
format, let's say it's called Prices, columns are the companies, values are the prices, the real data set has many more columns and rows. I want to build a new DT where I calculate the monthly returns, I know you can do this with the diff()
function. but how do I build my new Data table with so many columns without for loops?
I thought of:
Returns <- diff(Prices[, names(Prices) != "Date"])
but this for some reason only gives out:
[1] 1 0 0
Thanks in advance.
The reason you are getting that output is because Prices[, names(Prices) != "Date"]
returns a logical vector:
> Prices[, names(Prices) != "Date"]
[1] FALSE TRUE TRUE TRUE
And because you can do calculations with logicals, you can also use diff
on a logical vector. FALSE
is then treated as a 0
and TRUE
as a 1
. So basically you were doing diff(c(0,1,1,1))
.
A possible solution for what you want:
cols <- setdiff(names(Prices),"Date")
# option 1:
Prices[, paste0(cols,"_return") := lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA)), .SDcols = cols][]
# option 2:
Prices[, paste0(cols,"_return") := lapply(.SD, function(x) c(NA,diff(x))/shift(x, fill = NA)), .SDcols = cols][]
which gives:
> Prices Date C1 C2 C3 C1_return C2_return C3_return 1: 1985-01-31 NA 47 NA NA NA NA 2: 1985-02-28 NA 45 NA NA -0.04255319 NA 3: 1985-03-29 130 56 NA NA 0.24444444 NA 4: 1985-04-30 140 67 NA 0.07692308 0.19642857 NA 5: 1985-05-31 150 48 93 0.07142857 -0.28358209 NA 6: 1985-06-28 160 79 96 0.06666667 0.64583333 0.03225806 7: 1985-07-31 160 56 94 0.00000000 -0.29113924 -0.02083333 8: 1985-08-30 160 77 93 0.00000000 0.37500000 -0.01063830 9: 1985-09-30 160 66 93 0.00000000 -0.14285714 0.00000000 10: 1985-10-31 160 44 93 0.00000000 -0.33333333 0.00000000 11: 1985-11-29 160 55 93 0.00000000 0.25000000 0.00000000
If you want to create a new data.table
, you could use one of the following two options:
# option 1:
Returns <- Prices[, c(list(Date = Date), lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA))), .SDcols = cols]
# option 2:
Returns <- copy(Prices)
Returns[, (cols) := lapply(.SD, function(x) (x - shift(x, fill = NA))/shift(x, fill = NA)), .SDcols = cols]
Used data:
Prices <- fread("Date C1 C2 C3
31.01.1985 NA 47 NA
28.02.1985 NA 45 NA
29.03.1985 130 56 NA
30.04.1985 140 67 NA
31.05.1985 150 48 93
28.06.1985 160 79 96
31.07.1985 160 56 94
30.08.1985 160 77 93
30.09.1985 160 66 93
31.10.1985 160 44 93
29.11.1985 160 55 93")[, Date := as.Date(Date, "%d.%m.%Y")]