I am trying to write a function that calculates the correlation between my dependent variable and the independent variables so that I can find the best lag for my data before I put it into a regression model. I want to calculate the optimal lag for each quarter (i.e. each quarter of the years I am inspecting) of my training data and then average the resulting correlations for each lag. A lag of 1 means that my dependent variable at t is best described by my independent variables at t-1. One could think of this as a kind of cross-validation. My code is as follows:

    Korrelations.Maximierer = function(Aktie, Kategorie){
      Ergebnis = matrix(nrow = 114, ncol = 18)
      for(tmp in 1:18){
        Start.Test = 1*tmp
        Ende.Test = 13*tmp
        Untersuchungszeitraum = Aktie[Start.Test:Ende.Test]
        for(i in 0:113){
          int.low = 96-i+18*tmp
          int.high = 108-i+18*tmp
          Ergebnis[i+1,tmp] = mean(abs(cor(Untersuchungszeitraum, Kategorie[int.low:int.high,-1],
                                           method = "spearman")))
        }
      }
      return(Ergebnis)
    }

My data is measured in weeks, which is why each quarter consists of 13 data points. Furthermore, I am inspecting 4.5 years of data, hence 18 quarters. I have data for my independent variables going back up to 113 weeks before the first data point of the dependent variable. When I run this I get the following error message:

    Error in cor(Untersuchungszeitraum, Kategorie[int.low:int.high, -1],
        method = "spearman") :
      incompatible dimensions
    In addition: There were 50 or more warnings (use warnings() to see the first 50)

Typing 'warnings()' tells me that the standard deviation is zero, which also confuses me.
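The zero-standard-deviation message is a warning from cor() itself, separate from the "incompatible dimensions" error: whenever a column passed to cor() is constant within the 13-week window, its ranks have zero variance, so cor() returns NA for that column and warns. mean() then turns the whole cell of Ergebnis into NA unless na.rm = TRUE is used. A minimal sketch with made-up data (y, X and the column names a, b, c are hypothetical stand-ins for Untersuchungszeitraum and Kategorie[int.low:int.high,-1]):

```r
# Made-up 13-week window: y stands in for Untersuchungszeitraum,
# X for Kategorie[int.low:int.high, -1] (hypothetical data).
set.seed(1)
y <- rnorm(13)
X <- cbind(a = rnorm(13), b = rep(5, 13), c = rnorm(13))  # column b is constant

# Report the warning as a message instead of letting it pile up silently.
r <- withCallingHandlers(
  cor(y, X, method = "spearman"),
  warning = function(w) {
    message("cor() warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(r)                     # the entry for column b is NA

mean(abs(r))                 # NA: one missing correlation makes the mean NA
mean(abs(r), na.rm = TRUE)   # mean over the non-constant columns a and c only
```

Whether dropping constant columns is the right choice depends on the analysis; the point here is only where the warning and the NA come from.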
I ran the code for the first sample by hand, and both "Untersuchungszeitraum" and "Kategorie[int.low:int.high]" have the same number of rows, so the correlation should be computable.
I set my x and y manually, basically by copy-pasting the code from my script and setting tmp = 1 and i = 0 by hand, i.e. omitting the for loops. Calculating the correlation for the resulting data frames gave me the result I was looking for, plus the standard deviation is zero warning.
I do not understand why this works when I type it in by hand but not when I use the script. Some insight into the standard deviation is zero warning would also be nice. Thank you for any help!
I think the problem is that `Start.Test` and `Ende.Test` are not calculated correctly. `Kategorie[int.low:int.high,-1]` will always have 13 rows (`int.high - int.low = 12` for all values of `i` and `tmp`, and `a:b` includes both ends), but `Untersuchungszeitraum` will have `Ende.Test - Start.Test + 1 = 12 * tmp + 1` elements, and `cor()` needs the vector and the matrix to have the same number of rows. This means that the two have the same length in the first iteration of the outer `for`-loop (`tmp = 1`, 13 rows each), which is the case you tested by hand, but not after that.
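The length mismatch can be checked directly by evaluating the index arithmetic from the question for all 18 values of `tmp`. Lag `i = 0` is shown; the lag shifts both ends of the `Kategorie` window equally, so it does not change either length:

```r
# Index arithmetic copied from the question, vectorised over all quarters
tmp <- 1:18
i <- 0
Start.Test <- 1 * tmp
Ende.Test  <- 13 * tmp
int.low    <- 96 - i + 18 * tmp
int.high   <- 108 - i + 18 * tmp

# a:b includes both ends, so it has b - a + 1 elements
len.y <- Ende.Test - Start.Test + 1  # length of Untersuchungszeitraum: 12 * tmp + 1
len.x <- int.high - int.low + 1      # rows of Kategorie[int.low:int.high, -1]: always 13

print(data.frame(tmp, len.y, len.x, match = len.y == len.x))
```

Only the row for `tmp = 1` has `match = TRUE`, so the loop fails at `tmp = 2`, `i = 0`.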
I don't fully understand what the code is supposed to do, but one possibility is that you meant

    Start.Test = 1 + tmp
    Ende.Test = 13 + tmp
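The suggestion above makes both windows 13 rows long, but it moves the dependent-variable window by one week per iteration rather than by one quarter. If the intent is the non-overlapping quarters described in the question, a sketch could derive both windows from the same quarter offset. It assumes (this is not stated in the question) that row 114 of `Kategorie` is the same week as `Aktie[1]` (113 weeks of lead data) and that column 1 of `Kategorie` is not a predictor. Under that assumption the predictor window should also move by 13 rows per quarter, whereas the question's `18*tmp` moves it by 18; for `tmp = 1` the sketch reproduces the question's ranges (rows 114:126 at lag 0, rows 1:13 at lag 113). The parameters `n.quarters`, `weeks` and `lead` are hypothetical additions, and `na.rm = TRUE` is just one way to handle the NA correlations from constant columns.

```r
# Hypothetical quarter-aligned version. Assumptions (not from the question):
#  - Aktie is a numeric vector of n.quarters * weeks = 18 * 13 = 234 weekly values;
#  - Kategorie starts `lead` = 113 weeks earlier, so row lead + t of Kategorie
#    is the same week as Aktie[t];
#  - column 1 of Kategorie is not a predictor (it is dropped with -1).
Korrelations.Maximierer <- function(Aktie, Kategorie,
                                    n.quarters = 18, weeks = 13, lead = 113) {
  # One row per lag 0..lead, one column per quarter
  Ergebnis <- matrix(NA_real_, nrow = lead + 1, ncol = n.quarters)
  for (tmp in seq_len(n.quarters)) {
    # Weeks of quarter tmp: 1:13, 14:26, 27:39, ...
    Start.Test <- weeks * (tmp - 1) + 1
    Ende.Test  <- weeks * tmp
    Untersuchungszeitraum <- Aktie[Start.Test:Ende.Test]
    for (i in 0:lead) {
      # The same 13 weeks in Kategorie, shifted back by i weeks (lag i)
      int.low  <- lead + Start.Test - i
      int.high <- lead + Ende.Test - i
      # Zero-sd warnings are expected for constant columns; those entries are NA
      r <- suppressWarnings(cor(Untersuchungszeitraum,
                                Kategorie[int.low:int.high, -1],
                                method = "spearman"))
      # Average |rho| over the predictors that have a defined correlation
      Ergebnis[i + 1, tmp] <- mean(abs(r), na.rm = TRUE)
    }
  }
  Ergebnis
}

# Usage with made-up data: an index column plus 2 hypothetical predictors
set.seed(42)
Aktie <- rnorm(234)
Kategorie <- data.frame(week = 1:347, x1 = rnorm(347), x2 = rnorm(347))
res <- Korrelations.Maximierer(Aktie, Kategorie)
dim(res)                                  # 114 18
best.lag <- which.max(rowMeans(res)) - 1  # lag with the highest average |rho|
```

Averaging each row over the 18 quarters with `rowMeans()` follows the "average the resulting correlations for each lag" idea from the question; the made-up data only checks that the indexing runs, it says nothing about real lags.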