
Incompatible dimensions error in cor function


I am trying to write a function that calculates the correlation between my dependent variable and the independent variables, so that I can find the best lag for my data before I put it into a regression model. I want to calculate the optimal lag for each quarter of my training data (that is, each quarter of the years I am inspecting) and then average the resulting correlations for each of the lags. A lag of 1 means that my dependent variable at time t is best described by my independent variables at t-1. One could think of this as a form of cross-validation. My code is as follows:

Korrelations.Maximierer = function(Aktie, Kategorie){

  Ergebnis = matrix(nrow = 114,ncol =  18)

  for(tmp in 1:18){
    Start.Test=1*tmp
    Ende.Test=13*tmp
    Untersuchungszeitraum = Aktie[Start.Test:Ende.Test]

    for(i in 0:113){
      int.low=96-i+18*tmp
      int.high=108-i+18*tmp

      Ergebnis[i+1,tmp]=mean(abs(cor(Untersuchungszeitraum,Kategorie[int.low:int.high,-1],
      method = "spearman")))
    }
  }  
  return(Ergebnis)
}

My data is measured in weeks, which is why each quarter consists of 13 data points. Furthermore, I am inspecting 4.5 years of data, hence 18 quarters. I have data for my independent variables going back up to 113 weeks before the first data point of the dependent variable. When I run this, I get the following error message:

Error in cor(Untersuchungszeitraum, Kategorie[int.low:int.high, -1], method = "spearman") :
  incompatible dimensions
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Calling warnings() tells me that the standard deviation is zero, which also puzzles me.

I ran the code for the first sample by hand, and both "Untersuchungszeitraum" and "Kategorie[int.low:int.high]" have the same number of rows, so the correlation should be computable.

I set x and y manually, basically copy-pasting the code from my script and setting tmp = 1 and i = 0 by hand, thereby omitting the for loops. I then calculated the correlation for the resulting data frames and got the result I was looking for, along with the "standard deviation is zero" warning.

I do not understand why this works when I type it in by hand but not when I run the script. Some insight into the "standard deviation is zero" warning would also be appreciated. Thank you for any help!


Solution

  • I think that the problem might be that Start.Test and Ende.Test are not calculated correctly. Kategorie[int.low:int.high,-1] will always select 13 rows (int.high - int.low = 12 for all values of i and tmp, so the range int.low:int.high contains 13 indices), but the length of Untersuchungszeitraum will be Ende.Test - Start.Test + 1 = 12 * tmp + 1. This means that the two inputs will have the same number of observations for the first iteration of the outer for-loop (tmp = 1), but not after that.

    I don't really understand what the code is supposed to do, but one possibility is that you meant to do

    Start.Test = 1 + tmp
    Ende.Test = 13 + tmp
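
The length argument above can be checked with a short sketch. The data below are hypothetical stand-ins (the real Aktie and Kategorie are not shown in the question), and the helper names laenge.original, laenge.vorschlag, laenge.kategorie, fehler and warnung are made up for illustration. It compares how many observations each indexing scheme selects, reproduces the error for tmp = 2, and shows that cor() warns about a zero standard deviation when one input is constant within the window, which is one plausible source of the warnings in the question.

```r
# Hypothetical stand-in data; the real Aktie and Kategorie are not shown in the question.
set.seed(1)
Aktie <- rnorm(300)
Kategorie <- data.frame(Datum = 1:400, A = rnorm(400), B = rnorm(400))

# Number of observations selected for tmp = 1..4 (with i = 0 for Kategorie).
laenge.original  <- sapply(1:4, function(tmp) length((1 * tmp):(13 * tmp)))
laenge.vorschlag <- sapply(1:4, function(tmp) length((1 + tmp):(13 + tmp)))
laenge.kategorie <- sapply(1:4, function(tmp) length((96 + 18 * tmp):(108 + 18 * tmp)))

print(laenge.original)   # 13 25 37 49: grows with tmp
print(laenge.vorschlag)  # 13 13 13 13
print(laenge.kategorie)  # 13 13 13 13

# Reproduce the error with the original indexing for tmp = 2, i = 0:
# 25 observations in x against 13 rows in y.
tmp <- 2
i <- 0
x <- Aktie[(1 * tmp):(13 * tmp)]
y <- Kategorie[(96 - i + 18 * tmp):(108 - i + 18 * tmp), -1]
fehler <- tryCatch(
  cor(x, y, method = "spearman"),
  error = function(e) conditionMessage(e)
)
print(fehler)

# A column that is constant within the window has zero standard deviation,
# so cor() returns NA for it and issues a warning.
konstant <- rep(5, 13)
warnung <- tryCatch(
  cor(Aktie[1:13], konstant, method = "spearman"),
  warning = function(w) conditionMessage(w)
)
print(warnung)
```

If the intent is non-overlapping quarters rather than a window that slides by one week, (13 * (tmp - 1) + 1):(13 * tmp) would also select 13 points per quarter; that is only a guess at the intent, not something the question confirms.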