Search code examples
rperformancematrixregressionsymmetric

R Speed up vectorize for square matrix


Anyone able to help me speed up some code:

n = seq_len(ncol(mat)) # seq 1 to ncol(mat)
sym.pr<-outer(n,n,Vectorize(function(a,b) {
    return(adf.test(LinReg(mat[,c(a,b)]),k=0,alternative="stationary")$p.value)
}))

Where mat is an NxM matrix of N observation and M objects, e.g:

    Obj1 Obj2 Obj3
1      .    .    .
2      .    .    .    
3      .    .    .

LinReg is defined as:

# Performs linear regression via OLS
LinReg=function(vals) {  
  # regression analysis
  # force intercept c at y=0
  regline<-lm(vals[,1]~as.matrix(vals[,2:ncol(vals)])+0)

  # return spread (residuals)
  return(as.matrix(regline$residuals))
}

Basically I am performing a regression analysis (OLS) on every combination of Objects (i.e. Obj1, Obj2 and Obj2,Obj3 and Obj1, Obj3) in mat, then using the adf.test function from the tseries package and storing the p-value. The end result sym.pr is a symmetric matrix of all p-values (but actually it's not 100% symmetric, see here for more info), nevertheless it will suffice.

With the above code, on a 600x300 matrix (600 observations and 300 objects), it takes about 15 minutes..

I thought of maybe only calculating the upper triangle of the symmetric matrix, but not sure how to go about doing it.

Any ideas?

Thanks.


Solution

  • Starting with some dummy data

    mdf <- data.frame( x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5) )
    

    I would firstly determine the combinations of interest. So if I understood you right the result of your calculation should be the same for mdf[c(i,j)] and mdf[c(j,i)]. in this case you could use the combn function, to determine the relevant pairs.

    pairs <- as.data.frame( t( combn( colnames( mdf  ),2 ) ) )
    pairs
      V1 V2
    1 x1 x2
    2 x1 x3
    3 x2 x3
    

    Now you can just apply your function row-wise over the pairs (using a t.test here for simplicity):

    pairs[["p.value"]] <- apply( pairs, 1, function( i ){
      t.test( mdf[i] )[["p.value"]]
    })
    pairs
      V1 V2   p.value
    1 x1 x2 0.5943814
    2 x1 x3 0.7833293
    3 x2 x3 0.6760846
    

    If you still need your p.values back in (upper triangular) matrix form you can cast them:

    library(reshape2)
    acast( pairs, V1 ~ V2 )
              x2        x3
    x1 0.5943814 0.7833293
    x2        NA 0.6760846