r performance rcpp processing-efficiency memory-efficient

R: Efficient Way to partly modify diagonal of matrix

I have a square matrix with dimension ranging from 100x100 to 10,000x10,000. The matrix represents parameter values for a function. I go through a loop where I try various combinations of parameters. Every iteration W has a different value so I have to update the locations in the matrix which have the value W. This happens to be the even entries of the diagonal, so [2,2], [4,4], etc. I am curious if there is a more efficient way to do this than my current approach:

W<-treeDepth*newVar
for (iW in evenDiag) {
  matrixSource[iW,iW]<-W
}

So far I've only tested with 134x134 size matrices, but looping appears to be the fastest approach according to analysis with profvis.

When I try

diag(matrixSource)[evenDiag]<-W

It appears to take similar amounts of time, but it starts calling < GC> after an average of every 5 calls or so, but the actual times < GC> is called appear to be random.

I think < GC> is garbage collection, but whatever it is, it takes forever, and it's rarely if ever called when I have the loop version above.

Am I wrong to think there's a better approach than looping one by one. Would writing the loop in Rcpp make it faster? Hadley in "Advanced R" doesn't say that writing values into a matrix would be faster with Rcpp so it probably doesn't. If it did, how would I change that one little line(s) to Rcpp without making a C++ function or anything complicated (I don't know any C++).

According to my research it isn't possible to just write something like

matrixSource[evenDiag,evenDiag]<-W

but that if it were, R would excel at vectorized processing.

What is the best approach for this?

If it's helpful, the context is that the matrix needs to be fed into

negLogLik<- -mvtnorm::dmvnorm(flattenedData,sigma=matrixSource,log=T,checkSymmetry = FALSE)

and internally to that function it's fed into chol() (which sometimes calls < GC> in parallel)

So if there's a way I can modify that function to work with only partial matrices that aren't fully created so that maybe I only have to assign the value of W once, that'd also be good.

I also need a way to assign the diagonal directly above and below the main diagonal all the same value, Z. Is there a good way to do that?

Thank you <3

Solution

I have not found memory problems with square matrices up to dimension 1000.
@MichaelChirico's suggestion seems to be the most efficient. It runs in time constant with the matrix dimension, unlike diag<-.
The two methods used below are

# create an index
i <- rep(c(FALSE, TRUE), ncol(A) %/% 2)
j <- which(i)

# can also be diag(A)[i]
diag(A)[j] <- W

# cannot be cbind(i, i)
A[cbind(j, j)] <- 0

cbind is clearly more efficient.

library(microbenchmark)
library(ggplot2)

testFun <- function(N) {
  out <- lapply(seq.int(N)[-1L], \(n) {
    A <- matrix(1:n^2, n, n)
    W <- rep(0L, ncol(A) %/% 2L)
    i <- rep(c(FALSE, TRUE), ncol(A) %/% 2)
    j <- which(i)
    mb <- microbenchmark(
      diag = {diag(A)[i] <- W},
      cbind = {A[cbind(j, j)] <- W}
    )
    mb$dim <- n
    mb
  })
  out <- do.call(rbind, out)
  aggregate(time ~ ., out, median)
}

testData <- testFun(1000)
ggplot(testData, aes(dim, time, colour = expr)) +
  geom_line() +
  geom_point() +
  theme_bw()

^{Created on 2024-01-14 with reprex v2.0.2}

Edit

Another way of assigning the diagonal elements of a square matrix is proposed by user20650 in comment. Change the timimg instruction call above to

mb <- microbenchmark(
  diag = {diag(A)[i] <- W},
  cbind = {A[cbind(j, j)] <- W},
  sq = {A[j*j] <- W}
)

and the plot (without diag) becomes