I have a strong use case for parallelizing a flavor of the SGD algorithm. In this use case I need to apply the delta gradient update to the matrices P and Q for a random batch of samples. Each process will update mutually exclusive indices of both matrices.
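Concretely, one SGD step on a batch of (i, j, x) samples would look roughly like the sketch below (P, Q, lr, reg and the batch layout are placeholders, not my actual code). Each sample touches only row i of P and row j of Q, which is why mutually exclusive indices per process are safe.
k <- 10
P <- matrix(rnorm(100 * k), 100, k)   # first factor matrix (placeholder)
Q <- matrix(rnorm(100 * k), 100, k)   # second factor matrix (placeholder)
lr <- 0.01; reg <- 0.02               # learning rate and regularization (placeholders)
batch <- data.frame(i = sample(100, 5), j = sample(100, 5), x = rnorm(5))
for (s in seq_len(nrow(batch))) {
  i <- batch$i[s]; j <- batch$j[s]
  p_i <- P[i, ]; q_j <- Q[j, ]
  err <- batch$x[s] - sum(p_i * q_j)  # delta for this sample
  P[i, ] <- p_i + lr * (err * q_j - reg * p_i)
  Q[j, ] <- q_j + lr * (err * p_i - reg * q_j)
}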
A simple illustration of what I intend to do would be something like this:
# create "big" matrix
A <- matrix(rnorm(10000), 100, 100)
system.time(
# update each row vector independently using all my cores
r <- mclapply(1:100, mc.cores = 6, function(i) {
# updating ...
A[i,] <- A[i,] - 0.01
# return something, i.e. here I'd return the RMSE of this batch instead
sqrt(sum(A[i,]^2))
})
)
Are there any drawbacks to this approach? Are there more R-idiomatic alternatives? For example, to keep the computation clean (no side effects, immutable data), I could return the updated row A[i, ] - 0.01 instead of the RMSE, but that would be more complex to program, would drive up peak memory usage, and could even run out of memory.
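A sketch of what I mean by the immutable variant (rebuilding the matrix from the returned rows is my assumption of how it would have to be done):
rows  <- mclapply(1:100, mc.cores = 6, function(i) A[i, ] - 0.01)  # return updated rows
A_new <- do.call(rbind, rows)      # rebuild the whole matrix in the parent process
r     <- sqrt(rowSums(A_new^2))    # same per-row quantity as before
Here the parent has to hold A, all the returned rows and A_new at the same time, which is where the memory pressure comes from.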
Here is a reimplementation of your code that works on blocks of indices and uses shared data, with package {bigstatsr}:
N <- 10e3
A <- matrix(rnorm(N * N), N)

library(bigstatsr)
bigA <- as_FBM(A)

library(doParallel)
registerDoParallel(cl <- makeCluster(4))
system.time(
  r <- foreach(i = seq_len(N), .combine = 'c') %dopar% {
    # updating ...
    A[i, ] <- A[i, ] - 0.01
    # return something, i.e. here I'd return the RMSE of this batch instead
    sqrt(sum(A[i, ]^2))
  }
) # 11 sec
stopCluster(cl)
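Note that in this first version every worker gets its own copy of A (roughly 800 MB each for N = 10e3), and the row updates are lost when the workers exit since only the scalar is returned; with the FBM below, all processes read and write the same on-disk matrix, so the updates are actually shared.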
registerDoParallel(cl <- makeCluster(4))
system.time(
  r2 <- big_apply(bigA, a.FUN = function(X, ind) {
    # updating the shared FBM (X is the FBM passed to a.FUN) ...
    tmp <- X[ind, ] <- X[ind, ] - 0.01
    # return something, i.e. here I'd return the RMSE of this batch instead
    sqrt(rowSums(tmp^2))
  }, a.combine = 'c', ind = rows_along(bigA))
) # 1 sec
stopCluster(cl)
all.equal(r, r2) # TRUE
Again, it would be better to update columns instead of rows: an FBM, like a standard R matrix, is stored column-wise, so accessing whole columns is much more efficient than accessing rows (see the sketch below).
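For completeness, a minimal sketch of that column-wise variant (the default ind of big_apply already iterates over blocks of columns); note that it returns per-column instead of per-row values, and that it would subtract another 0.01 if run on the bigA already updated above:
r3 <- big_apply(bigA, a.FUN = function(X, ind) {
  # update a block of columns of the shared FBM
  tmp <- X[, ind] <- X[, ind] - 0.01
  # per-column equivalent of the quantity returned above
  sqrt(colSums(tmp^2))
}, a.combine = 'c')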