Tags: r, parallel-processing, recurrence, map-function, cpu-cores

Multiple Core Usage/Parallelize Map Function for an RQA


I am currently trying to run a windowed CRQA in R on heart and respiration waveform signals. I have to run 94 windowed CRQAs, where each signal has 20,000 to 50,000 data points. The computational load is therefore relatively high and the analysis takes forever. To speed it up, I am trying to get R to increase its memory limit and use multiple cores. Neither seems to work with the following code:

library(parallel)
library(doParallel) 
library(crqa)
gc()
# Set memory limit in MB (PC has 16000 MB RAM)
memory.limit(size = 17000)
memory.size(max=TRUE)

numCores <- detectCores() # core = 11

# Simulate Example Data
HR1 = c(arima.sim(list(order=c(1,0,2),ar=-0.6,ma=c(0.5,-0.7)),sd=sqrt(0.5),n=40000))
HR2 = c(arima.sim(list(order=c(1,0,2),ar=-0.7,ma=c(0.5,-0.7)),sd=sqrt(0.5),n=20000))
HR3 = c(arima.sim(list(order=c(1,0,2),ar=-0.8,ma=c(0.5,-0.7)),sd=sqrt(0.5),n=30000))

RR1 = c(arima.sim(list(order=c(1,0,2),ar=-0.4,ma=c(0.5,-0.7)),sd=sqrt(0.5),n=40000))
RR2 = c(arima.sim(list(order=c(1,0,2),ar=-0.3,ma=c(0.5,-0.7)),sd=sqrt(0.5),n=20000))
RR3 = c(arima.sim(list(order=c(1,0,2),ar=-0.2,ma=c(0.5,-0.7)),sd=sqrt(0.5),n=30000))

HR_list = list(HR1, HR2, HR3)
RR_list = list(RR1, RR2, RR3)

# Create Cluster
cl <- makeCluster(detectCores(), type='PSOCK')
registerDoParallel(cl)

# Run the Windowed CRQA
start_time <- Sys.time()
WCRQA_list = Map(function(x, y)
  wincrqa(ts1 = x, ts2 = y, windowstep = 1000, windowsize = 2000,
          radius = .2, delay = 4, embed = 2, rescale = 0, normalize = 0,
          mindiagline = 2, minvertline = 2, tw = 0, whiteline = F,
          side = "both", method = "crqa", metric = "euclidean", datatype = "continuous"),
  HR_list, RR_list)
end_time <- Sys.time()
end_time - start_time

registerDoSEQ()

Why does R not claim the memory and cores that are available to it? I checked in the Task Manager, and neither the higher memory limit nor the additional cores seem to be used. How can I solve this so that I can run the code on my actual data set?

I am open to any help.

Best, Johnson


Solution

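  • You need to use clusterMap() or another function from the parallel package rather than base::Map. base::Map always runs sequentially in the main R session, no matter which cluster you have created or registered.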

    Look those options up with: ?parallel::clusterMap.

    You are also currently mixing things up by introducing registerDoParallel(cl), which comes from the doParallel package and registers a backend for foreach. It only has an effect if you then use foreach() together with its %dopar% operator. If you do not use foreach(), you do not need registerDoParallel(cl).
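
    If you would rather stay with foreach, a minimal sketch could look like the following. It uses toy inputs (xs and ys are made-up stand-ins for HR_list and RR_list) and assumes two workers; it has not been run against your data.

    ```r
    library(foreach)
    library(doParallel)  # also attaches parallel, which provides makeCluster()

    # Assumption: two workers are enough for a demonstration.
    cl <- makeCluster(2)
    registerDoParallel(cl)

    # Toy inputs standing in for HR_list and RR_list (hypothetical data).
    xs <- list(1:3, 4:6)
    ys <- list(7:9, 10:12)

    # Each iteration is sent to a worker; the results come back as a list.
    res <- foreach(x = xs, y = ys) %dopar% {
      sum(x * y)
    }

    stopCluster(cl)
    registerDoSEQ()  # switch back to the sequential backend
    ```

    For the real job, the loop body would call crqa::wincrqa(...) instead, or you could pass .packages = "crqa" to foreach() so that the package is loaded on each worker.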

    The relevant part of your code would look something like this with clusterMap(). I've tidied it a bit but I can't test it on my machine.

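    my_wincrqa <- function(x, y) {
      # Namespace-qualified call, so the workers do not need library(crqa)
      crqa::wincrqa(ts1 = x, ts2 = y, windowstep = 1000, windowsize = 2000,
                    radius = .2, delay = 4, embed = 2, rescale = 0, normalize = 0,
                    mindiagline = 2, minvertline = 2, tw = 0, whiteline = FALSE,
                    side = "both", method = "crqa", metric = "euclidean", datatype = "continuous")
    }

    # Each (HR, RR) pair is sent to a worker of the cluster cl
    WCRQA_list <- clusterMap(cl, my_wincrqa, HR_list, RR_list)

    # Release the workers when finished
    stopCluster(cl)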
    
    

    Normally, it is best to test things on a smaller scale first. That matters even more for parallel computing, and more still given how painful it is to get parallel computing working in R on Windows. Have a look at this self-answered question to set up a quick test. As far as I know, the functionality of snow is now essentially part of parallel.
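
    A quick small-scale test along those lines might look like this. toy_fun is a hypothetical, cheap stand-in for my_wincrqa, and the two-worker cluster is an assumption. The only aim is to check that the work runs in worker processes rather than in the main session.

    ```r
    library(parallel)

    # Assumption: two workers suffice for a smoke test.
    cl <- makeCluster(2, type = "PSOCK")

    # Hypothetical stand-in for my_wincrqa(): cheap to compute, and it
    # reports the process ID of the worker that handled each pair.
    toy_fun <- function(x, y) {
      list(pid = Sys.getpid(), value = stats::cor(x, y))
    }

    set.seed(1)
    x_list <- list(rnorm(100), rnorm(100), rnorm(100), rnorm(100))
    y_list <- list(rnorm(100), rnorm(100), rnorm(100), rnorm(100))

    res <- clusterMap(cl, toy_fun, x_list, y_list)
    stopCluster(cl)

    # Collect the distinct worker PIDs and compare them with the main session.
    worker_pids <- unique(vapply(res, function(r) r$pid, integer(1)))
    print(worker_pids)
    print(Sys.getpid())
    ```

    If the printed worker PIDs differ from the PID of the main session, the cluster is doing the work, and toy_fun can be swapped for the real function.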