Tags: r, foreach, quantmod, parallel.foreach, doparallel

Exporting a Parallel Foreach Call Into an Environment


I would like to export the outputs from foreach into an environment. I am pulling time series data from Yahoo Finance.

library(quantmod)
library(foreach)
library(parallel)
library(doParallel)
registerDoParallel(cores=2)

hub = new.env()
tickers = c("NKE", "AAPL", "MSFT", "TSLA", "MPC", "PEP", "GIS", "MA","V", "CAT", "KHC", "AMZN", "NFLX", "GS", "MS", "BAC", "GE", "KO", "JPM", "AMAT", "ABT", "BIIB")

# I have tried the 2 methods below.
# The first gives me a list of just the ticker names.
# The second puts the data into a list. I am looking for an environment.
foreach(r = tickers, .packages = "quantmod") %dopar% lapply(r, getSymbols, env = hub)

enviro = foreach(r = tickers, .packages = "quantmod")%dopar% lapply(r, getSymbols, auto.assign = F)

class(enviro)
[1] "list"

The environment should look like this (it works when I do not run it in a foreach loop).

hub = new.env()
#the following line of code takes about 1 min. Just a heads up
getSymbols(tickers, env = hub)

Solution

  • The question is somewhat unclear, but it seems you are trying to collect the outputs in a single environment while using parallelism to speed up the download.

    A few things are worth noting first. quantmod::getSymbols carries considerable overhead per call. With your current method you should therefore see a loss of performance, because the function is called once for every symbol.

    Optimizations

    One way to reduce the overhead is to split the work into chunks. The foreach package relies on the iterators package, which makes it simple to split the computation into chunks.

    nworker = 2
    registerDoParallel(cores = nworker)
    # A one-column matrix, so the iterator can walk over it row by row.
    tickers = matrix(c("NKE", "AAPL", "MSFT", "TSLA", "MPC", "PEP", "GIS", "MA","V", "CAT", "KHC", "AMZN", "NFLX", "GS", "MS", "BAC", "GE", "KO", "JPM", "AMAT", "ABT", "BIIB"), ncol = 1)
    tickerIter <- iterators::iter(tickers, by = 'row',
                                  # Chunk size chosen so that each worker gets exactly one chunk.
                                  chunksize = ceiling(length(tickers)/nworker))
    

    In the code above, tickerIter is now an iterator over all the symbols, split into nworker chunks of at most ceiling(length(tickers)/nworker) symbols each. Thus each worker (core) only gets a single chunk, and we only have to export to and import from each worker once. tickerIter is passed to the foreach loop instead of the raw tickers. To see what the iterator hands to the foreach loop, you could execute nextElem(tickerIter), which returns one chunk. Note, however, that you will then need to re-create the iterator, because a chunk that has already been returned by nextElem will not be passed to the foreach loop.
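    The chunking behaviour can be checked without touching the network. The following is a minimal sketch that uses only the iterators package and a shortened ticker list; the names chunk_len, first_chunk, second_chunk and exhausted are made up for this illustration.

```r
library(iterators)

# Shortened ticker list, stored as a one-column matrix as in the answer.
tickers <- matrix(c("NKE", "AAPL", "MSFT", "TSLA", "MPC"), ncol = 1)
nworker <- 2
chunk_len <- ceiling(length(tickers) / nworker)  # 5 symbols over 2 workers

it <- iter(tickers, by = "row", chunksize = chunk_len)
first_chunk  <- nextElem(it)  # rows 1..chunk_len, still a one-column matrix
second_chunk <- nextElem(it)  # the remaining rows

# The iterator is now used up: a further nextElem() call signals an error.
exhausted <- tryCatch({ nextElem(it); FALSE }, error = function(e) TRUE)

# Re-create the iterator before handing it to foreach.
it <- iter(tickers, by = "row", chunksize = chunk_len)
```

    Peeking with nextElem consumes chunks, which is why the iterator is rebuilt on the last line before it would be used in a loop.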

    Combining into environment

    From the question, you want to combine the output into a single environment. Doing this directly inside foreach is not possible, at least not without the risk of crashing the R session. By default, foreach parallelizes by creating multiple R sessions, exporting data to them and executing the provided expression there, so an assignment made on a worker only changes that worker's copy of the environment. To write into hub you would have to hook into the main R session and assign the variables through that hook, which is not recommended.
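    The effect of working on a copy can be seen in a small experiment. This sketch deliberately uses parLapply from the base parallel package rather than foreach, and stores nchar(sym) as a stand-in for downloaded data; it assumes two local worker processes can be started.

```r
library(parallel)

hub <- new.env()
cl <- makeCluster(2)  # two separate R worker processes

# Each worker receives a serialized copy of hub and assigns into that copy.
seen_on_workers <- parLapply(cl, c("AAPL", "MSFT"), function(sym, env) {
  assign(sym, nchar(sym), envir = env)  # placeholder value instead of real data
  ls(env)                               # what the worker's copy contains
}, env = hub)

stopCluster(cl)

# The original hub in the main session was never modified.
main_contents <- ls(hub)
```

    The workers report the symbols they assigned, while main_contents stays empty, which is the same thing that happens to hub in the first attempt from the question.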

    However, foreach has a .combine argument, which accepts a custom function for combining the results. Additionally, if that function can combine any number of inputs, setting .multicombine = TRUE means it is called once with all outputs (as long as there are no more than .maxcombine of them, 100 by default) instead of repeatedly with two at a time.
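    As a rough illustration of .combine with .multicombine, here is a sketch that runs sequentially with %do%, so no parallel backend or network access is needed. fake_fetch is a hypothetical stand-in for the per-chunk download and simply returns a named list.

```r
library(foreach)

# Hypothetical stand-in for a per-chunk download: one named list per chunk.
fake_fetch <- function(symbols) {
  setNames(lapply(symbols, nchar), symbols)
}

chunks <- list(c("NKE", "AAPL"), c("MSFT", "TSLA"), c("V"))

combined <- foreach(chunk = chunks,
                    .combine = function(...) c(...),  # accepts any number of lists
                    .multicombine = TRUE) %do% {
  fake_fetch(chunk)
}

names(combined)
```

    Because every chunk returns a named list, concatenating them with c() gives one flat list keyed by symbol, which is the shape list2env expects.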

    I don't see why you would specifically need the data in the hub environment, so in the code example below the output is instead combined into a single list. The list can then be converted with list2env to export the output into a specific environment.

    Note the use of tickerIter instead of the raw tickers.

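    output <- foreach(r = tickerIter,
                      .combine = function(...){
                        c(...) # Concatenate the per-chunk lists into one list
                      },
                      .multicombine = TRUE,
                      .packages = "quantmod") %dopar% {
                        # Use a fresh environment so that only the downloaded series
                        # (and not loop variables such as r) end up in the result.
                        currenv <- new.env()
                        getSymbols(as.character(r), env = currenv)
                        as.list(currenv)
                      }
    # If you really want the data in a specific environment, you could use
    # list2env (this could also be done inside .combine):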
    list2env(output, hub)
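    The last step can be tried in isolation with base R only. In this sketch the output list holds two small made-up data frames in place of the real price series.

```r
# Made-up stand-in for the combined foreach result: a named list of series.
output <- list(
  AAPL = data.frame(close = c(1, 2)),
  MSFT = data.frame(close = c(3, 4))
)

hub <- new.env()
invisible(list2env(output, envir = hub))  # one variable per list element

symbols_in_hub <- sort(ls(hub))
aapl_rows <- nrow(get("AAPL", envir = hub))
```

    Each named element of the list becomes a variable of the same name in hub, which matches the layout that getSymbols(tickers, env = hub) produces in the non-parallel version.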