I would like to export the outputs from foreach into an environment. I am pulling time series data from Yahoo Finance.
library(quantmod)
library(foreach)
library(parallel)
library(doParallel)
registerDoParallel(cores=2)
hub = new.env()
tickers = c("NKE", "AAPL", "MSFT", "TSLA", "MPC", "PEP", "GIS", "MA","V", "CAT", "KHC", "AMZN", "NFLX", "GS", "MS", "BAC", "GE", "KO", "JPM", "AMAT", "ABT", "BIIB")
#I have tried 2 methods below.
#The first gives me a list of just the ticker names.
#The second puts the data into a list. I am looking for an environment.
foreach(r = tickers, .packages = "quantmod") %dopar% lapply(r, getSymbols, env = hub)
enviro = foreach(r = tickers, .packages = "quantmod")%dopar% lapply(r, getSymbols, auto.assign = F)
class(enviro)
[1] "list"
The environment should look like this (it works when I do not run it in a foreach loop).
hub = new.env()
#the following line of code takes about 1 min. Just a heads up
getSymbols(tickers, env = hub)
The question is a bit unclear, but it seems you are trying to combine the outputs into a single environment, while running the downloads in parallel for speed.
A few things are worth noting. quantmod::getSymbols has quite a bit of overhead on each call, so with your current method, where the function is called once per symbol, you should expect a loss of performance.
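To get a feel for that overhead, you could time a few per-symbol calls against a single batched call (just a rough sketch; it needs internet access and the exact timings will vary):
# Per-symbol calls vs. one batched call for the same three symbols
system.time(for (s in c("AAPL", "MSFT", "NKE")) getSymbols(s, env = new.env()))
system.time(getSymbols(c("AAPL", "MSFT", "NKE"), env = new.env()))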
One way to reduce the overhead is to split the work into chunks. The foreach package relies on the iterators package, which makes it quite simple to split the computation into chunks.
nworker = 2
registerDoParallel(cores = nworker)
tickers = matrix(c("NKE", "AAPL", "MSFT", "TSLA", "MPC", "PEP", "GIS", "MA","V", "CAT", "KHC", "AMZN", "NFLX", "GS", "MS", "BAC", "GE", "KO", "JPM", "AMAT", "ABT", "BIIB"), ncol = 1)
tickerIter <- iterators::iter(tickers, by = 'row', # I made a 1-column matrix, so I will iterate over each row.
                              chunksize = ceiling(length(tickers)/nworker) # Set the chunk size so that each worker gets exactly one job.
)
In the code above, tickerIter is now an iterator over all the symbols, split into nworker chunks of length ceiling(length(tickers)/nworker). Thus each worker (core) gets only a single chunk, and we only have to export to and import from each worker once. tickerIter will be given as the argument in the foreach loop instead of the raw tickers.
To see what the iterator passes to the foreach loop, you can execute nextElem(tickerIter), which outputs one chunk. Note, however, that you will then need to re-create the iterator, because a chunk that has already been consumed by nextElem will not be seen by the foreach loop.
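For example (the chunk is consumed, so the iterator is rebuilt afterwards):
iterators::nextElem(tickerIter) # returns one chunk: a 1-column character matrix with ceiling(length(tickers)/nworker) symbols
# Re-create the iterator so the foreach loop still sees every chunk:
tickerIter <- iterators::iter(tickers, by = 'row',
                              chunksize = ceiling(length(tickers)/nworker))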
From the question, you want to combine the output into a single environment. Doing this directly within the foreach loop is simply not possible, at least not without the danger of crashing the R session. By default, foreach performs parallelization by creating multiple R sessions, exporting the data to them and executing the code/expression provided there. Thus you would have to hook into the current (master) R session and assign the variables to the environment through this hook, which is not recommended.
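You can see this with the first approach from the question: it runs, but each worker assigns into its own copy of the environment, so hub in the master session stays empty (a small sketch with just two symbols to illustrate):
hub = new.env()
foreach(r = c("AAPL", "MSFT"), .packages = "quantmod") %dopar% getSymbols(r, env = hub)
ls(hub) # empty: the workers' assignments never reach the master's hub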
However, foreach has a .combine argument, which can be given a custom function for combining the results. Additionally, if that function is written to combine any number of inputs, setting .multicombine = TRUE means it is executed only once, with all of the outputs (rather than once per pair).
I don't see why you would specifically need the series in the hub environment, so in the code example below the output is instead combined into a single list. The list can then be converted using list2env to export the output into a specific environment.
Note the use of tickerIter instead of the raw tickers.
output <- foreach(r = tickerIter,
                  .combine = function(...){
                    c(...) # Combine the chunk outputs into a single list
                  },
                  .multicombine = TRUE,
                  .packages = "quantmod") %dopar% {
  currenv <- new.env()          # a fresh environment, so only the downloaded series end up in it
  getSymbols(r, env = currenv)  # download this chunk of symbols into currenv
  as.list(currenv)              # return the series as a named list
}
# If you really want the series in a specific environment, you could use the following (this could also be done inside .combine):
list2env(output, hub)
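A quick way to check the result is to list the contents of hub; each ticker should now be an xts object, just as with the plain getSymbols(tickers, env = hub) call from the question:
ls(hub)        # should list all of the tickers
head(hub$AAPL) # inspect one of the downloaded series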