I am trying to train multiple time series models using the code below in a Jupyter notebook.
import statsmodels.api as sm
import multiprocessing
import tqdm
train_dict = dict() # A dictionary of dataframes
test_dict = dict() # A dictionary of dataframes
def train_arma(key):
    endog = list(train_dict[key].endog)
    exog = list(train_dict[key].exog)
    fut_endog = list(test_dict[key].endog)
    fut_exog = list(test_dict[key].exog)
    model = sm.tsa.arima.ARIMA(endog, order=(2, 0, 2), exog=exog,
                               enforce_stationarity=False,
                               enforce_invertibility=False).fit()
    predictions = list()
    yhat = model.forecast(exog=[fut_exog[0]])[0]
    predictions.append(yhat)
    for i in tqdm.tqdm_notebook(range(len(fut_endog) - 1)):
        model = model.append([fut_endog[i]], exog=[fut_exog[i]], refit=True)  # code gets stuck here
        predictions.append(model.forecast(exog=[fut_exog[i + 1]])[0])
    return predictions
secs = list(train_dict.keys())
p = multiprocessing.Pool(10)
output = p.map(train_arma, secs)
p.terminate()
When len(endog) == 1006, the code keeps getting stuck on the 17th iteration of the for loop. If I decrease endog by 20 observations, it gets stuck on the 37th iteration instead.
There are some other things I have tried already:
I ran top in my Linux terminal and observed the CPU usage while the processes were being created and executed. Initially, when the processes spawn, they use CPU; once they get stuck, their %CPU drops to 0.
There are some instances when the code does work: with processes = 1, the code stops getting stuck. I am using statsmodels v0.12.1 and Python 3.7.3. Thanks.
This issue must be due to using tqdm alongside multiprocessing.
https://github.com/tqdm/tqdm/issues/461 addresses this issue.
I resolved it by adding the following:
from tqdm import tqdm
tqdm.get_lock().locks = []
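For completeness, here is a minimal sketch of where that workaround fits relative to the pool from the question, with a commented-out alternative that follows tqdm's multiprocessing examples (sharing the parent's lock with the workers via the pool initializer). train_arma and secs are assumed to be defined as in the question.

import multiprocessing

from tqdm import tqdm

# Workaround from tqdm issue #461: clear the locks tqdm acquired in the
# parent process so that the forked workers inherit an empty lock list.
# This should run before the Pool is created.
tqdm.get_lock().locks = []

p = multiprocessing.Pool(10)
output = p.map(train_arma, secs)  # train_arma and secs as defined in the question
p.terminate()

# Alternative from tqdm's multiprocessing examples: pass the parent's lock
# to each worker through the pool initializer instead.
# p = multiprocessing.Pool(10, initializer=tqdm.set_lock,
#                          initargs=(tqdm.get_lock(),))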