I am using a Windows machine and I have code written for Python 2.7 that solves a statistical model. Since the model depends on the value of a parameter, I created a parallelized version that solves one model for each value of the parameter.
Consider, for instance, a first file called main_function that includes the following code (this code is here for the sake of replicability but is not question-related):
import numpy as np
import cvxpy
def lm_lasso(x, y, lambda1=None):
    n = x.shape[0]
    m = x.shape[1]
    lambda_param = cvxpy.Parameter(sign="positive")
    # Define the objective function
    beta_var = cvxpy.Variable(m)
    lasso_penalization = lambda_param * cvxpy.norm(beta_var, 1)
    lm_penalization = (1.0 / n) * cvxpy.sum_squares(y - x * beta_var)
    objective = cvxpy.Minimize(lm_penalization + lasso_penalization)
    problem = cvxpy.Problem(objective)
    beta_sol_list = []
    for l in lambda1:
        lambda_param.value = l
        problem.solve(solver=cvxpy.ECOS)
        beta_sol = np.asarray(np.row_stack([b.value for b in beta_var])).flatten()
        beta_sol_list.append(beta_sol)
    return beta_sol_list
And a second file called parallel_function that includes the following code:
import multiprocessing as mp
import numpy as np
import functools
import zz_main_function as mf
def lm_lasso_parallel(x, y, lambda1):
    chunks = np.array_split(lambda1, mp.cpu_count())
    pool = mp.Pool(processes=mp.cpu_count())
    results = pool.map(functools.partial(mf.lm_lasso, x, y), chunks)
    pool.close()
    pool.join()
    return results
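To make the chunking step concrete, here is a minimal pure-Python sketch of how a list of lambda values gets divided among workers and why the result comes back as a nested list (one inner list per chunk). The helpers `split_into_chunks` and `flatten` are hypothetical names introduced only for illustration; the splitting rule is meant to mirror what `np.array_split` does (the first `len(values) % n_chunks` pieces get one extra element), and the doubling step is just a stand-in for the real solver.

```python
def split_into_chunks(values, n_chunks):
    """Split values into n_chunks contiguous pieces whose sizes differ by at most one."""
    base, extra = divmod(len(values), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional element.
        size = base + (1 if i < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


def flatten(nested):
    """Turn a list of per-chunk result lists into one flat list."""
    return [item for chunk in nested for item in chunk]


lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]

# Assume 4 workers for the illustration (mp.cpu_count() in the real code).
chunks = split_into_chunks(lambda1, 4)

# Stand-in for pool.map(...): each worker returns one result per lambda in its chunk.
nested_results = [[2 * value for value in chunk] for chunk in chunks]

# One result per original lambda, in the original order.
flat_results = flatten(nested_results)
```

Note that when there are more workers than lambda values, some chunks are empty, so the corresponding workers simply return an empty list.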
The reason I split the functions into two files is that this way everything seemed to work without adding the usual if __name__ == '__main__': guard required when dealing with multiprocessing.
This code was written some months ago and worked perfectly, either from the Python console or by running a Python file like:
import zz_parallel_function as pf
from sklearn.datasets import load_boston
boston = load_boston()
x = boston.data
y = boston.target
lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]
r_parallel = pf.lm_lasso_parallel(x, y, lambda1)
Recently I had to format my computer, and when I reinstalled Python 2.7 and tried to run the code described above, I ran into the following errors:
If I try to run it directly from the Python console:
import zz_parallel_function as pf
from sklearn.datasets import load_boston
boston = load_boston()
x = boston.data
y = boston.target
lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]
r_parallel = pf.lm_lasso_parallel(x, y, lambda1)
So my question is:
Why did this code work before and not now? The only thing that (possibly) changed is the version of some of the installed modules, but I don't think this is that relevant.
Any guess on how to get it working again?
By adding if __name__ == '__main__': to the code and running it as an independent file, it executes with no problem. However, when I try to execute it in a Python console, it gives the same error as before.
Based on the comments received, this was possibly due to the need to freeze the code. The code in the Python console is not frozen, and this would be the cause of the issue. I then considered running the following example from the multiprocessing documentation for Windows:
from multiprocessing import Process, freeze_support
def foo():
    print 'hello'

if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()
This example supposedly freezes the code, but when running it in the Python console, I get the same error as before.
You cannot spawn new child processes using multiprocessing directly from the interactive Python interpreter.
From the docs,
Note: Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the Pool examples will not work in the interactive interpreter.
And the guideline says that
Safe importing of main module
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process).
Calling freeze_support() has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then freeze_support() has no effect.
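As a rough way to see which situation you are in, the sketch below checks two things: whether the program looks frozen, and whether the main module has a file behind it that a child process could import. The function names `is_frozen` and `main_has_file` are hypothetical. The checks assume the common convention that freezing tools such as pyinstaller and py2exe set `sys.frozen`, and that the plain interactive console has a `__main__` module without a `__file__` attribute; other shells or IDE consoles may behave differently.

```python
import sys


def is_frozen():
    """Return True if the program appears to run as a frozen executable.

    Assumption: the freezing tool sets sys.frozen, which is the usual
    convention but not something every tool is guaranteed to follow.
    """
    return bool(getattr(sys, 'frozen', False))


def main_has_file():
    """Return True if the main module was loaded from a file.

    In the plain interactive console, __main__ normally has no __file__,
    so there is no main script for a child process to import.
    """
    main_module = sys.modules.get('__main__')
    return hasattr(main_module, '__file__')


frozen = is_frozen()
importable_main = main_has_file()
```

If `frozen` is false, calling freeze_support() does nothing, which is consistent with the example in the question behaving the same with or without it.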
Also, one should protect the “entry point” of the program by using if __name__ == '__main__': as follows:
from multiprocessing import Process, freeze_support
def f():
    print 'hello world!'

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()
If the freeze_support() line is omitted, then trying to run the frozen executable (e.g. created using pyinstaller or py2exe) will raise RuntimeError.
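For a Pool-based setup like the one in the question, the same pattern might look like the sketch below. This is only an illustration under stated assumptions: `square_chunk` is a hypothetical stand-in for `mf.lm_lasso` (so that the example does not need cvxpy), `run_parallel` and the fixed `n_workers=2` are made-up names and values, and the script is meant to be saved to a file and run with `python`, not pasted into the console.

```python
from __future__ import print_function  # keeps print() valid on Python 2.7 and 3

import multiprocessing as mp


def square_chunk(chunk):
    """Hypothetical worker standing in for mf.lm_lasso: one result per value."""
    return [value * value for value in chunk]


def run_parallel(values, n_workers=2):
    """Distribute values over n_workers processes and collect per-chunk results."""
    # Simple round-robin split; the question uses np.array_split instead.
    chunks = [values[i::n_workers] for i in range(n_workers)]
    pool = mp.Pool(processes=n_workers)
    try:
        results = pool.map(square_chunk, chunks)
    finally:
        pool.close()
        pool.join()
    return results


if __name__ == '__main__':
    # Only matters for frozen Windows executables; harmless otherwise.
    mp.freeze_support()
    print(run_parallel([0, 1, 2, 3, 4, 5, 6]))
```

Because the pool is only created under the guard, a child process that re-imports this file defines the functions but does not start a pool of its own.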