
multiprocessing ImportError: No module named <input>

I am using a Windows machine and I have code written for Python 2.7 that solves a statistical model. Since the model depends on the value of a parameter, I created a parallelized version that solves one model for each value of the parameter.

Consider, for instance, a first file called main_function (imported elsewhere as zz_main_function) that includes the following code (this code is here for the sake of reproducibility but is not related to the question):

import numpy as np
import cvxpy

def lm_lasso(x, y, lambda1=None):
    n = x.shape[0]
    m = x.shape[1]
    lambda_param = cvxpy.Parameter(sign="positive")
    # Define the objective function
    beta_var = cvxpy.Variable(m)
    lasso_penalization = lambda_param * cvxpy.norm(beta_var, 1)
    lm_penalization = (1.0 / n) * cvxpy.sum_squares(y - x * beta_var)
    objective = cvxpy.Minimize(lm_penalization + lasso_penalization)
    problem = cvxpy.Problem(objective)
    beta_sol_list = []
    for l in lambda1:
        lambda_param.value = l
        problem.solve(solver=cvxpy.ECOS)
        beta_sol = np.asarray(np.row_stack([b.value for b in beta_var])).flatten()
        beta_sol_list.append(beta_sol)
    return beta_sol_list

And a second file called parallel_function (imported elsewhere as zz_parallel_function) that includes the following code:

import multiprocessing as mp
import numpy as np
import functools
import zz_main_function as mf

def lm_lasso_parallel(x, y, lambda1):
    chunks = np.array_split(lambda1, mp.cpu_count())
    pool = mp.Pool(processes=mp.cpu_count())
    results = pool.map(functools.partial(mf.lm_lasso, x, y), chunks)
    pool.close()
    pool.join()
    return results

The reason I split the functions into two files is that, this way, everything seemed to work without adding the usual if __name__ == '__main__': guard required when dealing with multiprocessing.

This code was written some months ago and worked perfectly, either from the Python console or by running a Python file like:

import zz_parallel_function as pf
from sklearn.datasets import load_boston

boston = load_boston()
x = boston.data
y = boston.target
lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]

r_parallel = pf.lm_lasso_parallel(x, y, lambda1)

Recently I had to format my computer, and when I reinstalled Python 2.7 and tried to run the code described above, I ran into the following errors:

If I try to run it directly from python console:

import zz_parallel_function as pf
from sklearn.datasets import load_boston

boston = load_boston()
x = boston.data
y = boston.target
lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]

r_parallel = pf.lm_lasso_parallel(x, y, lambda1)

If I run it as an independent file:

So my question is:

Why did this code work before and not now? The only thing that (possibly) changed is the version of some of the installed modules, but I don't think that is relevant.
Any guess on how to get it working again?

EDIT 1

By adding if __name__ == '__main__': to the code and running it as an independent file, it executes with no problem. However, when I try to execute it in a Python console, it raises the same error as before.

Based on the comments received, this was possibly because the code needs to be frozen. The code in the Python console is not frozen, and this would be the cause of the issue. I then considered running the following example from the multiprocessing documentation for Windows:

from multiprocessing import Process, freeze_support

def foo():
    print 'hello'

if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()

This code supposedly freezes the code, but when I run it in the Python console, I get the same error as before.

Solution

You cannot spawn new child processes using multiprocessing directly from the interactive Python interpreter.

From the docs,

Note: Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the Pool examples will not work in the interactive interpreter.
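To illustrate what "importable by the children" means, here is a minimal sketch of a check you could run before creating a Pool. The helper name main_module_path is hypothetical, and the test it performs (does __main__ have a __file__ that points at a real file on disk?) is a heuristic assumed for illustration, not something multiprocessing exposes. In an interactive console there is typically no such file, so a child process on Windows has nothing to re-import.

```python
import os
import sys


def main_module_path():
    """Return the absolute path of the __main__ module's file.

    Returns None when __main__ has no __file__ attribute or when that
    attribute does not point at an existing file (as is typical for an
    interactive console session).
    """
    main = sys.modules.get('__main__')
    path = getattr(main, '__file__', None)
    if path is None or not os.path.isfile(path):
        return None
    return os.path.abspath(path)
```

If this returns None, the safest option is to move the multiprocessing call into a script and run that script instead.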

And the guideline says that

Safe importing of main module

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

Calling freeze_support() has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then freeze_support() has no effect.
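As a rough illustration of why freeze_support() does not help in a console session: a frozen executable is conventionally marked by the sys.frozen attribute, which tools such as py2exe and PyInstaller set. The helper below is a hypothetical sketch of that check, not part of the public multiprocessing API.

```python
import sys


def is_frozen_windows_executable():
    """Return True only for a frozen executable running on Windows.

    A normal interpreter session (console or script) has no sys.frozen
    attribute, so this returns False there.
    """
    return sys.platform == 'win32' and bool(getattr(sys, 'frozen', False))
```

In a regular Python console this returns False, which matches the documented behavior that freeze_support() has no effect there.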

Also, one should protect the “entry point” of the program by using if __name__ == '__main__': as follows:

from multiprocessing import Process, freeze_support

def f():
    print 'hello world!'

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()

If the freeze_support() line is omitted, then trying to run the frozen executable (e.g. created using pyinstaller or py2exe) will raise RuntimeError.
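Applied to a Pool-based workload like the one in the question, a minimal sketch could look like the following. The functions square_chunk, split_into_chunks and run_parallel are hypothetical stand-ins (the real per-chunk work would be something like mf.lm_lasso), and the file is meant to be saved and run as a script, not pasted into a console.

```python
from __future__ import print_function

import multiprocessing as mp


def square_chunk(chunk):
    """Hypothetical stand-in for the real per-chunk work."""
    return [value * value for value in chunk]


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks nearly equal contiguous pieces."""
    size, extra = divmod(len(values), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        stop = start + size + (1 if i < extra else 0)
        chunks.append(values[start:stop])
        start = stop
    return chunks


def run_parallel(values, processes=2):
    """Map square_chunk over the chunks using a process pool."""
    pool = mp.Pool(processes=processes)
    try:
        return pool.map(square_chunk, split_into_chunks(values, processes))
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # Only the process launched as a script reaches this block; children
    # that re-import this file skip it and do not spawn pools of their own.
    mp.freeze_support()  # no effect unless running as a frozen Windows executable
    print(run_parallel([1, 2, 3, 4, 5]))
```

Keeping the worker functions at module level and everything that starts processes under the guard is what lets a child interpreter import the file without side effects.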