Tags: python-2.7, python-multiprocessing, pool

Error using multiprocessing library: "got multiple values for keyword argument 'x'"


I am trying to parallelize a penalized linear model using the multiprocessing library in Python.

I created a function that solves my model:

from __future__ import division
import numpy as np
from cvxpy import *

def lm_lasso_solver(x, y, lambda1):
    n = x.shape[0]
    m = x.shape[1]
    lambda1_param = Parameter(sign="positive")
    betas_var = Variable(m)
    response = dict(model='lm', penalization='l')
    response["parameters"] = {"lambda_vector": lambda1}
    lasso_penalization = lambda1_param * norm(betas_var, 1)
    lm_penalization = 0.5 * sum_squares(y - x * betas_var)
    objective = Minimize(lm_penalization + lasso_penalization)
    problem = Problem(objective)
    lambda1_param.value = lambda1
    try:
        problem.solve(solver=ECOS)
    except:
        try:
            problem.solve(solver=CVXOPT)
        except:
            problem.solve(solver=SCS)
    beta_sol = np.asarray(betas_var.value).flatten()
    response["solution"] = beta_sol
    return response

In this function, x is a matrix of predictors and y is the response variable. lambda1 is the penalty parameter that has to be tuned, so it is the parameter I want to parallelize over. I saved this script in a Python file called "ms.py".

Then I created another python file called "parallelization.py" and in that file I defined the following:

import multiprocessing as mp
import ms
import functools

def myFunction(x, y, lambda1):
    pool = mp.Pool(processes=mp.cpu_count())
    results = pool.map(functools.partial(ms.lm_lasso_solver, x=x, y=y), lambda1)
    return results

The idea was then to run the following in the Python interpreter:

import numpy as np
from sklearn.datasets import load_boston
boston = load_boston()
x = boston.data
y = boston.target
runfile('parallelization.py')
lambda_vector = np.array([1,2,3])
myFunction(x, y, lambda_vector)

But when I do this, I get the error message quoted in the title: "got multiple values for keyword argument 'x'". (The original post showed the traceback as an image.)


Solution

  • The problem is on the line:

    results = pool.map(functools.partial(ms.lm_lasso_solver, x=x, y=y), lambda1)
    

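    pool.map() passes each element of lambda1 to the partial object as its first positional argument. In lm_lasso_solver(x, y, lambda1) the first positional parameter is x, which functools.partial() has already bound by keyword, so x receives two values and Python raises the TypeError. Bind x and y positionally instead, so that each lambda value fills the third parameter: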

    results = pool.map(functools.partial(ms.lm_lasso_solver, x, y), lambda1)
    

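    or use the pool object's apply_async() method, submitting one task per lambda value and collecting each result with get():

    async_results = [pool.apply_async(ms.lm_lasso_solver, args=(x, y, lam)) for lam in lambda1]
    results = [r.get() for r in async_results]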
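
To reproduce the difference without cvxpy, here is a minimal sketch. The names solver, keyword_partial_error and run_pool are hypothetical and not part of the original code; solver is only a stand-in that has the same (x, y, lambda1) signature as ms.lm_lasso_solver and a trivial body. It shows that binding x and y by keyword makes the mapped value collide with x, while binding them positionally lets the mapped value become lambda1.

```python
from __future__ import print_function
import functools
import multiprocessing as mp


def solver(x, y, lambda1):
    # Hypothetical stand-in for ms.lm_lasso_solver: same signature, trivial body.
    return {"lambda": lambda1, "value": lambda1 * (x + y)}


def keyword_partial_error(x, y, lam):
    # Mirrors the failing call. The mapped value arrives positionally, so it
    # targets the first parameter (x), which is already bound by keyword.
    bad = functools.partial(solver, x=x, y=y)
    try:
        bad(lam)
    except TypeError as exc:
        return str(exc)
    return None


def run_pool(x, y, lambdas, processes=2):
    # x and y are bound positionally, so each mapped value becomes lambda1.
    good = functools.partial(solver, x, y)
    pool = mp.Pool(processes=processes)
    try:
        return pool.map(good, lambdas)
    finally:
        # Release the worker processes once the results are collected.
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Single-process check of both bindings; run_pool(1, 2, [1, 2, 3]) applies
    # the positional one across worker processes.
    print(keyword_partial_error(1, 2, 3))
    print([functools.partial(solver, 1, 2)(lam) for lam in [1, 2, 3]])
```

The exact wording of the TypeError differs between Python versions, but in both cases it reports multiple values for x. On platforms that start workers by spawning a fresh interpreter, run_pool should be called from under an if __name__ == "__main__": guard in a script rather than typed at module level.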