python numpy multiprocessing dill pathos

Returning a two-dimensional array from multiprocessing

The following code is a simplified example of my main code. I am trying to use pathos.multiprocessing to speed up the iterations of a loop. Each iteration, run in a separate process, returns a 2-D array. I used pathos.multiprocessing rather than multiprocessing because I need to call it from a class method. I used the apipe method of pathos.multiprocessing to collect the outputs in a list, but the list comes back empty, and I have no idea why.

import numpy as np
import random
import pathos.multiprocessing as mp
class Testsystematics(object):
      def __init__(self, x, y, NTH = None, THMIN = None, THMAX = None, NRESAMPLE = None):
         self.x        = x
         self.y        = y
         self.nbins    = NTH
         self.bmin     = THMIN
         self.bmax     = THMAX
         self.nresample= NRESAMPLE
         self.bins     = np.linspace(self.bmin, self.bmax, self.nbins+1, True).astype(float)
         self.sample   = np.array([[random.choice(range(len(self.y))) for _ in range(len(self.y))] for i in range(self.nresample)])
         self.result_list=[]
      def log_result(self, result):
          self.result_list.append(result)
      def bootstrapping(self, k):
          xi_p     = np.zeros(self.nbins, float)
          xi_m     = np.zeros(self.nbins, float)
          nind     = np.zeros(self.nbins, float)
          for i in range(len(self.x)):
              for j in range(len(self.x)):
                  if (i!=j): 
                     sep= np.sqrt(self.x[i]**2+self.x[j]**2)
                     index= np.searchsorted(self.bins, sep , side='right')-1 
                     sind = np.sin(sep)
                     if ((sep< self.bins[-1]) and (sep>=self.bins[0])):
                        xi_p[index] += sind*(np.mean(self.y)-np.median(self.y))
                        xi_m[index] += sind*np.std(self.y)
                        nind[index] += 1.0
          for i in range(self.nbins):
              xi_p[i]=xi_p[i]/nind[i]
              xi_m[i]=xi_m[i]/nind[i]
          return np.vstack((xi_p,xi_m))
      def twopcf(self):   
         if (self.sys_type==1):
            pool = mp.ProcessingPool(16)
            for n in range(self.nresample):
                pool.apipe(self.bootstrapping, args=(n,), callback=self.log_result)

shape,scale=0.5, 0.6
x=np.random.gamma(shape, scale, 10000)
mu1, sigma1 = 0, 0.5 # mean and standard deviation
mu2, sigma2 = 0.1, 0.7 # mean and standard deviation

y = np.random.normal(mu1, sigma1, 1000)+np.random.normal(mu2, sigma2, 1000)
sysTest=Testsystematics(x, y, NTH = 10, THMIN = 0, THMAX = 5, NRESAMPLE = 100)

Any suggestions?

Solution

I'm the pathos author. I tried your code: it runs without raising an error, but it leaves result_list empty. I believe that is because you are using apipe incorrectly. The correct use of apipe is as follows:

>>> import pathos
>>> def squared(x):
...   return x**2
... 
>>> pool = pathos.multiprocessing.ProcessingPool()
>>> res = pool.apipe(squared, 5)
>>> res.get()
25

self.bootstrapping takes self and k, so when you call it as an instance method you have to pass k as a positional argument in the apipe call. apipe has no callback argument; if you want a callback, you'd need to add one to your function.

Note that the return value is retrieved by (1) getting a return object from apipe, and (2) calling get on that object.

Because you call apipe inside a for loop, I'd suggest using pool.amap (or pool.imap) instead; then the whole loop runs in parallel.