Accessing and altering a global array using python joblib

I am attempting to use joblib in Python to speed up some data processing, but I'm having trouble working out how to get the output into the required format. I have put together a (perhaps overly simplistic) example that shows the issue I'm encountering:

from joblib import Parallel, delayed
import numpy as np

def main():
    print "Nested loop array assignment:"
    regular()
    print "Parallel nested loop assignment using a single process:"
    par2(1)
    print "Parallel nested loop assignment using multiple process:"
    par2(2)  # process count here is illustrative

def regular():
    # Define variables
    a = [0,1,2,3,4]
    b = [0,1,2,3,4]
    # Set array variable to global and define size and shape
    global ab
    ab = np.zeros((2,np.size(a),np.size(b)))

    # Iterate to populate array
    for i in range(0,np.size(a)):
        for j in range(0,np.size(b)):
            func(i,j,a,b)

    # Show array output
    print ab

def par2(process):
    # Define variables
    a2 = [0,1,2,3,4]
    b2 = [0,1,2,3,4]
    # Set array variable to global and define size and shape
    global ab2
    ab2 = np.zeros((2,np.size(a2),np.size(b2)))

    # Parallel process in order to populate array
    Parallel(n_jobs=process)(delayed(func2)(i,j,a2,b2) for i in xrange(0,np.size(a2)) for j in xrange(0,np.size(b2)))

    # Show array output
    print ab2

def func(i,j,a,b):
    # Populate array
    ab[0,i,j] = a[i]+b[j]
    ab[1,i,j] = a[i]*b[j]

def func2(i,j,a2,b2):
    # Populate array
    ab2[0,i,j] = a2[i]+b2[j]
    ab2[1,i,j] = a2[i]*b2[j]

# Run script
if __name__ == '__main__':
    main()

The output of which looks like this:

Nested loop array assignment:
[[[  0.   1.   2.   3.   4.]
  [  1.   2.   3.   4.   5.]
  [  2.   3.   4.   5.   6.]
  [  3.   4.   5.   6.   7.]
  [  4.   5.   6.   7.   8.]]

 [[  0.   0.   0.   0.   0.]
  [  0.   1.   2.   3.   4.]
  [  0.   2.   4.   6.   8.]
  [  0.   3.   6.   9.  12.]
  [  0.   4.   8.  12.  16.]]]
Parallel nested loop assignment using a single process:
[[[  0.   1.   2.   3.   4.]
  [  1.   2.   3.   4.   5.]
  [  2.   3.   4.   5.   6.]
  [  3.   4.   5.   6.   7.]
  [  4.   5.   6.   7.   8.]]

 [[  0.   0.   0.   0.   0.]
  [  0.   1.   2.   3.   4.]
  [  0.   2.   4.   6.   8.]
  [  0.   3.   6.   9.  12.]
  [  0.   4.   8.  12.  16.]]]
Parallel nested loop assignment using multiple process:
[[[ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]]]

From searching Google and StackOverflow, it appears that when using joblib the global array isn't shared between the subprocesses. I'm not sure whether this is a limitation of joblib or whether there is a way around it.
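
One way to see that the problem comes from separate processes rather than from joblib as such is to run the same kind of task in threads, which share the parent's memory. The following is only a minimal sketch, assuming Python 3 and a joblib version that accepts backend="threading"; the names ab_shared, fill_row and fill_with_threads are made up for illustration and are not part of the code above.

```python
import numpy as np
from joblib import Parallel, delayed

a = np.arange(5)
b = np.arange(5)
# Module-level array, as in the question; every thread sees this same object.
ab_shared = np.zeros((2, a.size, b.size))

def fill_row(i):
    # Each task writes one row of both planes directly into the shared array.
    ab_shared[0, i, :] = a[i] + b
    ab_shared[1, i, :] = a[i] * b

def fill_with_threads(n_jobs=2):
    # backend="threading" runs the tasks in threads of the current process,
    # so the writes to ab_shared are visible here once the call returns.
    Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(fill_row)(i) for i in range(a.size)
    )
    return ab_shared

result = fill_with_threads()
```

Threads are limited by the GIL, so this is likely to help only when the per-task work is dominated by calls that release it (such as large NumPy operations), not by pure-Python loops.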

In reality my script is surrounded by other bits of code that rely on the final output of this global array being in a (4,x,x) format, where x is variable (but typically ranges from the hundreds to several thousand). This is my reason for looking at parallel processing, as the whole process can take up to 2 hours for x = 2400.

The use of joblib isn't necessary (but I like the nomenclature and simplicity), so feel free to suggest simple alternative methods, ideally keeping in mind the requirements of the final array. I'm using Python 2.7.3 and joblib 0.7.1.
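
A possible alternative that avoids shared state altogether is to have each task return its results and assemble the final array in the parent process. This is a hedged sketch under assumptions (Python 3, a recent joblib); compute_row and build_array are hypothetical names, and the (2,x,x) shape mirrors the simplified example rather than the real (4,x,x) array.

```python
import numpy as np
from joblib import Parallel, delayed

def compute_row(i, a, b):
    # Return row i of both planes instead of writing to a global array.
    return a[i] + b, a[i] * b

def build_array(a, b, n_jobs=2):
    a = np.asarray(a)
    b = np.asarray(b)
    # Parallel returns a list of results in the order the tasks were submitted.
    rows = Parallel(n_jobs=n_jobs)(
        delayed(compute_row)(i, a, b) for i in range(a.size)
    )
    # Assemble the (2, len(a), len(b)) array in the parent process.
    ab = np.empty((2, a.size, b.size))
    for i, (sum_row, prod_row) in enumerate(rows):
        ab[0, i] = sum_row
        ab[1, i] = prod_row
    return ab

ab = build_array([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
```

The trade-off is that every row is pickled and sent back to the parent, which may matter for very large x.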


  • I was able to resolve the issue in this simple example using numpy's memmap. At first I still had problems after switching to memmap and following the examples on the joblib documentation webpage, but after upgrading to the latest joblib version (0.9.3) via pip it all runs smoothly. Here is the working code:

    from joblib import Parallel, delayed
    import numpy as np
    import os
    import tempfile
    import shutil
    def main():
        print "Nested loop array assignment:"
        regular()
        print "Parallel nested loop assignment using numpy's memmap:"
        par3(2)  # process count here is illustrative
    def regular():
        # Define variables
        a = [0,1,2,3,4]
        b = [0,1,2,3,4]
        # Set array variable to global and define size and shape
        global ab
        ab = np.zeros((2,np.size(a),np.size(b)))
        # Iterate to populate array
        for i in range(0,np.size(a)):
            for j in range(0,np.size(b)):
                func(i,j,a,b)
        # Show array output
        print ab
    def par3(process):
        # Create a temporary directory and define the array path
        path = tempfile.mkdtemp()
        ab3path = os.path.join(path,'ab3.mmap')
        # Define variables
        a3 = [0,1,2,3,4]
        b3 = [0,1,2,3,4]
        # Create the array using numpy's memmap
        ab3 = np.memmap(ab3path, dtype=float, shape=(2,np.size(a3),np.size(b3)), mode='w+')
        # Parallel process in order to populate array
        Parallel(n_jobs=process)(delayed(func3)(i,a3,b3,ab3) for i in xrange(0,np.size(a3)))
        # Show array output
        print ab3
        # Delete the temporary directory and contents
        try:
            shutil.rmtree(path)
        except OSError:
            print "Couldn't delete folder: "+str(path)
    def func(i,j,a,b):
        # Populate array
        ab[0,i,j] = a[i]+b[j]
        ab[1,i,j] = a[i]*b[j]
    def func3(i,a3,b3,ab3):
        # Populate array
        for j in range(0,np.size(b3)):
            ab3[0,i,j] = a3[i]+b3[j]
            ab3[1,i,j] = a3[i]*b3[j]
    # Run script
    if __name__ == '__main__':
        main()

    Giving the following results:

    Nested loop array assignment:
    [[[  0.   1.   2.   3.   4.]
      [  1.   2.   3.   4.   5.]
      [  2.   3.   4.   5.   6.]
      [  3.   4.   5.   6.   7.]
      [  4.   5.   6.   7.   8.]]
     [[  0.   0.   0.   0.   0.]
      [  0.   1.   2.   3.   4.]
      [  0.   2.   4.   6.   8.]
      [  0.   3.   6.   9.  12.]
      [  0.   4.   8.  12.  16.]]]
    Parallel nested loop assignment using numpy's memmap:
    [[[  0.   1.   2.   3.   4.]
      [  1.   2.   3.   4.   5.]
      [  2.   3.   4.   5.   6.]
      [  3.   4.   5.   6.   7.]
      [  4.   5.   6.   7.   8.]]
     [[  0.   0.   0.   0.   0.]
      [  0.   1.   2.   3.   4.]
      [  0.   2.   4.   6.   8.]
      [  0.   3.   6.   9.  12.]
      [  0.   4.   8.  12.  16.]]]

    A few of my thoughts to note for any future readers:

    • On small arrays, the time taken to prepare the parallel environment (generally referred to as overhead) means that this runs slower than the simple for loop.
    • On a larger array, e.g. setting a and a3 to np.arange(0,10000) and b and b3 to np.arange(0,1000), the "regular" method took 12.4s and the joblib method took 7.7s.
    • Because of the overheads, it was faster to let each task perform the whole inner j loop (see func3). This makes sense, since joblib then only has to dispatch 10,000 tasks rather than 10,000,000, each of which would need setting up.
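
The point about task granularity can be pushed one step further by giving each task a block of rows instead of a single row. The sketch below is an illustration under assumptions, not the code that produced the timings above: it targets Python 3 and a recent joblib, and fill_rows, fill_in_chunks and the n_chunks parameter are hypothetical names.

```python
import os
import shutil
import tempfile

import numpy as np
from joblib import Parallel, delayed

def fill_rows(rows, a, b, out):
    # One task handles a whole block of rows and writes into the memmap.
    for i in rows:
        out[0, i, :] = a[i] + b
        out[1, i, :] = a[i] * b

def fill_in_chunks(a, b, n_jobs=2, n_chunks=4):
    a = np.asarray(a)
    b = np.asarray(b)
    tmpdir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmpdir, "ab.mmap")
        out = np.memmap(path, dtype=float,
                        shape=(2, a.size, b.size), mode="w+")
        # Split the row indices into n_chunks roughly equal blocks.
        chunks = np.array_split(np.arange(a.size), n_chunks)
        Parallel(n_jobs=n_jobs)(
            delayed(fill_rows)(rows, a, b, out) for rows in chunks
        )
        # Copy into an ordinary in-memory array before removing the file.
        result = np.array(out)
        del out
        return result
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)

result = fill_in_chunks(np.arange(6), np.arange(4))
```

How many chunks work best depends on the machine and the per-row cost, so n_chunks is something to measure rather than a recommended value.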