Tags: python, python-2.7, parallel-processing, joblib

Accessing and altering a global array using python joblib


I am attempting to use joblib in Python to speed up some data processing, but I'm having trouble working out how to assign the output into the required format. I have put together a perhaps overly simplistic example that shows the issues I'm encountering:

from joblib import Parallel, delayed
import numpy as np

def main():
    print "Nested loop array assignment:"
    regular()
    print "Parallel nested loop assignment using a single process:"
    par2(1)
    print "Parallel nested loop assignment using multiple process:"
    par2(2)

def regular():
    # Define variables
    a = [0,1,2,3,4]
    b = [0,1,2,3,4]
    # Set array variable to global and define size and shape
    global ab
    ab = np.zeros((2,np.size(a),np.size(b)))

    # Iterate to populate array
    for i in range(0,np.size(a)):
        for j in range(0,np.size(b)):
            func(i,j,a,b)

    # Show array output
    print ab

def par2(process):
    # Define variables
    a2 = [0,1,2,3,4]
    b2 = [0,1,2,3,4]
    # Set array variable to global and define size and shape
    global ab2
    ab2 = np.zeros((2,np.size(a2),np.size(b2)))

    # Parallel process in order to populate array
    Parallel(n_jobs=process)(delayed(func2)(i,j,a2,b2) for i in xrange(0,np.size(a2)) for j in xrange(0,np.size(b2)))

    # Show array output
    print ab2

def func(i,j,a,b):
    # Populate array
    ab[0,i,j] = a[i]+b[j]
    ab[1,i,j] = a[i]*b[j]

def func2(i,j,a2,b2):
    # Populate array
    ab2[0,i,j] = a2[i]+b2[j]
    ab2[1,i,j] = a2[i]*b2[j]

# Run script
main()

The output of which looks like this:

Nested loop array assignment:
[[[  0.   1.   2.   3.   4.]
  [  1.   2.   3.   4.   5.]
  [  2.   3.   4.   5.   6.]
  [  3.   4.   5.   6.   7.]
  [  4.   5.   6.   7.   8.]]

 [[  0.   0.   0.   0.   0.]
  [  0.   1.   2.   3.   4.]
  [  0.   2.   4.   6.   8.]
  [  0.   3.   6.   9.  12.]
  [  0.   4.   8.  12.  16.]]]
Parallel nested loop assignment using a single process:
[[[  0.   1.   2.   3.   4.]
  [  1.   2.   3.   4.   5.]
  [  2.   3.   4.   5.   6.]
  [  3.   4.   5.   6.   7.]
  [  4.   5.   6.   7.   8.]]

 [[  0.   0.   0.   0.   0.]
  [  0.   1.   2.   3.   4.]
  [  0.   2.   4.   6.   8.]
  [  0.   3.   6.   9.  12.]
  [  0.   4.   8.  12.  16.]]]
Parallel nested loop assignment using multiple process:
[[[ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.]]]

From Google and the StackOverflow search function, it appears that when using joblib the global array isn't shared between the subprocesses. I'm uncertain whether this is a limitation of joblib or whether there is a way to get around it.
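
One pattern I have seen suggested that avoids shared state altogether is to have each call return its results and let the parent process assemble the array afterwards. Here is a minimal sketch of that idea (the helper name compute is only for illustration and isn't part of my real code):

from joblib import Parallel, delayed
import numpy as np

a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4]

def compute(i, j, a, b):
    # Return the two values for position (i, j) instead of writing to a global
    return a[i] + b[j], a[i] * b[j]

# Each call returns a tuple; the parent process reassembles them into the array
results = Parallel(n_jobs=2)(delayed(compute)(i, j, a, b)
                             for i in range(len(a)) for j in range(len(b)))
ab = np.array(results).T.reshape(2, len(a), len(b))
print(ab)

The drawback is that all the per-(i, j) results are first collected into a Python list before the array is built, which may matter when x runs into the thousands.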

In reality my script is surrounded by other code that relies on the final output of this global array being in a (4,x,x) format, where x is variable (but typically ranges from the hundreds to several thousands). This is my current reason for looking at parallel processing, as the whole process can take up to 2 hours for x = 2400.

The use of joblib isn't necessary (but I like the nomenclature and simplicity), so feel free to suggest simple alternative methods, ideally keeping in mind the requirements of the final array. I'm using Python 2.7.3 and joblib 0.7.1.
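
For comparison, the kind of joblib-free alternative I had in mind would be the standard library's multiprocessing.Pool, again returning results and reassembling them rather than writing into a shared global. A minimal sketch (the helper name compute is again only illustrative):

import multiprocessing as mp
from itertools import product
import numpy as np

a = [0, 1, 2, 3, 4]
b = [0, 1, 2, 3, 4]

def compute(ij):
    # Pool.map passes a single argument, so unpack the (i, j) pair here
    i, j = ij
    return a[i] + b[j], a[i] * b[j]

if __name__ == '__main__':
    pool = mp.Pool(processes=2)
    results = pool.map(compute, product(range(len(a)), range(len(b))))
    pool.close()
    pool.join()
    ab = np.array(results).T.reshape(2, len(a), len(b))
    print(ab)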


Solution

  • I was able to resolve the issue in this simple example using numpy's memmap. I was still having problems after using memmap and following the examples on the joblib documentation webpage, but once I upgraded to the latest joblib version (0.9.3) via pip it all ran smoothly. Here is the working code:

    from joblib import Parallel, delayed
    import numpy as np
    import os
    import tempfile
    import shutil
    
    def main():
    
        print "Nested loop array assignment:"
        regular()
    
        print "Parallel nested loop assignment using numpy's memmap:"
        par3(4)
    
    def regular():
        # Define variables
        a = [0,1,2,3,4]
        b = [0,1,2,3,4]
    
        # Set array variable to global and define size and shape
        global ab
        ab = np.zeros((2,np.size(a),np.size(b)))
    
        # Iterate to populate array
        for i in range(0,np.size(a)):
            for j in range(0,np.size(b)):
                func(i,j,a,b)
    
        # Show array output
        print ab
    
    def par3(process):
    
        # Create a temporary directory and define the array path
        path = tempfile.mkdtemp()
        ab3path = os.path.join(path,'ab3.mmap')
    
        # Define variables
        a3 = [0,1,2,3,4]
        b3 = [0,1,2,3,4]
    
        # Create the array using numpy's memmap
        ab3 = np.memmap(ab3path, dtype=float, shape=(2,np.size(a3),np.size(b3)), mode='w+')
    
        # Parallel process in order to populate array
        Parallel(n_jobs=process)(delayed(func3)(i,a3,b3,ab3) for i in xrange(0,np.size(a3)))
    
        # Show array output
        print ab3
    
        # Delete the temporary directory and contents
        try:
            shutil.rmtree(path)
        except:
            print "Couldn't delete folder: "+str(path)
    
    def func(i,j,a,b):
        # Populate array
        ab[0,i,j] = a[i]+b[j]
        ab[1,i,j] = a[i]*b[j]
    
    def func3(i,a3,b3,ab3):
        # Populate array
        for j in range(0,np.size(b3)):
            ab3[0,i,j] = a3[i]+b3[j]
            ab3[1,i,j] = a3[i]*b3[j]
    
    # Run script
    main()
    

    Giving the following results:

    Nested loop array assignment:
    [[[  0.   1.   2.   3.   4.]
      [  1.   2.   3.   4.   5.]
      [  2.   3.   4.   5.   6.]
      [  3.   4.   5.   6.   7.]
      [  4.   5.   6.   7.   8.]]
    
     [[  0.   0.   0.   0.   0.]
      [  0.   1.   2.   3.   4.]
      [  0.   2.   4.   6.   8.]
      [  0.   3.   6.   9.  12.]
      [  0.   4.   8.  12.  16.]]]
    Parallel nested loop assignment using numpy's memmap:
    [[[  0.   1.   2.   3.   4.]
      [  1.   2.   3.   4.   5.]
      [  2.   3.   4.   5.   6.]
      [  3.   4.   5.   6.   7.]
      [  4.   5.   6.   7.   8.]]
    
     [[  0.   0.   0.   0.   0.]
      [  0.   1.   2.   3.   4.]
      [  0.   2.   4.   6.   8.]
      [  0.   3.   6.   9.  12.]
      [  0.   4.   8.  12.  16.]]]
    

    A few of my thoughts to note for any future readers:

    • On small arrays, the time taken to prepare the parallel environment (generally referred to as overhead) means that this runs slower than the simple for loop.
    • For a larger array, e.g. setting a and a3 to np.arange(0,10000) and b and b3 to np.arange(0,1000), I got times of 12.4 s for the "regular" method and 7.7 s for the joblib method.
    • The overhead also meant that it was faster to let each worker perform the whole inner j loop (see func3). This makes sense, since it dispatches only 10,000 tasks rather than 10,000,000 tasks, each of which would need setting up (see the sketch after this list for taking the idea one step further).
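
    To illustrate that last point, here is a minimal sketch of taking the chunking one step further: dispatching one task per block of i values instead of one task per i (the names fill_block and par_blocks, and the chunk size of 2, are only for illustration, not from my real code):

    from joblib import Parallel, delayed
    import numpy as np
    import os
    import tempfile
    import shutil

    def fill_block(i_start, i_end, a, b, ab):
        # Each task fills a whole block of rows, so far fewer tasks are dispatched
        for i in range(i_start, i_end):
            for j in range(len(b)):
                ab[0, i, j] = a[i] + b[j]
                ab[1, i, j] = a[i] * b[j]

    def par_blocks(process, chunk=2):
        # Create a temporary directory to hold the memory-mapped array
        path = tempfile.mkdtemp()
        try:
            a = [0, 1, 2, 3, 4]
            b = [0, 1, 2, 3, 4]
            ab = np.memmap(os.path.join(path, 'ab.mmap'), dtype=float,
                           shape=(2, len(a), len(b)), mode='w+')
            # One delayed call per chunk of i values rather than one per i or per (i, j)
            Parallel(n_jobs=process)(delayed(fill_block)(s, min(s + chunk, len(a)), a, b, ab)
                                     for s in range(0, len(a), chunk))
            print(ab)
        finally:
            shutil.rmtree(path)

    par_blocks(2)

    On larger arrays, the chunk size becomes a tuning knob: bigger chunks mean less dispatch overhead, smaller chunks mean better load balancing across the workers.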