I'm trying to use multiprocessing in Python (2.7.8) on Mac OS X. After reading Velimir Mlaker's answer to this question, I was able to use multiprocessing.Pool() to multiprocess a trivially simple function, but it doesn't work with my actual function: I get the right results, but it executes serially. I believe the problem is that my function loops over a music21 Stream, which is similar to a list but has special functionality for music data. I believe that music21 streams cannot be pickled, so is there some multiprocessing alternative to Pool that I can use? I don't mind if results are returned out of order, and I can upgrade to a different version of Python if necessary. I've included my code for the multiprocessing task but not for the stream_indexer() function it calls. Thank you!
import multiprocessing as mp

def basik(test_piece, part_numbers):
    jobs = []
    for i in part_numbers:
        # Each 2-tuple in jobs has an index <i> and a music21 stream that
        # corresponds to an individual part in a musical score.
        jobs.append((i, test_piece.parts[i]))
    pool = mp.Pool(processes=4)
    results = pool.map(stream_indexer, jobs)
    pool.close()
    pool.join()
    return results
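To confirm the pickling issue, here is a minimal check (a sketch, assuming music21 and its bundled corpus are installed) that simply tries to pickle a single part and reports what happens:

import pickle
import music21

piece = music21.corpus.parse('bach/bwv66.6')
try:
    # A Pool worker would need to receive this part via pickle.
    pickle.dumps(piece.parts[0])
    print('part pickled successfully')
except Exception as exc:
    print('part is not picklable:', exc)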
The newest git commits of music21 have features to help with some of the trickier parts of multiprocessing, based on joblib. For instance, if you want to count all the notes in a part, you can normally do it in serial:
import music21

def countNotes(s):
    # using recurse() instead of .flat to avoid certain caches...
    return len(s.recurse().notes)

bach = music21.corpus.parse('bach/bwv66.6')
[countNotes(p) for p in bach.parts]
In parallel, it works like this:
music21.common.runParallel(list(bach.parts), countNotes)
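Since runParallel is built on joblib, it is conceptually close to this sketch (a simplification with a hypothetical name, runParallelSketch; the real implementation presumably handles details like worker counts and progress reporting):

from joblib import Parallel, delayed

def runParallelSketch(iterable, func):
    # Fan each element out to a worker process and collect the
    # results in order; n_jobs=-1 uses all available cores.
    return Parallel(n_jobs=-1)(delayed(func)(item) for item in iterable)

# e.g. runParallelSketch(list(bach.parts), countNotes)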
BUT! Here's the huge caveat. Let's time these:
In [5]: %timeit music21.common.runParallel(list(bach.parts), countNotes)
10 loops, best of 3: 152 ms per loop

In [6]: %timeit [countNotes(p) for p in bach.parts]
100 loops, best of 3: 2.19 ms per loop
On my computer (2 cores, 4 threads), running in parallel is roughly 70x SLOWER than running in serial (152 ms vs. 2.19 ms). Why? Because there's significant overhead in preparing a Stream to be multiprocessed. If the routine being run is very slow (on the order of 1 ms per note, divided by the number of processors), then it's worth passing Streams around in multiprocessing. Otherwise, see if there are ways to pass only small bits of information back and forth, such as the path to process:
def parseCountNotes(fn):
    s = music21.corpus.parse(fn)
    return len(s.recurse().notes)

bach40 = [b.sourcePath for b in music21.corpus.search('bwv')[0:40]]
In [32]: %timeit [parseCountNotes(b) for b in bach40]
1 loops, best of 3: 2.39 s per loop
In [33]: %timeit music21.common.runParallel(bach40, parseCountNotes)
1 loops, best of 3: 1.83 s per loop
Here we're beginning to get speedups even on a MacBook Air. On my office Mac Pro, the speedups get huge for calls such as this. In this case, the call to parse() massively dominates the time spent in recurse().
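Applied back to the question's basik() function, the same path-passing idea with a plain multiprocessing.Pool looks roughly like this (a sketch: indexPart and the note count are stand-ins for whatever stream_indexer actually computes):

import multiprocessing as mp
import music21

def indexPart(args):
    # Re-parse inside the worker so that only small, picklable values
    # (a corpus path and a part index) cross the process boundary.
    # Re-parsing per part is wasteful, but it avoids pickling Streams.
    path, i = args
    score = music21.corpus.parse(path)
    return i, len(score.parts[i].recurse().notes)

if __name__ == '__main__':
    path = 'bach/bwv66.6'
    num_parts = len(music21.corpus.parse(path).parts)
    pool = mp.Pool(processes=4)
    results = pool.map(indexPart, [(path, i) for i in range(num_parts)])
    pool.close()
    pool.join()
    print(results)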