python multiprocessing gspread

Multiprocessing - returning unpickleable objects?


I've asked a question about multiprocessing before, but now I'm running into a weird shortcoming with the type of data that gets returned.

I'm using Gspread to interface with Google's Sheets API and get a "worksheet" object back.

This object, or some part of it, is apparently incompatible with multiprocessing because it is "unpickleable". See the output:

File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
raise self._value

multiprocessing.pool.MaybeEncodingError: Error sending result: '[<Worksheet 'Activation Log' id:o12345wm>]'. 
Reason: 'UnpickleableError(<ssl.SSLContext object at 0x1e4be30>,)'
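The "Reason" line can be reproduced without gspread or a pool. A pool has to pickle a result to send it back across the process boundary, and an SSL context wraps native connection state that pickle has no way to serialize. The sketch below assumes Python 3 (the traceback above is from Python 2.7, where the exception type differs), and `is_picklable` is a hypothetical helper name, not part of any library:

```python
import pickle
import ssl


def is_picklable(obj):
    """Return True if obj can be serialized with pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # Python 3 raises TypeError for an SSLContext; other objects may
        # raise pickle.PicklingError, so catch broadly for this check.
        return False


# A bare client-side context, standing in for the one gspread's HTTP layer holds.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

print(is_picklable({"title": "Activation Log"}))  # plain data
print(is_picklable(context))                      # SSL context
```

Plain data should pass the check and the context should fail it. Any object that holds such a context as an attribute, as the Worksheet apparently does through its client, fails for the same reason.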

The code I'm using is essentially:

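import multiprocessing
from oauth2client.client import SignedJwtAssertionCredentials
import gspread

# get_a_worksheet (defined elsewhere) authorizes with gspread and returns a Worksheet
sheet = 1
pool = multiprocessing.Pool(1)
p = pool.apply_async(get_a_worksheet, args=(sheet,))

# Fails here: the Worksheet has to be pickled to come back from the worker process
worksheet = p.get()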

And the script fails while attempting to "get" the results. The get_a_worksheet function returns a Gspread worksheet object that allows me to manipulate the remote sheet. Being able to upload changes to the document is important here - I'm not just trying to reference data, I need to alter it as well.

Does anyone know how I can run a function in a separate, monitorable process and safely get an arbitrary (or custom) object type out of it at the end? And does anyone know what makes the ssl.SSLContext object special and "unpickleable"?

Thanks all in advance.


Solution

  • I worked around this shortcoming by having the sub-process perform the necessary work itself rather than return a Worksheet object.

    What I ended up with was about half a dozen pairs of functions: a regular function plus a matching multiprocessing worker, each written to do one task I needed, but inside a sub-process so that it could be monitored and timed.

    A hierarchical map would look something like:

    Main()
        check_spreadsheet_for_a_string()
            check_spreadsheet_for_a_string_worker()
        get_hash_of_spreadsheet()
            get_hash_of_spreadsheet_worker()
    

    ... etc

    The "worker" functions are the ones called in the multiprocessing setup. The regular functions above them manage the sub-process and time it, so the overall program doesn't halt if a call into gspread internals hangs or takes too long.
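A minimal sketch of one way such function/worker pairs could look, not the exact code from the answer. It assumes Python 3. `_fetch_rows` is a hypothetical stand-in for the real gspread calls, and `run_with_timeout`, the `needle` parameter, and the 30-second default are likewise assumptions made for illustration. The point is that the Worksheet never leaves the sub-process; only plain values (a bool, a hex string) are returned:

```python
import hashlib
import multiprocessing


def _fetch_rows(sheet):
    """Hypothetical stand-in for the gspread calls; returns plain lists of strings."""
    return [["id", "status"], ["o12345wm", "activated"]]


def check_spreadsheet_for_a_string_worker(sheet, needle):
    # Runs inside the sub-process. The Worksheet (and its SSL context) would be
    # created and used here and never returned; only a bool goes back.
    rows = _fetch_rows(sheet)
    return any(needle in cell for row in rows for cell in row)


def get_hash_of_spreadsheet_worker(sheet):
    # Also runs in the sub-process; returns a hex digest string, which pickles fine.
    rows = _fetch_rows(sheet)
    flat = "\n".join(",".join(row) for row in rows)
    return hashlib.sha256(flat.encode("utf-8")).hexdigest()


def run_with_timeout(worker, args, timeout):
    """Run worker(*args) in a one-process pool; return None if it times out."""
    # Leaving the with-block calls pool.terminate(), which kills a hung worker.
    with multiprocessing.Pool(1) as pool:
        result = pool.apply_async(worker, args=args)
        try:
            return result.get(timeout=timeout)
        except multiprocessing.TimeoutError:
            return None


def check_spreadsheet_for_a_string(sheet, needle, timeout=30):
    # Manager half of the pair: starts the worker and enforces the time limit.
    return run_with_timeout(
        check_spreadsheet_for_a_string_worker, (sheet, needle), timeout
    )


def get_hash_of_spreadsheet(sheet, timeout=30):
    return run_with_timeout(get_hash_of_spreadsheet_worker, (sheet,), timeout)
```

Main() would then call check_spreadsheet_for_a_string(sheet, "some text") and treat None as "timed out". On platforms that spawn rather than fork new processes, those calls need to sit under an `if __name__ == "__main__":` guard. Returning None on timeout only works here because neither worker can legitimately return None; a worker that can would need a distinct sentinel or an exception instead.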