python multiprocessing ioerror reentrantlock

multiprocessing > Manager() > RLock Error:


I have a list of multiprocessing.Process objects that all share one instance of what I will call a "process-safe queue". They use it to communicate, in a process-safe way (like thread-safe, but for processes), with the parent process, which is responsible for managing the child processes.

When a child process puts something into the queue, it calls ProcessSafeQueue().enqueue(), which first acquires an RLock created by a multiprocessing.Manager, then writes to the queue, and finally releases the lock.
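A minimal sketch of the pattern just described, under stated assumptions: the real class is not shown, so the class body, the `items` list, `drain()`, and `run_demo()` are hypothetical, and the sketch uses the modern `multiprocessing` module rather than the older `processing` package. Only the names `ProcessSafeQueue`, `enqueue()`, and `self.lock` are taken from the description.

```python
import multiprocessing
import os


class ProcessSafeQueue(object):
    """Hypothetical reconstruction: a shared list guarded by a manager RLock."""

    def __init__(self, manager):
        # Both objects are proxies; the real lock and list live in the
        # manager's server process.
        self.lock = manager.RLock()
        self.items = manager.list()

    def enqueue(self, item):
        # Every proxy call below is a round trip to the manager process.
        self.lock.acquire()
        try:
            self.items.append(item)
        finally:
            self.lock.release()

    def drain(self):
        # Return everything queued so far and empty the shared list.
        with self.lock:
            out = self.items[:]
            del self.items[:]
            return out


def send_data(q):
    # Runs in a child process and reports that child's pid.
    q.enqueue(os.getpid())


def run_demo(n_children=3):
    # Hypothetical parent-side driver: start children, wait, collect pids.
    with multiprocessing.Manager() as manager:
        q = ProcessSafeQueue(manager)
        children = [
            multiprocessing.Process(target=send_data, args=(q,))
            for _ in range(n_children)
        ]
        for child in children:
            child.start()
        for child in children:
            child.join()
        return q.drain()
```

The point to notice is that `acquire()`, `append()`, and `release()` each go through a connection to the manager process, so a failure on that connection surfaces inside what looks like a plain lock call.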

In the call that failed, the item being enqueued was the pid of the child process. Here's a traceback of the error.

Traceback (most recent call last): 
File /usr/lib/python2.5/site-packages/my_project/some_module.py, line 87, in send_data
    q.enqueue(os.getpid())
File /usr/lib/python2.5/site-packages/my_project/some_module.py, line 33, in enqueue
    self.lock.acquire()
File /usr/lib/python2.5/site-packages/processing/managers.py, line 979, in acquire
    return self._callMethod('acquire', (blocking,))
File /usr/lib/python2.5/site-packages/processing/managers.py, line 740, in _callMethod
    self._connect()
File /usr/lib/python2.5/site-packages/processing/managers.py, line 727, in _connect
    connection = Client(self._token.address, authkey=self._authkey)
File /usr/lib/python2.5/site-packages/processing/connection.py, line 187, in Client
    answerChallenge(c, authkey)
File /usr/lib/python2.5/site-packages/processing/connection.py, line 425, in answerChallenge
    message = connection.recvBytes()

And here's the actual error:

IOError: [Errno 11] Resource temporarily unavailable

Can someone help me understand why I might get this error after the application had been running successfully for about 7 hours?


Solution

  • The answer here is that the error message is misleading. "Resource temporarily unavailable" (errno 11, which is EAGAIN on Linux) here means that reading from the socket to the manager process failed. That could be an auth error, or there may simply have been no data available to read yet (I'm not sure why that raises an error, but in practice it does). The solution is to catch the error and retry the operation.
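A minimal sketch of the catch-and-retry approach, written for Python 3, where `IOError` is an alias of `OSError`. The helper name `retry_on_eagain` and its `attempts` and `delay` parameters are hypothetical, not part of any library. It retries only on `EAGAIN` and re-raises anything else, so unrelated failures are not hidden.

```python
import errno
import time


def retry_on_eagain(func, attempts=5, delay=0.1):
    """Call func(); if it fails with EAGAIN, back off and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except OSError as exc:  # IOError is the same class in Python 3
            # Give up on other errors, or when the retries are used up.
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
            # Exponential backoff: delay, 2*delay, 4*delay, ...
            time.sleep(delay * (2 ** attempt))
```

Assuming a queue object `q` like the one in the question, a call might look like `retry_on_eagain(lambda: q.enqueue(os.getpid()))`. Whether it is safe to retry depends on the operation: if the failure happens after the lock was acquired, repeating the whole call could acquire it twice.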

    [Update after a few years of further experience with concurrency in Python]

    I found that designing my concurrency model to reduce, or remove altogether, the need for synchronization mechanisms (locks, queues, or semaphores) both simplified my design and made it less fragile, and the error I described above went away.
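One way such a design might look, as a sketch rather than the code actually used: each worker takes its input as an argument and returns its output, and the parent collects the return values through `multiprocessing.Pool`. The names `work` and `run_all` are hypothetical. The pool still uses pipes and locks internally, but the application code no longer holds a shared lock or manager proxy.

```python
import multiprocessing
import os


def work(task):
    """Hypothetical worker: no shared state, the result is just returned."""
    return (task, os.getpid(), task * task)


def run_all(tasks, processes=2):
    # The pool hands tasks to the children and gathers the return values,
    # so the workers never touch a lock or a shared queue themselves.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(work, tasks)
```

Calling `run_all(range(5))` from under an `if __name__ == "__main__":` guard returns one `(task, pid, result)` tuple per task, in task order.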