
multiprocessing: No space left on device


When I run the multiprocessing example below on OS X, I get the error OSError: [Errno 28] No space left on device.

The ENOSPC ("No space left on device") error will be triggered in any situation in which the data or the metadata associated with an I/O operation can't be written down anywhere because of lack of space. This doesn't always mean disk space – it could mean physical disk space, logical space (e.g. maximum file length), space in a certain data structure or address space. For example you can get it if there isn't space in the directory table (vfat) or there aren't any inodes left. It roughly means “I can't find where to write this down”. Source: https://stackoverflow.com/a/6999259/330658
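To make the quoted explanation concrete: the 28 in "[Errno 28]" is the numeric value of ENOSPC, and Python exposes it as errno.ENOSPC whichever resource actually ran out. A minimal sketch (the helper name is_enospc is hypothetical, introduced only for illustration):

```python
import errno
import os


def is_enospc(exc):
    """Return True if exc is an OSError whose errno is ENOSPC."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC


if __name__ == "__main__":
    # Print the numeric code and the OS-provided message for ENOSPC.
    print(errno.ENOSPC, os.strerror(errno.ENOSPC))
```

A check like this only tells you that the OS reported "no space"; it does not say which kind of space was exhausted.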

What I don't understand is where my code below writes any files.

Any help is highly appreciated.

Example Code:

#! /usr/bin/env python3
import sys
import os
import multiprocessing as mp
import time


def foo_pool(x):
    time.sleep(2)
    return x*x

result_list = []
def log_result(result):
    result_list.append(result)
    print(result)

def apply_async_with_callback():
    pool = mp.Pool()
    for i in range(10):
        pool.apply_async(foo_pool, args = (i, ), callback = log_result)
    pool.close()
    pool.join()
    print(result_list)

if __name__ == '__main__':
    apply_async_with_callback()

Full Error:

python3 test.py
Traceback (most recent call last):
  File "test.py", line 32, in <module>
    apply_async_with_callback()
  File "test.py", line 23, in apply_async_with_callback
    pool = mp.Pool()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 191, in __init__
    self._setup_queues()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 343, in _setup_queues
    self._inqueue = self._ctx.SimpleQueue()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/queues.py", line 342, in __init__
    self._rlock = ctx.Lock()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
OSError: [Errno 28] No space left on device
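
Reading the traceback from the bottom up, mp.Pool() builds a SimpleQueue, the queue builds a Lock, and the Lock asks the OS for a semaphore through _multiprocessing.SemLock. The sketch below isolates just that last step so it can be tested without a pool. It assumes the failure is in semaphore allocation, as the traceback suggests; the function name try_create_lock is hypothetical.

```python
import errno
import multiprocessing as mp


def try_create_lock():
    """Create one multiprocessing lock.

    Returns None on success, or the errno of the OSError raised while
    the lock's underlying semaphore was being allocated.
    """
    try:
        lock = mp.Lock()  # same call as the ctx.Lock() frame in the traceback
    except OSError as exc:
        return exc.errno
    # Acquire and release once to confirm the lock is usable.
    with lock:
        pass
    return None


if __name__ == "__main__":
    err = try_create_lock()
    if err == errno.ENOSPC:
        print("Semaphore allocation failed with ENOSPC")
    elif err is None:
        print("Lock created successfully")
    else:
        print("Lock creation failed with errno", err)
```

If this small script fails in the same way, the pool and the worker function are not involved at all.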

Solution

  • One possible cause of this error (as in my case) is that the system has reached its limit on POSIX semaphores. The traceback ends in _multiprocessing.SemLock, so the resource that ran out is a semaphore rather than disk space. You can inspect the limit with the sysctl kern.posix.sem.max command; it is 10000 on my macOS 13.0.1.

    To raise it, for example to 15000 until the next reboot, you can use:

    sudo sysctl -w kern.posix.sem.max=15000
    

    While this allowed the Python script to run, I wasn't able to find out which processes were actually using up the semaphores. The only way I found to list this type of semaphore was lsof, which shows them as type PSXSEM, e.g.:

    sudo lsof | grep PSXSEM
    

    But it listed only a couple of semaphores, not nearly enough to explain reaching the limit. So I suspect a bug in the system where semaphores are not cleaned up correctly. Further evidence for this is that after a reboot the script ran with the original limit.
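
The limit check above can also be done from Python before creating a pool. This is a minimal sketch, assuming the sysctl binary is on PATH and that the key kern.posix.sem.max exists (it does on my macOS; on other systems the function just returns None). The function name posix_sem_limit is hypothetical.

```python
import subprocess


def posix_sem_limit():
    """Return kern.posix.sem.max as an int, or None if it cannot be read.

    None is returned when sysctl is missing, when the key is unknown
    (for example on non-macOS systems), or when the output is not a number.
    """
    try:
        out = subprocess.run(
            ["sysctl", "-n", "kern.posix.sem.max"],  # -n prints only the value
            capture_output=True,
            text=True,
            check=True,
        ).stdout
        return int(out.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


if __name__ == "__main__":
    limit = posix_sem_limit()
    if limit is None:
        print("Could not read kern.posix.sem.max on this system")
    else:
        print("POSIX semaphore limit:", limit)
```

This only reports the configured maximum. It does not show how many semaphores are currently in use, which is the part I could not find a reliable way to inspect.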