python, python-multiprocessing, python-itertools

No error message when applying list() to an iterator before passing it to multiprocessing


I am trying to use starmap with keyword arguments in a small Python script, based on the helper from an answer to a Stack Overflow post (linked in the code below). While implementing it I ran into behaviour I could not explain. The code below reproduces the issue from my original code.

from itertools import repeat
import multiprocessing

# from stackexchange: https://stackoverflow.com/a/53173433/17456342
def starmap_with_kwargs(pool, fn, args_iter, kwargs_iter):
    args_for_starmap = zip(repeat(fn), args_iter, kwargs_iter)
    print(args_iter)
    return pool.starmap(apply_args_and_kwargs, args_for_starmap)

def apply_args_and_kwargs(fn, args, kwargs):
    print('test')
    return fn(*args, **kwargs)

def func(path, dictArg, **kwargs):
    for i in dictArg:
        print(i['a'])
        print(kwargs['yes'])

def funcWrapper(path, dictList, **kwargs):

    args_iter = zip(repeat(path), dictList)
    kwargs_iter = repeat(kwargs)

    # list(args_iter)

    pool = multiprocessing.Pool()
    starmap_with_kwargs(pool, func, args_iter, kwargs_iter)
       
    
dictList = [{'a': 2}, {'a': 65}, {'a': 213}, {'a': 3218}]
path = 'some/path/to/something'

funcWrapper(path, dictList, yes=1)

The issue is the following: if I run the code above, I get the TypeError I expect (it goes away if I remove the loop in func). However, if I include the line list(args_iter), there is no error message at all, and I have no idea why. So my question is: why is no error raised when list(args_iter) is included?

I am using Python 3.8.10 on Ubuntu 20.04.6 LTS in WSL.

Below is the (expected) error message I get when the list(args_iter) line is left out.

<zip object at 0x7fa1ec0b8340>
test
test
test
test
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "testing.py", line 67, in apply_args_and_kwargs
    return fn(*args, **kwargs)
  File "testing.py", line 71, in func
    print(i['a'])
TypeError: string indices must be integers
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "testing.py", line 88, in <module>
    funcWrapper(path, dictList, yes=1)
  File "testing.py", line 82, in funcWrapper
    starmap_with_kwargs(pool, func, args_iter, kwargs_iter)
  File "testing.py", line 61, in starmap_with_kwargs
    return pool.starmap(apply_args_and_kwargs, args_for_starmap)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: string indices must be integers

Solution

  • This has nothing to do with multiprocessing. It has to do with how Python iterators work: an iterator can only be consumed once, and after that it is exhausted and yields nothing.

    zip returns an iterator. You bind it to the variable args_iter here:

    args_iter = zip(repeat(path), dictList)
    

    That's not necessarily a problem. But when you run this line:

    list(args_iter)
    

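    list() consumes the iterator and puts its items into a list, which is then discarded. After that, the iterator is exhausted. When you later pass it to starmap_with_kwargs, it is empty, so zip produces no argument tuples, pool.starmap has nothing to run, func is never called, and no TypeError can be raised.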

    If you comment out the line where you turn args_iter into a list, the iterator is still untouched when it reaches pool.starmap, so func is called for every item and raises the TypeError you expected.

    Check out this little script:

    x = (0, 1, 2)
    y = "ABC"
    zipper = zip(x, y)
    
    list(zipper)
    
    for n, s in zipper:
        print(n, s)
    

    This will not print out anything at all. However, if you comment out the line

    list(zipper)
    

    the script will then produce this output:

    0 A
    1 B
    2 C
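
The same effect can be reproduced with the names from the question. The following is only a minimal sketch: it uses itertools.starmap as a single-process stand-in for pool.starmap (an assumption made to keep the example short and deterministic), and the run helper is hypothetical, not part of the original code. It shows that an exhausted iterator leads to zero calls, that a fresh one raises the TypeError, and that materialising the arguments into a list once lets you inspect them and still use them.

```python
from itertools import repeat, starmap


def func(path, dict_arg, **kwargs):
    # Same mistake as in the question: iterating over a dict yields its keys
    # (strings), so key['a'] raises TypeError.
    for key in dict_arg:
        print(key['a'])


def apply_args_and_kwargs(fn, args, kwargs):
    return fn(*args, **kwargs)


def run(args_iter, kwargs):
    """Hypothetical single-process stand-in for pool.starmap.

    Returns the number of calls that were made.
    """
    jobs = zip(repeat(func), args_iter, repeat(kwargs))
    return len(list(starmap(apply_args_and_kwargs, jobs)))


dict_list = [{'a': 2}, {'a': 65}]
path = 'some/path/to/something'

# Case 1: list() consumes the iterator first, so no job is ever created.
args_iter = zip(repeat(path), dict_list)
list(args_iter)
calls_after_exhaustion = run(args_iter, {'yes': 1})
print(calls_after_exhaustion)  # 0 calls, hence no TypeError

# Case 2: a fresh iterator reaches func, and the TypeError surfaces.
try:
    run(zip(repeat(path), dict_list), {'yes': 1})
    error_seen = False
except TypeError:
    error_seen = True
print(error_seen)  # True

# Case 3: to inspect the arguments and still use them, materialise them once.
# A list can be iterated any number of times.
args_list = list(zip(repeat(path), dict_list))
print(args_list)
print(len(args_list))
```

In the original script, the equivalent change would be to build args_iter as a list (or to call zip again after inspecting it) before handing it to starmap_with_kwargs.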