python, python-3.x, python-multiprocessing

Pool.imap_unordered skips value from the iterable


I am trying to run the following code to parallelize a function that crops GeoTIFFs. The GeoTIFFs are named <location>__img_news1a_iw_rt30_<hex_code>_g_gpf_vv.tif. The code runs without errors, but one particular GeoTIFF is never even read from the vv_tif iterable. Specifically, out of locationA_img_news1a_iw_rt30_20170314t115609_g_gpf_vv.tif, locationA_img_news1a_iw_rt30_20170606t115613_g_gpf_vv.tif and locationA_img_news1a_iw_rt30_20170712t115615_g_gpf_vv.tif, it skips locationA_img_news1a_iw_rt30_20170712t115615_g_gpf_vv.tif every single time when I process these files together with the GeoTIFFs for other locations. However, the file is read if I create an iterable from only these three GeoTIFF files. I have tried changing chunksize, but it doesn't help. Am I missing something here?

from multiprocessing import Pool, cpu_count
try:
    pool = Pool(cpu_count())
    pool.imap_unordered(tile_geotif, vv_tif, chunksize=11)
finally:
    pool.close()

EDIT: I have 55 files in total, and the only one dropped is locationA_img_news1a_iw_rt30_20170712t115615_g_gpf_vv.tif, every single time.


Solution

  • This is too much to show in a comment, so I am putting it here as an answer.

    It seems to me that the map functions work in my toy examples below. I think an error in your input data is causing the corrupted output. Either that, or you have found a bug. If so, please try to create a reproducible example.

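    from multiprocessing import Pool

    vv_tif = list(range(10))

    def self_power(x):
        # Toy workload: x raised to itself (0**0 == 1 in Python)
        return x**x

    if __name__ == "__main__":
        # The guard is required on platforms that spawn worker processes
        with Pool(5) as p:
            print(p.map(self_power, vv_tif))

        with Pool(5) as p:
            print(list(p.imap(self_power, vv_tif)))

        with Pool(5) as p:
            print(list(p.imap_unordered(self_power, vv_tif)))

        with Pool(5) as p:
            print(list(p.imap_unordered(self_power, vv_tif, chunksize=11)))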
    

    Output:

    [1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]
    [1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]
    [1, 1, 256, 3125, 46656, 823543, 16777216, 4, 27, 387420489]
    [1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]
    

    Usually all four lines were the same. I ran it a few times until one of them (the third, from imap_unordered) came back in a different order. It looks to me like it works.

    Note that this demonstrates that the various map functions do not mutate the underlying data.
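
One way to narrow this down, sketched under assumptions: have the worker return its input and compare what comes back with what went in. The names tile_geotif_stub and find_unprocessed are hypothetical, and the file names are made up. The stub only stands in for the real tile_geotif, which is not shown in the question. The sketch also drains the iterator returned by imap_unordered before the pool is shut down. The code in the question never iterates over the result and calls close() without join(), so it may be worth checking whether the program exits before the last tasks finish. That is a possible cause, not a confirmed one.

```python
from multiprocessing import Pool


def tile_geotif_stub(path):
    # Hypothetical stand-in for the real tile_geotif: it returns its input
    # so the parent process can tell which inputs were actually handled.
    return path


def find_unprocessed(paths, processes=2, chunksize=1):
    """Return the inputs for which no result came back from the pool."""
    with Pool(processes) as pool:
        # imap_unordered returns an iterator that yields results as they
        # complete. Building the set drains it, so the parent waits for
        # every task before the with-block shuts the pool down.
        processed = set(pool.imap_unordered(tile_geotif_stub, paths, chunksize))
    return sorted(set(paths) - processed)


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    vv_tif = [f"location{i}_vv.tif" for i in range(55)]
    print(find_unprocessed(vv_tif, chunksize=11))
```

If the returned list is empty, every input reached a worker and the problem is more likely inside the cropping function or in the list of paths itself, for example a duplicate or misspelled entry.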