Tags: python, list, nested, python-multiprocessing

python multiprocessing - Turning a nested Manager().list() into a nested Python list


I created a Manager list of lists to share between Processes so it updates correctly, but I don't know how to transform it into a Python list of lists afterwards to access it:

myList = Manager().list([Manager().list()])
p = Pool(processes=30)
p.apply_async(update_list, args=(myList))
p.close()
p.join()

myList = ?

I am aware of this method for transforming a Manager list into a Python list, but I need help figuring out how to apply it to a nested list:

myList = Manager().list()
p = Pool(processes=30)
p.apply_async(update_list, args=(myList))
p.close()
p.join()

myList = list(myList)

EDIT: @Grismar suggested using myList = [list(sub) for sub in myList], but this minimal, reproducible code throws a FileNotFoundError: [Errno 2] No such file or directory error on my end:

from multiprocessing import Pool, Manager  
def update_list(myList):     
     myList.append(['test1','test2'])  
myList = Manager().list([Manager().list()]) 
p = Pool(processes=30) 
p.apply_async(update_list, args=(myList)) 
p.close() 
p.join()  
myList = [list(sub) for sub in myList]

Solution

  • There are numerous errors in your code, many of which other answers and comments have already pointed out, so I am not going to repeat them. As for your FileNotFoundError: it happens because your nested manager list is garbage collected the moment it is created, since nothing holds a reference to it. When you then try to access the elements of myList (which include this already deleted manager list), you get the error. To fix that, create a reference to the list before nesting it:

    alist = Manager().list()
    myList = Manager().list([alist])
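
    A related sketch, not from the original answer: creating both lists on one explicitly held manager also keeps the server process alive for as long as the lists are in use. It assumes Python 3.6 or later, where manager proxies can be nested; build_nested is a hypothetical helper name used only for illustration.

```python
from multiprocessing import Manager


def build_nested(manager):
    """Create an outer shared list that holds one inner shared list.

    `manager` is assumed to be a running multiprocessing Manager; both
    lists are created on it, so they live in the same server process.
    """
    inner = manager.list()
    outer = manager.list([inner])
    return outer, inner


if __name__ == '__main__':
    # The with-block keeps the single manager alive until we are done.
    with Manager() as manager:
        outer, inner = build_nested(manager)
        inner.append('test1')
        # Convert to plain lists while the manager is still running.
        print([list(sub) for sub in outer])
```

    Because the conversion happens inside the with-block, none of the proxies can outlive the manager that serves them.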
    

    Now for your main question: as others have pointed out, you can combine list with a list comprehension to convert a manager list into an actual list, but only if the manager list is nested exactly one level deep. For example, consider this code, where the nesting is two levels deep:

    if __name__ == '__main__':
        alist = Manager().list()
        blist = Manager().list([alist])
    
        # myList is two nested levels deep
        myList = Manager().list([blist])
    
        p = Pool(processes=1)
        p.apply_async(update_list, args=(myList, )).get()
    
        p.close()
        p.join()
        
        print('before:', myList)
        myList = [list(sub) for sub in myList]
        print('after:', myList)
    

    This does not produce the expected output, because the innermost list is still a proxy:

    before: [<ListProxy object, typeid 'list' at 0x1d8a7d22e50>, ['test1', 'test2']]
    after: [[<ListProxy object, typeid 'list' at 0x23412e13a60>], ['test1', 'test2']]
    

    Therefore, I prefer the more general approach below, which works for any number of nesting levels (including none) and returns the expected output even if you pass a plain list instead of a managed one, or a managed list without any nested lists. It works because it checks each element of the list recursively:

    from multiprocessing import Pool, Manager
    from multiprocessing.managers import ListProxy
    
    
    def update_list(myList):
        myList.append(['test1','test2'])
    
    
    def get_value(l):
        return [get_value(sub_l) if isinstance(sub_l, ListProxy) else sub_l for sub_l in l]
    
    
    if __name__ == '__main__':
        alist = Manager().list()
        blist = Manager().list([alist])
    
        # myList is two nested levels deep
        myList = Manager().list([blist])
    
        p = Pool(processes=1)
        p.apply_async(update_list, args=(myList, )).get()
    
        p.close()
        p.join()
    
        print('before:', myList)
        myList = get_value(myList)
        print('after:', myList)
    

    Output

    before: [<ListProxy object, typeid 'list' at 0x226dfb72e20>, ['test1', 'test2']]
    after: [[[]], ['test1', 'test2']]
    

    If your list is long, you can make get_value above more performant by having it request the whole list in one go instead of contacting the manager server every time it iterates over an element:

    def get_value(l):
        l = list(l)
        return [get_value(sub_l) if isinstance(sub_l, ListProxy) else sub_l for sub_l in l]
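
    A hedged variant, not from the original answer and not benchmarked here: ListProxy does not appear to expose __iter__, so list(l) may still fetch elements one call at a time. Slicing with l[:] is forwarded to the manager as a single __getitem__ call and returns a plain list copy, which may be the cheaper way to get the whole list at once.

```python
from multiprocessing.managers import ListProxy


def get_value(l):
    # Slicing a ListProxy is forwarded as one __getitem__(slice) call and
    # returns a plain list copy; on a regular list it is a shallow copy.
    items = l[:]
    # Nested proxies in the copy are still proxies, so convert them too.
    return [get_value(item) if isinstance(item, ListProxy) else item
            for item in items]
```

    As with the versions above, this also accepts a regular list and returns a new list with the same contents.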
    

    As a side note, it seems that you want nested manager lists so that the outer list is updated automatically whenever the nested list changes. If that is the case, you may want to check this answer, which outlines a way to handle that automatically without having to create nested manager lists manually.
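
    For context on why nesting is needed at all, here is a minimal sketch of the behaviour described in the multiprocessing documentation: changes made to a plain list stored inside a manager list are not propagated unless the modified list is assigned back. This is not necessarily the approach of the linked answer, and append_to_inner is a hypothetical helper name.

```python
from multiprocessing import Manager


def append_to_inner(outer, index, item):
    """Append item to the plain list stored at outer[index].

    Reading outer[index] from a manager list returns a copy, so the
    modified copy is assigned back to make the change visible to others.
    """
    inner = outer[index]
    inner.append(item)
    outer[index] = inner


if __name__ == '__main__':
    with Manager() as manager:
        outer = manager.list([[]])  # the inner list is a plain list
        outer[0].append('lost')     # only changes a temporary local copy
        append_to_inner(outer, 0, 'kept')
        print(list(outer))
```

    The first append is expected to be lost, while the read, modify, and reassign pattern in append_to_inner is the one that reaches the manager.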