Search code examples
pythonmultithreadingmultiprocessingconcurrent.futures

Is this the correct way to send multiple arguments to concurrent.futures.ThreadPoolExecutor?


i have to check files in folder structure like this

|_HMR
| |__2015
| |__2016
| |__2017
|
|_TMR1
 |__2015
 |__2016
 |__2017

i use to call my function like this and it works fine

check_continuity('TMR1', 2015)
check_continuity('TMR1', 2016)
check_continuity('TMR1', 2017)
check_continuity('HMR', 2015)
check_continuity('HMR', 2016)
check_continuity('HMR', 2017)

but i would like to make this faster by using multiprocessing (concurrent.futures), so is this a right way to send arguments to my function. since first argument has only two variants and second one has three. and i want first argument to run for three different years and then second argument for three different years.

in short i would like to get the result like my previous way of calling functions individually but faster

am trying to do like this, but looks like its missing few combinations

if __name__ == '__main__':
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        print('creating ThreadPoolExecutor')
        start_scrape1 = executor.map(check_continuity, ('HMR', 'TMR1'), (2015, 2016, 2017))

Solution

  • Do you want something like You have arguments like ('HMR', 'TMR1'), (2015, 2016, 2017) So it is taking one from the first one and one from the seconds one making ('HMR', 2015), ('TMR1', 2016) and since first one is exhausted it is finished.

    So what you want to provide is ('HMR','HMR','HMR','TMR1','TMR1','TMR1') and (2015, 2016, 2017,2015, 2016, 2017)

    [i for i in p for j in year]
    

    is effectively same as

    temp = []
    for i in p:
        for j in year:
            temp.append(i)
    

    which will give ['HMR', 'HMR', 'HMR', 'TMR1', 'TMR1', 'TMR1'] and year*len(p) will provide (2015, 2016, 2017,2015, 2016, 2017)

    Here the loop works for that

    def check_continuity(a,b):
        print(a,b)
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
            print('creating ThreadPoolExecutor')
            p = ('HMR', 'TMR1')
            year = (2015, 2016, 2017)
            start_scrape1 = executor.map(check_continuity, [i for i in p for j in year], year*len(p))
    

    output

    creating ThreadPoolExecutor
    HMR 2015
    HMR 2016
    HMR 2017
    TMR1 2015
    TMR1TMR1 2017
     2016