python, python-requests, python-multiprocessing

Change Threading for Multiprocess


I have code that starts many functions with threading.Thread(), and each function sends HTTP requests.

As a result, the requests are sent very slowly. I thought switching to multiprocessing might help, but I am not sure.

The problem is that the functions send many different requests, so it would be difficult to rewrite all of them to use asyncio, aiohttp, or something similar.

The main goal is to change the code so that it sends requests at normal speed from many threads without becoming too complicated, because it contains about 100 different requests.

The code currently starts its threads like this:

import requests
import threading

def main(a, proxy, proxystr):
    requests.get()
    # do some
    requests.post()
    # do some2
    # etc

alg = []  # started threads, kept so they can be joined below
for kish in range(odnovrem_t):
    gf = threading.Thread(target=main, args=(kish, proxys, proxystr))
    gf.start()
    alg.append(gf)
for i in alg:
    i.join()

Solution

  • I find the concurrent.futures executors more user-friendly than working with Process and Thread directly, especially for long inputs. The Python docs themselves point to them as the higher-level interface. Try concurrent.futures.ProcessPoolExecutor: it provides a map() method whose interface mirrors the built-in map().

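    from concurrent.futures import ProcessPoolExecutor
    from itertools import repeat
    import os
    import requests


    def main(a, proxy, proxystr):
        requests.get()
        # do some
        requests.post()
        # do some2
        # etc


    if __name__ == "__main__":  # guard needed where worker processes are spawned (e.g. Windows)
        kish = range(odnovrem_t)
        num_cpus = os.cpu_count() or 1  # cpu_count() can return None
        # leave one CPU for the main process, but never ask for fewer than one worker
        with ProcessPoolExecutor(max_workers=max(1, num_cpus - 1)) as executor:
            # list() consumes the lazy result iterator so exceptions raised in main() surface here
            results = list(executor.map(main, kish, repeat(proxys), repeat(proxystr)))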
    

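    For CPU-bound work, there's no sense in creating a process pool with more processes than the total number of CPUs minus one: we don't want more processes than CPUs, and one CPU must be left for the main process. Workers that mostly wait on network responses may tolerate a larger pool, so treat this number as a starting point.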

    For the arguments: in the original loop, kish took each value in range(odnovrem_t), so here the range itself is the first iterable passed to map(). I assume proxys is a list of lists of proxy parameters such as username, password, and hostname. Lastly, I assume proxystr is a format string into which the values from proxys are interpolated. So kish is an int unique to each call to main(), while proxys and proxystr are repeated for every call.
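
To see how map() pairs a changing argument with repeated ones, here is a minimal sketch that runs without any network access. fake_main and the sample values of proxys and proxystr are hypothetical stand-ins, since the real ones are not shown in the question. ThreadPoolExecutor is swapped in only so that the snippet runs anywhere without a `__main__` guard; the map() call has the same shape for ProcessPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def fake_main(a, proxy, proxystr):
    # Hypothetical stand-in for main(): sends no requests, just echoes its arguments.
    return (a, proxy, proxystr)


# Hypothetical sample values; the question does not show the real ones.
proxys = [["user", "secret", "proxy.example.com"]]
proxystr = "http://{}:{}@{}"

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() zips its iterables: range() supplies a new `a` for each call,
    # while repeat() hands the same proxys and proxystr to every call.
    # The zip stops when the shortest iterable (the range) is exhausted.
    results = list(executor.map(fake_main, range(3), repeat(proxys), repeat(proxystr)))

for item in results:
    print(item)
```

Results come back in the order of the inputs, not in the order the calls finish, so each entry in results lines up with its value of kish.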