python, python-multithreading

How to run code in parallel with ThreadPoolExecutor?


Hi, I'm really new to threading and it's confusing me. How can I run this code in parallel?

import requests
from concurrent.futures import ThreadPoolExecutor


def search_posts(page):
    page_url = f'https://jsonplaceholder.typicode.com/posts/{page}'
    req = requests.get(page_url)
    res = req.json()

    title = res['title']

    return title


page = 1

while True:

    with ThreadPoolExecutor() as executor:
        t = executor.submit(search_posts, page)

        title = t.result()

        print(title)

    if page == 20:
        break

    page += 1

Another question: do I need to learn operating systems in order to understand how threading works?


Solution

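  • The problem here is that you create a new ThreadPoolExecutor for every page, submit a single task to it, and then immediately block on t.result() (leaving the with block also waits for the pool to shut down). So each page is fetched only after the previous one has finished. To do things in parallel, create only one ThreadPoolExecutor and use its map method: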

    import concurrent.futures as cf
    import requests
    
    
    def search_posts(page):
        page_url = f'https://jsonplaceholder.typicode.com/posts/{page}'
        res = requests.get(page_url).json()
        return res['title']
    
    
    if __name__ == '__main__':
        # One pool for all pages; map() submits every call up front.
        with cf.ThreadPoolExecutor() as ex:
            results = ex.map(search_posts, range(1, 21))
        # Leaving the with block waits for all downloads; results come back in page order.
        for r in results:
            print(r)
    

    Note that using the if __name__ == '__main__' wrapper is a good habit: it makes your code more portable.
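If you would rather keep the submit() call from the question, the key change is to submit every page first and only then collect the results. Below is a minimal sketch of that pattern using as_completed. It assumes a hypothetical fake_search_posts stand-in that sleeps instead of calling jsonplaceholder, so it runs without network access; fetch_all is likewise a name made up for this illustration.

```python
import concurrent.futures as cf
import time


def fake_search_posts(page):
    # Hypothetical stand-in for search_posts: sleeps instead of doing an HTTP request.
    time.sleep(0.2)
    return f'title {page}'


def fetch_all(pages):
    titles = {}
    with cf.ThreadPoolExecutor(max_workers=10) as ex:
        # Submit every task up front so they can run concurrently...
        future_to_page = {ex.submit(fake_search_posts, p): p for p in pages}
        # ...then handle each result as soon as its thread finishes.
        for fut in cf.as_completed(future_to_page):
            page = future_to_page[fut]
            titles[page] = fut.result()  # re-raises any exception from the worker
    return titles


if __name__ == '__main__':
    start = time.perf_counter()
    titles = fetch_all(range(1, 21))
    elapsed = time.perf_counter() - start
    for page in sorted(titles):
        print(page, titles[page])
    # Two rounds of 0.2 s sleeps with 10 workers, versus about 4 s one by one.
    print(f'{elapsed:.2f} s')
```

as_completed yields futures in the order they finish, not the order they were submitted, which is why the sketch keeps a future-to-page dict; map is simpler when you just want the results back in input order.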


    One thing to keep in mind when using threads: if you are using CPython (the Python implementation from python.org, which is the most common one), threads don't actually execute Python code in parallel.

    To make memory management less complicated, only one thread at a time can execute Python bytecode in CPython. This is enforced by the Global Interpreter Lock ("GIL").

    The good news is that fetching a web page with requests spends most of its time waiting on network I/O, and in general CPython releases the GIL during blocking I/O, so other threads can run while one of them waits.

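    But if your worker functions are doing calculations (i.e. mostly executing Python bytecode), you should use a ProcessPoolExecutor instead: each worker process has its own interpreter and its own GIL, so the work can run on several CPU cores at once.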
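A rough way to see the GIL's effect for yourself is to time the same thread pool on blocking waits and on pure-Python arithmetic. In this sketch time.sleep stands in for network I/O (it also releases the GIL), and blocking_wait, busy_loop and timed are hypothetical helpers made up for the illustration.

```python
import concurrent.futures as cf
import time


def blocking_wait(_):
    # time.sleep releases the GIL while waiting, much like a blocking socket read.
    time.sleep(0.3)
    return 'done'


def busy_loop(n):
    # Pure-Python arithmetic: only one thread at a time can execute this bytecode.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(fn, args, workers=4):
    # Run fn over args in a thread pool and return (results, elapsed seconds).
    start = time.perf_counter()
    with cf.ThreadPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(fn, args))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    _, io_time = timed(blocking_wait, range(4))
    print(f'4 waits of 0.3 s in 4 threads: {io_time:.2f} s')  # the waits overlap

    n = 1_000_000
    start = time.perf_counter()
    sequential = [busy_loop(n) for _ in range(4)]
    seq_time = time.perf_counter() - start
    threaded, thread_time = timed(busy_loop, [n] * 4)
    assert threaded == sequential
    print(f'CPU work, one call after another: {seq_time:.2f} s')
    print(f'CPU work, 4 threads:              {thread_time:.2f} s')
```

On a regular CPython build you should see the four waits finish in roughly the time of one, while the two CPU timings come out similar; exact numbers depend on your machine.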

    If you use a ProcessPoolExecutor on MS Windows, the if __name__ == '__main__' wrapper is required, because each worker process starts by importing your main program, so that import must be free of side effects.
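For CPU-bound work, a minimal sketch of the ProcessPoolExecutor route could look like the following. The deliberately naive count_primes function is a hypothetical example of a "calculation"; the important parts are that the worker function is defined at module level (so it can be pickled and sent to the worker processes) and that the pool is only created inside the __main__ guard.

```python
import concurrent.futures as cf


def count_primes(limit):
    # Deliberately naive, CPU-bound pure-Python work (trial division).
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == '__main__':
    # The guard keeps worker processes, which re-import this module under the
    # 'spawn' start method (the default on Windows), from running this block again.
    limits = [10, 100, 1000, 10_000]
    with cf.ProcessPoolExecutor() as ex:
        counts = list(ex.map(count_primes, limits))
    print(dict(zip(limits, counts)))  # {10: 4, 100: 25, 1000: 168, 10000: 1229}
```

The executor API is the same as for threads, so switching between ThreadPoolExecutor and ProcessPoolExecutor is mostly a one-line change; the cost is that arguments and results have to be pickled between processes.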