Hi, I'm really new to threading and it's making me confused. How can I run this code in parallel?
import requests
from concurrent.futures import ThreadPoolExecutor

def search_posts(page):
    page_url = f'https://jsonplaceholder.typicode.com/posts/{page}'
    req = requests.get(page_url)
    res = req.json()
    title = res['title']
    return title

page = 1
while True:
    with ThreadPoolExecutor() as executor:
        t = executor.submit(search_posts, page)
        title = t.result()
    print(title)
    if page == 20:
        break
    page += 1
Another question: do I need to learn operating systems in order to understand how threading works?
The problem here is that you are creating a new ThreadPoolExecutor for every page. To do things in parallel, create only one ThreadPoolExecutor and use its map method:
import concurrent.futures as cf
import requests

def search_posts(page):
    page_url = f'https://jsonplaceholder.typicode.com/posts/{page}'
    res = requests.get(page_url).json()
    return res['title']

if __name__ == '__main__':
    with cf.ThreadPoolExecutor() as ex:
        results = ex.map(search_posts, range(1, 21))
        for r in results:
            print(r)
Note that using the if __name__ == '__main__' wrapper is a good habit that makes your code more portable.
One thing to keep in mind when using threads: if you are using CPython (the Python implementation from python.org, which is the most common one), threads don't actually run in parallel. To make memory management less complicated, only one thread at a time can execute Python bytecode in CPython. This is enforced by the Global Interpreter Lock ("GIL").
The good news is that using requests to get a web page will spend most of its time on network I/O, and in general, the GIL is released during I/O. But if you are doing calculations in your worker functions (i.e. executing Python bytecode), you should use a ProcessPoolExecutor instead.
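To make the I/O point concrete, here is a small sketch of my own (it uses time.sleep as a stand-in for a network request, since sleep, like network I/O, releases the GIL): ten tasks that each block for 0.2 seconds finish in well under the 2 seconds a serial loop would need.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_like_task(n):
    # Stand-in for requests.get(); the GIL is released while sleeping,
    # so other threads can run in the meantime.
    time.sleep(0.2)
    return n * n

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as ex:
    results = list(ex.map(io_like_task, range(10)))
elapsed = time.perf_counter() - start

print(results)
# Roughly 0.2 s, not the ~2 s a serial loop of ten sleeps would take.
print(f'elapsed: {elapsed:.2f}s')
```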
If you use a ProcessPoolExecutor and you are running on ms-windows, then using the if __name__ == '__main__' wrapper is required, because Python has to be able to import your main program without side effects in that case.
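For completeness, here is a minimal ProcessPoolExecutor sketch (cpu_heavy is a made-up stand-in for real computation, not anything from your code). Note the if __name__ == '__main__' wrapper, which this time is required on ms-windows:

```python
import concurrent.futures as cf

def cpu_heavy(n):
    # Pure-Python arithmetic: this holds the GIL the whole time,
    # so threads would not speed it up, but separate processes do.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    with cf.ProcessPoolExecutor() as ex:
        # Each task runs in its own interpreter process, with its own GIL,
        # so the tasks can really run on multiple CPU cores at once.
        for result in ex.map(cpu_heavy, [100_000, 200_000, 300_000]):
            print(result)
```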