I have a scraper project that uses the asynchronous requests library asks together with trio. I would like to choose how many concurrent tasks are started based on user input, but my code is long and primitive.
I use trio's spawning and nursery object for concurrent tasks (docs: https://trio.readthedocs.io/en/latest/reference-core.html).
Here's my sloppy code:

import trio
import asks

# How many tasks I want, between 1 and 5
Number_of_workers = int(input("how many workers do you want?: "))

async def child1(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def child2(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def child3(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def child4(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def child5(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def parent():
    s = asks.Session(connections=5)
    async with trio.open_nursery() as nursery:
        if Number_of_workers == 1:
            nursery.start_soon(child1, s)
        elif Number_of_workers == 2:
            nursery.start_soon(child1, s)
            nursery.start_soon(child2, s)
        elif Number_of_workers == 3:
            nursery.start_soon(child1, s)
            nursery.start_soon(child2, s)
            nursery.start_soon(child3, s)
        elif Number_of_workers == 4:
            nursery.start_soon(child1, s)
            nursery.start_soon(child2, s)
            nursery.start_soon(child3, s)
            nursery.start_soon(child4, s)
        elif Number_of_workers == 5:
            nursery.start_soon(child1, s)
            nursery.start_soon(child2, s)
            nursery.start_soon(child3, s)
            nursery.start_soon(child4, s)
            nursery.start_soon(child5, s)

trio.run(parent)

I think you can see what I'm getting at: this example code theoretically works, but it's very long for something that could probably be cut down to far fewer lines.
This kind of scheme gets especially long when dealing with 10 or 20 workers, and it is always limited to a predefined number of them.
In and of itself, each child is the same: the code is identical, it just gets different data (such as the params and the url) from an external .py module file loaded with importlib.
Is there a way to cut this down to more compact code?
You can use a loop!

async def child(s):
    r = await s.get("https://example.com", params={"example":"example"})
    print("do something with", r.text)

async def parent():
    s = asks.Session(connections=5)
    async with trio.open_nursery() as nursery:
        # input() returns a string, so convert it before passing it to range()
        for i in range(int(Number_of_workers)):
            nursery.start_soon(child, s)

Edit: here's a self-contained demo you can run to convince yourself that this does in fact run concurrent tasks. It also demonstrates how you can pass different parameter values to the different tasks, so they do different things – in this case, print different messages:

import trio

Number_of_workers = 10

async def child(i):
    print("child {}: started".format(i))
    await trio.sleep(5)
    print("child {}: finished".format(i))

async def parent():
    async with trio.open_nursery() as nursery:
        for i in range(Number_of_workers):
            nursery.start_soon(child, i)

trio.run(parent)

Try it and see!