Within trio/anyio
, is it possible to pause the tasks until i do specific operation and then continue all of it.
Let's say that i run specific function to obtain a valid cookie and then i start to crawl a website, But after sometimes this cookie got expired and i would need to run the previous function again to obtain a new cookie.
so if spawn 10 tasks under the nursery and during that the cookie got expired while 6 tasks is running! so how i can pause all of them and run this function only one time ?
import trio
import httpx
async def get_cookies(client):
# Let's say that here i will use a headless browser operation to obtain a valid cookie.
pass
limiter = trio.CapacityLimiter(20)
async def crawler(client, url, sender):
async with limiter, sender:
r = await client.get(url)
if "something special happen" in r.text:
pass
# here i want to say if my cookie got expired,
# Then i want to run get_cookies() only one time .
await sender.send(r.text)
async def main():
async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
await get_cookies(client)
sender, receiver = trio.open_memory_channel(0)
nurse.start_soon(rec, receiver)
urls = []
async with sender:
for url in urls:
nurse.start_soon(crawler, client, sender.clone())
async def rec(receiver):
async with receiver:
for i in receiver:
print(i)
if __name__ == "__main__":
trio.run(main)
You simply wrap get_cookies
in an async with some_lock
block. In that block, if you already have a cookie (let's say it's a global variable) you return it, otherwise you acquire one and then set the global.
When you notice that the cookie has expired, you delete it (i.e. set the global back to None
) and call get_cookies
.
In other words, something along these lines:
class CrawlData:
def __init__(self, client):
self.client = client
self.valid = False
self.lock = trio.Lock()
self.limiter = trio.CapacityLimiter(20)
async def get_cookie(self):
if self.valid:
return
async with self.lock:
if self.valid:
return
... # fetch cookie here, using self.client
self.valid = True
async def get(self, url):
r = await self.client.get(url)
if check_for_expired_cookie(r):
await self.get_cookie()
r = await self.client.get(url)
if check_for_expired_cookie(r):
raise RuntimeError("New cookie doesn't work", r)
return r
async def crawler(data, url, sender):
async with data.limiter, sender:
r = await data.get(url)
await sender.send(r.text)
async def main():
async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
data = CrawlData(client)
sender, receiver = trio.open_memory_channel(0)
nurse.start_soon(rec, receiver)
urls = []
async with sender:
for url in urls:
nurse.start_soon(crawler, client, sender.clone())
...