Tags: web-scraping, async-await, python-requests, twisted, python-3.7

async twisted, synchronous requests per domain (with delay)


Let's say I have 10 domains, but every domain needs a delay between its requests (to avoid DoS situations and IP bans).

I was thinking of async Twisted calling a class, where a request made with the requests module has delay(500), but then another request to the same domain makes it delay(250), and so on.

How do I achieve that static delay, and store something like a queue somewhere for every domain (class)?

It's a custom web scraper; Twisted is TCP, but that shouldn't make a difference. I don't want the code, just the knowledge.


Solution

  • Using asyncio for the async part:

    import asyncio

    async def nested(x):
        print(x)
        await asyncio.sleep(1)


    async def main():
        # Schedule nested() to run soon concurrently
        # with "main()".
        for x in range(100):
            await asyncio.sleep(1)
            task = asyncio.create_task(nested(x))
            # "task" can now be used to cancel "nested()", or
            # can simply be awaited to wait until it is complete:
            await task


    asyncio.run(main())


    With await task in main(), it prints every 2s: 1s from the sleep in main() plus 1s waiting for the sleep in nested().

    Without the await on asyncio.sleep() in nested(), it prints every 1s.

    Without await task in main(), main() no longer waits for nested() to finish, so the sleep inside nested() stops adding to the interval even though asyncio.sleep() is still declared there (see the variant sketch below).
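
    A variant (not part of the original answer) that drops the per-task await and only gathers the tasks at the end makes this visible: main() still paces the loop at 1s, while nested()'s sleep overlaps with the next iteration instead of extending it.

    import asyncio

    async def nested(x):
        print(x)
        await asyncio.sleep(1)

    async def main():
        tasks = []
        for x in range(100):
            await asyncio.sleep(1)                        # main() still paces the loop at 1s
            tasks.append(asyncio.create_task(nested(x)))  # not awaited individually
        await asyncio.gather(*tasks)                      # only wait for the stragglers at the end

    asyncio.run(main())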

    It is quite hard to keep track of this if you are new to async.
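
    To tie this back to the question, below is a minimal sketch (not part of the original answer) of one queue and one worker per domain: each worker handles its domain's URLs strictly one at a time with a static asyncio.sleep() between them, while different domains run concurrently. The domain names, delays and URLs are made up for illustration, and the blocking requests call is pushed to a thread pool via run_in_executor so it does not stall the event loop (and stays compatible with Python 3.7).

    import asyncio
    import requests

    # Hypothetical per-domain delays (seconds between requests to one domain).
    DOMAIN_DELAYS = {
        "example.com": 0.5,
        "example.org": 0.25,
    }

    async def domain_worker(domain, delay, queue):
        # One worker per domain: requests to this domain stay sequential.
        loop = asyncio.get_running_loop()
        while True:
            url = await queue.get()
            # requests is blocking, so run it in the default thread pool.
            response = await loop.run_in_executor(None, requests.get, url)
            print(domain, response.status_code)
            queue.task_done()
            await asyncio.sleep(delay)  # static per-domain delay

    async def main():
        queues = {domain: asyncio.Queue() for domain in DOMAIN_DELAYS}
        workers = [
            asyncio.create_task(domain_worker(domain, delay, queues[domain]))
            for domain, delay in DOMAIN_DELAYS.items()
        ]
        # Feed some URLs; domains proceed concurrently, each at its own pace.
        for i in range(3):
            await queues["example.com"].put(f"https://example.com/page/{i}")
            await queues["example.org"].put(f"https://example.org/page/{i}")
        # Wait until every queued URL has been handled, then stop the workers.
        await asyncio.gather(*(q.join() for q in queues.values()))
        for worker in workers:
            worker.cancel()

    asyncio.run(main())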