
How can I build a list of async tasks with arguments for AsyncHTMLSession().run?


The documentation has this example, which I have tested and which works:

from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def get_pythonorg():
    r = await asession.get('https://python.org/')

async def get_reddit():
    r = await asession.get('https://reddit.com/')

async def get_google():
    r = await asession.get('https://google.com/')

result = asession.run(get_pythonorg, get_reddit, get_google)

But what if my URLs are variable? I'd like to do this:

from requests_html import AsyncHTMLSession

urls = ('https://python.org/', 'https://reddit.com/', 'https://google.com/')

asession = AsyncHTMLSession()

async def get_url(url):
    r = await asession.get(url)

tasks = []
for url in urls:
    tasks.append(get_url(url=url))

result = asession.run(*tasks)

but I get..

Traceback (most recent call last):
  File "./test.py", line 17, in <module>
    result = asession.run(*tasks)
  File "/home/deanresin/.local/lib/python3.7/site-packages/requests_html.py", line 772, in run
    asyncio.ensure_future(coro()) for coro in coros
  File "/home/deanresin/.local/lib/python3.7/site-packages/requests_html.py", line 772, in <listcomp>
    asyncio.ensure_future(coro()) for coro in coros
TypeError: 'coroutine' object is not callable
sys:1: RuntimeWarning: coroutine 'get_url' was never awaited

Solution

  • TL;DR:

    This happens because you are passing coroutine objects, not coroutine functions.

    You can do:

    from requests_html import AsyncHTMLSession
    
    urls = ('https://python.org/', 'https://reddit.com/', 'https://google.com/')
    
    asession = AsyncHTMLSession()
    
    async def get_url(url):
        r = await asession.get(url)
        # optional: render the page's JavaScript asynchronously
        await r.html.arender()
        return r
    
    all_responses = asession.run(*[lambda url=url: get_url(url) for url in urls])
    

    Explanations:

    The error comes from result = asession.run(*tasks), so let's look at the source code of AsyncHTMLSession.run():

    def run(self, *coros):
        """ Pass in all the coroutines you want to run, it will wrap each one
            in a task, run it and wait for the result. Return a list with all
            results, this is returned in the same order coros are passed in. """
        tasks = [
            asyncio.ensure_future(coro()) for coro in coros
        ]
        done, _ = self.loop.run_until_complete(asyncio.wait(tasks))
        return [t.result() for t in done]
    

    The list comprehension below calls every item it receives (coro()), so run() expects callable coroutine functions, not coroutine objects:

    tasks = [
        asyncio.ensure_future(coro()) for coro in coros
    ]
    

    But your traceback ends inside that comprehension with TypeError: 'coroutine' object is not callable.
    So you are passing coroutine objects, not coroutine functions.

    Indeed, when you do this:

    tasks = []
    for url in urls:
        tasks.append(get_url(url=url))
    

    Calling a coroutine function returns a coroutine object, so this loop builds a list of coroutine objects.
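The distinction can be checked without any network access. The following is a minimal sketch, not code from requests_html: get_url here is a stand-in that only sleeps, and the variable names are chosen for illustration. It shows that calling the coroutine function produces a coroutine object, and that calling that object again raises the same TypeError as in the traceback.

```python
import asyncio
import inspect


async def get_url(url):
    # Stand-in for the real request (assumption): no network access, just yields once.
    await asyncio.sleep(0)
    return url


# Calling the coroutine function does not run it; it creates a coroutine object.
coro_obj = get_url("https://python.org/")

is_function = inspect.iscoroutinefunction(get_url)  # the function itself
is_object = inspect.iscoroutine(coro_obj)           # the result of calling it
object_is_callable = callable(coro_obj)             # coroutine objects cannot be called

try:
    coro_obj()  # this is what run() effectively does with each item it receives
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Close the unused coroutine so Python does not warn that it was never awaited.
coro_obj.close()

print(is_function, is_object, object_is_callable)
print(error_message)
```

So run() needs something it can still call, which is why the items passed to it must not already be called.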

    To build a list of callables that each return a coroutine when called, you can wrap each call in a lambda, like this:

    [lambda url=url: get_url(url) for url in urls]
    

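    Note the url=url default argument: it binds the current value of url at the moment each lambda is defined. Without it, every lambda would look up url only when it is called, and all of them would see the last URL in the list.
    More information about this here.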
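To see why the default argument matters, here is a small sketch that compares the two forms without touching the network. It assumes a stand-in get_url that simply returns its argument, and it uses asyncio.gather instead of asession.run so it can run anywhere; call_all and the list names are hypothetical helpers for this illustration. It also shows functools.partial, which is not in the answer above but is a standard-library alternative that binds the argument in the same early way.

```python
import asyncio
import functools

urls = ("https://python.org/", "https://reddit.com/", "https://google.com/")


async def get_url(url):
    # Stand-in for the real request (assumption): returns the URL it was given.
    await asyncio.sleep(0)
    return url


# Late binding: each lambda looks up url only when it is called,
# by which time the comprehension has finished and url is the last item.
late_bound = [lambda: get_url(url) for url in urls]

# Early binding: the default argument captures the value at definition time.
early_bound = [lambda url=url: get_url(url) for url in urls]

# Alternative: functools.partial also stores the argument immediately.
partial_bound = [functools.partial(get_url, url) for url in urls]


async def call_all(factories):
    # Call each factory to get a coroutine object, then await them all in order.
    return await asyncio.gather(*(factory() for factory in factories))


late_results = asyncio.run(call_all(late_bound))
early_results = asyncio.run(call_all(early_bound))
partial_results = asyncio.run(call_all(partial_bound))

print(late_results)
print(early_results)
print(partial_results)
```

With the late-bound lambdas, every task fetches the last URL; the other two forms fetch each URL once. Either early-binding form should be usable with asession.run(*...), since both produce zero-argument callables that return a coroutine.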