Tags: python, parsing, asynchronous, python-asyncio, python-requests-html

AsyncHTMLSession returns the list of responses out of order. How can I sort it or keep it ordered?


I've found async requests-html much more useful than plain requests for parsing with BeautifulSoup. But when I pass my async functions to asession.run, the responses come back in an unpredictable order. I could have each async function return its response in a dictionary keyed by URL and sort afterwards, but that seems redundant to me. Any ideas?

Here I expect the responses in the order the functions were passed, or at least in an order that isn't random on every call:

from requests_html import AsyncHTMLSession
from bs4 import BeautifulSoup

asession = AsyncHTMLSession()

async def kucoin():
    print(f'get K')
    r = await asession.get('https://kucoin.com')
    return r

async def gateio():
    print(f'get g')
    r = await asession.get('https://gate.io')
    return r

async def vk():
    print(f'get vk')
    r = await asession.get('https://vk.com')
    return r


tasks = [kucoin, gateio, vk]
results = asession.run(*tasks)
for result in results:
    print(BeautifulSoup(result.text, 'html.parser').title)

But I'm getting:

get K
get g
get vk
<title>Buy/Sell Bitcoin, Ethereum | Cryptocurrency Exchange | Gate.io</title>
<title>Crypto Exchange | Bitcoin Exchange | Bitcoin Trading | KuCoin</title>
<title>Welcome | VK</title>

If you have experience with async parsing, I would be grateful if you shared it with me!

UPDATE: I found that returning responses out of order is expected behaviour in this library: https://github.com/psf/requests-html/issues/381


Solution

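  • In AsyncHTMLSession.run, the results are read from done, which is a set (and sets are unordered).

    You can replace the implementation so that the results are read from tasks instead, a list that keeps the order in which the coroutines were passed: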

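    import asyncio

    from requests_html import AsyncHTMLSession

    def run(self, *coros):
        # `tasks` is a list, so it keeps the order in which the coroutines were passed.
        tasks = [asyncio.ensure_future(coro()) for coro in coros]
        # asyncio.wait returns (done, pending) as sets, which have no defined order.
        done, _ = self.loop.run_until_complete(asyncio.wait(tasks))
        # Original implementation: return [t.result() for t in done]
        return [t.result() for t in tasks]

    # Replace the method on the class with the order-preserving version.
    AsyncHTMLSession.run = run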
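
The same idea can be checked without any network access. The following is only a minimal sketch using plain asyncio: fetch is a hypothetical stand-in for asession.get, and the delays are made-up values chosen so that the first task finishes last. It reads results from the tasks list, and also shows asyncio.gather, which returns results in the order the awaitables were passed.

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical stand-in for `await asession.get(url)`;
    # the delay simulates network latency.
    await asyncio.sleep(delay)
    return name


async def main_wait():
    # The first task is deliberately the slowest, so it completes last.
    tasks = [
        asyncio.ensure_future(fetch("kucoin", 0.03)),
        asyncio.ensure_future(fetch("gateio", 0.01)),
        asyncio.ensure_future(fetch("vk", 0.02)),
    ]
    done, _ = await asyncio.wait(tasks)
    # `done` is a set, so iterating it gives no guaranteed order.
    # `tasks` is a list, so results follow submission order.
    return [t.result() for t in tasks]


async def main_gather():
    # gather returns results in the order the awaitables were passed,
    # regardless of which one finishes first.
    return await asyncio.gather(
        fetch("kucoin", 0.03), fetch("gateio", 0.01), fetch("vk", 0.02)
    )


results = asyncio.run(main_wait())
gathered = asyncio.run(main_gather())
print(results)
print(gathered)
```

Both calls print the names in submission order even though "gateio" completes first. With requests-html itself, the patched run above is the smaller change, since it keeps the existing asession.run(*tasks) call unchanged.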