Tags: python, python-3.x, pandas, python-asyncio

Fill Pandas Dataframe asynchronously with async


I just saw this awesome video from Idently and tried to use the trick to fill in a DataFrame column based on the values of another column.

Here is my MWE (more of a non-working example, in fact). I am running it in a Jupyter notebook.

import asyncio
import pandas as pd
import requests

mydf = pd.DataFrame({'url':['https://google.com','https://apple.com']})
print(mydf)
print("-----")

async def fetch_status(url:str) -> int:
    response = await asyncio.to_thread(requests.get,url)
    return(response.status_code)

async def main_task() -> None:
    myTask = asyncio.create_task(mydf['url'].apply(fetch_status))
    mydf['status'] = await myTask
   
    print(mydf)

In a separate cell:

asyncio.run(main = main_task())

I get this error: RuntimeError: asyncio.run() cannot be called from a running event loop.
Any idea why? Any help is welcome.


Solution

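  • asyncio.run() starts a new event loop, but Jupyter already runs one, hence the RuntimeError; in a notebook, await the coroutine directly instead. In addition, mydf['url'].apply(fetch_status) returns a Series of un-awaited coroutine objects, which asyncio.create_task() cannot schedule; asyncio.gather() runs them concurrently. Split and fix your code like so: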

    Cell 1:

    import asyncio
    import pandas as pd
    import requests

    mydf = pd.DataFrame({'url': ['http://google.com', 'http://apple.com']})
    print(mydf)

    async def fetch_status(url: str) -> int:
        # requests.get blocks, so run it in a worker thread
        response = await asyncio.to_thread(requests.get, url)
        return response.status_code

    async def main_task() -> None:
        # gather runs the fetches concurrently and returns results in input order
        mydf['status'] = await asyncio.gather(*[fetch_status(url) for url in mydf['url']])


    Cell 2:

    await main_task()
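
To see why assigning the result of asyncio.gather() straight to a column is safe, here is a minimal sketch that needs no network access. The helper fake_get, its delay and status parameters, and the example URLs are all made up for illustration; fake_get stands in for requests.get. The slower job finishes last, yet its result still comes first because gather returns results in the order the awaitables were passed in. This version is written as a plain script (Python 3.9+), so it uses asyncio.run(); in a Jupyter cell you would write `statuses = await fetch_all()` instead.

```python
import asyncio
import time


def fake_get(url: str, delay: float, status: int) -> int:
    # Hypothetical stand-in for requests.get: block for `delay` seconds,
    # then return a made-up status code.
    time.sleep(delay)
    return status


async def fetch_status(url: str, delay: float, status: int) -> int:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(fake_get, url, delay, status)


async def fetch_all() -> list[int]:
    # The first job is deliberately slower than the second.
    jobs = [
        ("http://slow.example", 0.2, 200),
        ("http://fast.example", 0.0, 404),
    ]
    # gather returns a list ordered like `jobs`, not by completion time.
    return await asyncio.gather(*(fetch_status(*job) for job in jobs))


# Plain-script entry point; no event loop is running yet, so asyncio.run works.
statuses = asyncio.run(fetch_all())
print(statuses)  # [200, 404]
```

Because the list lines up with the input order, it can be assigned to mydf['status'] row for row, as main_task does above.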