Tags: python, python-asyncio, python-3.7, quart, asgi

Python asyncio skip processing until function return


I'm still very confused about how asyncio works, so I was trying to set up a simple example but couldn't get it working.

The following example is a web server (Quart) that receives a request to generate a large PDF. The server should return a response before it starts processing the PDF, then process it and send the download link by email later.

from quart import Quart
import asyncio
import time

app = Quart(__name__)

@app.route('/')
async def pdf():
    t1 = time.time()
    await generatePdf()
    return 'Time to execute : {} seconds'.format(time.time() - t1)

async def generatePdf():
    await asyncio.sleep(5)
    #sync generatepdf
    #send pdf link to email

app.run()

How would I go about this? In the above example, I don't want to wait the 5 seconds before the return.

I'm not even sure if asyncio is what I need.

And I'm afraid that blocking the server app after the response has been returned is not something that should be done, but I'm not sure about that either.

Also the pdf library is synchronous, but I guess that's a problem for another day...


Solution

  • The comment has everything you need: to respond to the web request right away and schedule the PDF generation for later, use asyncio.create_task.

    asyncio.create_task(generatePdf())
    

    However, this is not a good idea if the PDF processing is slow, because it will block the asyncio event loop thread: the current request gets a quick response, but the following requests have to wait until the PDF generation is complete.
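
To make the scheduling behaviour concrete, here is a minimal sketch without Quart. The names generate_pdf_stub, handle_request and background_tasks are made up for illustration, and asyncio.sleep stands in for work that cooperates with the event loop. The set of task references follows the advice in the asyncio documentation to keep a reference to tasks you do not await.

```python
import asyncio
import time

# Keep references to running tasks so they are not garbage collected early.
background_tasks = set()


async def generate_pdf_stub(results):
    # Hypothetical stand-in for the PDF work; asyncio.sleep yields to the loop.
    await asyncio.sleep(0.2)
    results.append("pdf done")


async def handle_request(results):
    start = time.monotonic()
    # Schedule the coroutine and carry on without awaiting it.
    task = asyncio.create_task(generate_pdf_stub(results))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    # Time spent in the handler before it "responds".
    return time.monotonic() - start


async def main():
    results = []
    elapsed = await handle_request(results)
    at_response = list(results)  # snapshot: the work has not finished yet
    # Wait for the background work only so this demo can exit cleanly.
    await asyncio.gather(*background_tasks)
    return elapsed, at_response, results


if __name__ == "__main__":
    elapsed, at_response, results = asyncio.run(main())
    print("handler returned after {:.3f} s".format(elapsed))
    print("finished at response time:", at_response)
    print("finished at the end:", results)
```

If generate_pdf_stub called time.sleep instead of awaiting asyncio.sleep, the task would still be scheduled the same way, but nothing else could run on the loop while it was sleeping.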

    The correct way would be to run the task in an executor (especially a ProcessPoolExecutor).

    from quart import Quart
    import asyncio
    import time
    from concurrent.futures import ProcessPoolExecutor
    
    app = Quart(__name__)
    executor = ProcessPoolExecutor(max_workers=5)
    
    @app.route('/')
    async def pdf():
        t1 = time.time()
        asyncio.get_running_loop().run_in_executor(executor, generatePdf)
        # await generatePdf()
        return 'Time to execute : {} seconds'.format(time.time() - t1)
    
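    def generatePdf():
        #sync generatepdf
        #send pdf link to email
        pass  # placeholder so the function has a valid body
    
    # The guard keeps worker processes from starting the server again
    # on platforms where they re-import this module.
    if __name__ == '__main__':
        app.run()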
    

    It is important to note that, since it runs in a different process, generatePdf cannot access any data from the server process without synchronization. So pass everything the function needs as arguments when calling it.
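
As a rough sketch of passing everything in as arguments: run_in_executor forwards positional arguments to the function, and with a process pool they are pickled and sent to the worker. The job dictionary, its keys, and the function names generate_pdf and run_job are hypothetical, and the function body is only a placeholder for the real rendering and emailing.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def generate_pdf(job):
    # Runs in a worker process, so it only sees what arrives in `job`.
    # Placeholder for the real PDF rendering and email sending.
    return "report for {} sent to {}".format(job["title"], job["email"])


async def run_job(executor, job):
    loop = asyncio.get_running_loop()
    # Extra positional arguments after the function are passed through to it.
    return await loop.run_in_executor(executor, generate_pdf, job)


async def main():
    # Plain, picklable data only: strings, numbers, dicts, lists.
    job = {"title": "Q3 summary", "email": "user@example.com"}
    with ProcessPoolExecutor(max_workers=2) as executor:
        # Awaited here only so the demo can print the result.
        return await run_job(executor, job)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the web handler you would call run_in_executor without awaiting the result, as in the answer's code, so the response is not delayed. Objects that cannot be pickled, such as open database connections, cannot be passed this way and would have to be created inside the worker.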


    Update

    It works best if you can refactor the generatePdf function and make it async.

    For example, if generatePdf looks like this:

    def generatePdf():
        image1 = downloadImage(image1Url)
        image2 = downloadImage(image2Url)
        data = queryData()
        pdfFile = makePdf(image1, image2, data)
        link = upLoadToS3(pdfFile)
        sendEmail(link)
    

    You can make the function async like this:

    async def generatePdf():
        image1, image2, data = await asyncio.gather(downloadImage(image1Url), downloadImage(image2Url), queryData())
        pdfFile = makePdf(image1, image2, data)
        link = await upLoadToS3(pdfFile)
        await sendEmail(link) 
    

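    Note: All the helper functions, such as downloadImage and queryData, need to be rewritten to support async. This way, requests won't be blocked even if the database or image servers are slow, and everything runs in the same asyncio thread. makePdf is still called synchronously here, so if it is slow it will hold up the event loop while it runs.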
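
To illustrate what asyncio.gather buys you, here is a small self-contained sketch with stub helpers. download_image and query_data are hypothetical stand-ins that simulate latency with asyncio.sleep, and the URLs are placeholders; real versions would use an async HTTP client and an async database driver.

```python
import asyncio
import time


async def download_image(url):
    await asyncio.sleep(0.2)  # simulated network latency
    return "image from " + url


async def query_data():
    await asyncio.sleep(0.2)  # simulated database latency
    return {"rows": 3}


async def gather_inputs():
    start = time.monotonic()
    # The three awaitables wait concurrently; results keep the argument order.
    image1, image2, data = await asyncio.gather(
        download_image("https://example.com/1.png"),
        download_image("https://example.com/2.png"),
        query_data(),
    )
    return image1, image2, data, time.monotonic() - start


if __name__ == "__main__":
    image1, image2, data, elapsed = asyncio.run(gather_inputs())
    print(image1, image2, data)
    print("three 0.2 s waits took {:.2f} s".format(elapsed))
```

Awaiting the three helpers one after another would take roughly the sum of their waits, while gather takes roughly as long as the slowest one.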

    If some of them are not async yet, they can be called through run_in_executor and will work well alongside the other async functions.
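
A minimal sketch of mixing the two, assuming one helper is still synchronous: make_pdf_blocking and fetch_text are hypothetical names, with time.sleep standing in for a blocking library call. Passing None as the executor uses the loop's default thread pool.

```python
import asyncio
import time


def make_pdf_blocking(text):
    time.sleep(0.3)  # stand-in for a synchronous library call
    return "PDF(" + text + ")"


async def fetch_text():
    await asyncio.sleep(0.3)  # stand-in for an async network call
    return "hello"


async def main():
    loop = asyncio.get_running_loop()
    start = time.monotonic()
    # None selects the event loop's default ThreadPoolExecutor.
    pdf_future = loop.run_in_executor(None, make_pdf_blocking, "cover page")
    # The returned future can be awaited or gathered like any other awaitable.
    pdf, text = await asyncio.gather(pdf_future, fetch_text())
    return pdf, text, time.monotonic() - start


if __name__ == "__main__":
    pdf, text, elapsed = asyncio.run(main())
    print(pdf, text, "{:.2f} s".format(elapsed))
```

A thread pool suits helpers that mostly wait on I/O. For CPU-heavy pure-Python work, threads still compete for the interpreter lock, so a ProcessPoolExecutor as shown earlier in the answer is usually the better fit.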