python numpy gdal python-asyncio

Why can't I await the ReadAsArray method in the GDAL module?


I am trying to read several remote images into Python as NumPy arrays, and I would like to use async to speed up my workflow. However, I get this error: "TypeError: object numpy.ndarray can't be used in 'await' expression". Is it because the ReadAsArray method is not async, so that to make it async I would have to rewrite the method myself? Here is some of my code:

import asyncio
import time

from osgeo import gdal

# `a` (not shown) holds the image paths.

async def taskIO_1():
    in_ds = gdal.Open(a[0])
    # ReadAsArray() returns a numpy.ndarray, not an awaitable,
    # so this await raises the TypeError quoted above.
    data1 = await in_ds.GetRasterBand(1).ReadAsArray()

async def taskIO_2():
    in_ds = gdal.Open(a[1])
    data2 = await in_ds.GetRasterBand(1).ReadAsArray()

async def main():
    tasks = [taskIO_1(), taskIO_2()]
    done, pending = await asyncio.wait(tasks)
    for r in done:
        print(r.result())

if __name__ == '__main__':
    start = time.time()
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()
    print(float(time.time() - start))

Solution

  • Your notion is correct: library functions generally execute synchronously (blocking) unless the library is explicitly written to support asynchronous execution (e.g. by using non-blocking I/O), as aiofiles and aiohttp are.

    To run a synchronous call without blocking the event loop, you can use loop.run_in_executor. It simply offloads the call to a separate thread or process and returns a future that you can await like a coroutine. An example is shown here:

    import asyncio
    import concurrent.futures
    
    def blocking_io():
        # File operations (such as logging) can block the
        # event loop: run them in a thread pool.
        with open('/dev/urandom', 'rb') as f:
            return f.read(100)
    
    def cpu_bound():
        # CPU-bound operations will block the event loop:
        # in general it is preferable to run them in a
        # process pool.
        return sum(i * i for i in range(10 ** 7))
    
    async def main():
        loop = asyncio.get_running_loop()
    
        ## Options:
    
        # 1. Run in the default loop's executor:
        result = await loop.run_in_executor(
            None, blocking_io)
        print('default thread pool', result)
    
        # 2. Run in a custom thread pool:
        with concurrent.futures.ThreadPoolExecutor() as pool:
            result = await loop.run_in_executor(
                pool, blocking_io)
            print('custom thread pool', result)
    
        # 3. Run in a custom process pool:
        with concurrent.futures.ProcessPoolExecutor() as pool:
            result = await loop.run_in_executor(
                pool, cpu_bound)
            print('custom process pool', result)
    
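    # The guard is needed for the process pool on platforms that
    # start worker processes by re-importing this module.
    if __name__ == '__main__':
        asyncio.run(main())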
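
    Applied to the question's case, the same pattern might look like the sketch below. It is a minimal illustration, not a tested GDAL recipe: read_band and read_all are hypothetical names, and read_band uses time.sleep as a stand-in for the blocking read so the snippet runs without GDAL installed. With GDAL available, its body would presumably be the call from the question, gdal.Open(path).GetRasterBand(1).ReadAsArray(), with each call opening its own dataset rather than sharing one across threads.

```python
import asyncio
import time


def read_band(path):
    # Hypothetical stand-in for the blocking GDAL read. In real use this
    # would be: return gdal.Open(path).GetRasterBand(1).ReadAsArray()
    time.sleep(0.2)  # simulate a slow remote read
    return f"data from {path}"


async def read_all(paths):
    loop = asyncio.get_running_loop()
    # Submit every blocking read to the default thread pool. Each call
    # returns an awaitable future right away, so the reads overlap.
    futures = [loop.run_in_executor(None, read_band, p) for p in paths]
    # gather() returns the results in the same order as `paths`.
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    start = time.time()
    results = asyncio.run(read_all(["a.tif", "b.tif"]))
    print(results)
    print(f"elapsed: {time.time() - start:.2f}s")
```

    Threads only help here to the extent that the blocking call spends its time waiting on I/O rather than holding the interpreter lock, so it is worth timing this against a plain sequential loop.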
    

    However, if your application does not use any truly asynchronous features, you are probably better off using a concurrent.futures pool directly and achieving concurrency that way.
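
    A rough sketch of that alternative, with no event loop at all, could look like this. As before, read_band and read_all are hypothetical names, the sleep stands in for the blocking GDAL read, and max_workers=4 is an arbitrary choice for illustration.

```python
import concurrent.futures
import time


def read_band(path):
    # Hypothetical stand-in for the blocking GDAL read. In real use this
    # would be: return gdal.Open(path).GetRasterBand(1).ReadAsArray()
    time.sleep(0.2)  # simulate a slow remote read
    return f"data from {path}"


def read_all(paths, max_workers=4):
    # The pool runs read_band for several paths at once; leaving the
    # with-block waits for all worker threads to finish.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in the order of the inputs.
        return list(pool.map(read_band, paths))


if __name__ == "__main__":
    print(read_all(["a.tif", "b.tif"]))
```

    This keeps the calling code fully synchronous, which tends to be simpler when nothing else in the program needs to be awaited.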