amazon-web-services rust aws-lambda rust-tokio rust-polars

Using Polars Cloud Storage from within AWS Lambda


I've got an AWS Lambda written in Rust, using the Rust Lambda Runtime. Within that Lambda, I'd like to use Polars to lazily load a Parquet file from S3 and perform some transformations on it before writing it back to another S3 bucket.

The issue I'm having (at least, I think this is the issue) is that the Polars cloud storage implementation seems to call tokio's block_on method internally, but the Lambda runtime already runs my handler inside a tokio runtime, so I'm getting the following error:

Cannot start a runtime from within a runtime. This happens because a function (like `block_on`) attempted to block the current thread while the thread is being used to drive asynchronous tasks.

The code I'm using to lazily load the Parquet file is as follows:

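use polars::prelude::*;

let path = "s3://my_bucket/example.parquet";

let args = ScanArgsParquet::default();
// This call triggers the runtime error above when made from the async handler.
let lf = match LazyFrame::scan_parquet(path, args) {
    Ok(lf) => lf,
    Err(_) => return Err(ReadError::ParquetError),
};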

I'm relatively new to Rust, so is there anything I can do to work around this? Or am I going to have to download the file into memory myself using the SDK and then load it (in a non-lazy fashion)?


Solution

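  • Based on the comment from @Chayim Friedman on the original post, here's what I got working in the end using tokio's task::spawn_blocking, which runs the closure on a thread from tokio's blocking pool instead of on an async worker thread: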

    // Run the blocking Polars call on tokio's blocking thread pool.
    let res = task::spawn_blocking(move || {
        let args = ScanArgsParquet::default();
        LazyFrame::scan_parquet(path, args)
    }).await;

    // Outer Result: did the task run to completion?
    // Inner Result: did the Parquet scan succeed?
    match res {
        Ok(Ok(lf)) => Ok(lf),
        Ok(Err(_)) => Err(ReadError::ParquetError),
        Err(_) => Err(ReadError::ThreadError),
    }
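
The post never shows how ReadError is defined, and awaiting a spawn_blocking handle yields a Result wrapped in another Result, which is easy to get wrong. Below is a minimal sketch of what that error type and the flattening step might look like. The enum definition and the flatten_join helper are assumptions for illustration only; just the two variant names come from the code above. The helper is generic, so J stands in for tokio's JoinError and E for the Polars error type, and the sketch needs neither crate.

```rust
use std::fmt;

/// Hypothetical error type: the post only shows the two variant names.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    /// The Parquet scan itself returned an error.
    ParquetError,
    /// The blocking task did not run to completion (it panicked or was cancelled).
    ThreadError,
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadError::ParquetError => write!(f, "failed to scan the Parquet file"),
            ReadError::ThreadError => write!(f, "blocking task did not complete"),
        }
    }
}

impl std::error::Error for ReadError {}

/// Hypothetical helper: collapse the nested result produced by awaiting a
/// blocking task into a single `Result<T, ReadError>`.
/// The outer error `J` is the join failure, the inner error `E` is the
/// failure of the work itself. Both are discarded here, as in the post.
pub fn flatten_join<T, E, J>(res: Result<Result<T, E>, J>) -> Result<T, ReadError> {
    match res {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(_)) => Err(ReadError::ParquetError),
        Err(_) => Err(ReadError::ThreadError),
    }
}
```

With something like this in place, the final match in the solution could shrink to a single call that passes the awaited result of task::spawn_blocking to flatten_join. Discarding the underlying errors keeps the sketch close to the original code; in a real Lambda you may prefer to log them or carry them inside the variants so failures are easier to diagnose.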