Search code examples
dataframerustparquetrust-polars

Writing long rows using polars DataFrame throws runtime error


I have the following async block of code that runs as part of a larger program, and it runs successfully when the dataframe has a row with length 10, or 30, but when i put it to a larger number like 300, it tries to write the dataframe as parquet and throws a runtime error for each async thread:

Here is an example of the df it tries to write but fails.

df: Ok(shape: (300, 4)
┌───────────────┬─────────┬────────────────────────────────┬───────────────────────────────────────┐
│ timestamp     ┆ ticker  ┆ bids                           ┆ asks                                  │
│ ---           ┆ ---     ┆ ---                            ┆ ---                                   │
│ i64           ┆ str     ┆ list[list[f64]]                ┆ list[list[f64]]                       │
╞═══════════════╪═════════╪════════════════════════════════╪═══════════════════════════════════════╡
│ 1674962575119 ┆ ETHUSDT ┆ [[1589.51, 4.731], [1589.31,   ┆ [[1590.93, 39.234], [1592.1, 51....   │
│               ┆         ┆ 93....                         ┆                                       │
│ 1674962575220 ┆ ETHUSDT ┆ [[1589.51, 22.094], [1589.31,  ┆ [[1590.93, 39.234], [1592.1, 51....   │
│               ┆         ┆ 24...                          ┆                                       │
│ 1674962575319 ┆ ETHUSDT ┆ [[1589.51, 12.324], [1589.31,  ┆ [[1590.93, 39.309], [1592.1, 52....   │
│               ┆         ┆ 24...                          ┆                                       │
│ 1674962575421 ┆ ETHUSDT ┆ [[1589.51, 0.0], [1589.31,     ┆ [[1590.93, 26.735], [1592.1, 52....   │
│               ┆         ┆ 24.26...                       ┆                                       │
│ ...           ┆ ...     ┆ ...                            ┆ ...                                   │
│ 1674962604998 ┆ ETHUSDT ┆ [[1440.0, 5138.446], [1558.38, ┆ [[1617.28, 40.969], [1593.72, 3....   │
│               ┆         ┆ 0...                           ┆                                       │
│ 1674962605101 ┆ ETHUSDT ┆ [[1440.0, 5138.446], [1558.38, ┆ [[1617.28, 40.969], [1593.72, 3....   │
│               ┆         ┆ 0...                           ┆                                       │
│ 1674962605201 ┆ ETHUSDT ┆ [[1440.0, 5138.446], [1558.38, ┆ [[1617.28, 40.969], [1593.72, 3....   │
│               ┆         ┆ 0...                           ┆                                       │
│ 1674962605301 ┆ ETHUSDT ┆ [[1440.0, 5138.446], [1558.38, ┆ [[1617.28, 40.969], [1593.72, 3....   │
│               ┆         ┆ 0...                           ┆                                       │
└───────────────┴─────────┴────────────────────────────────┴───────────────────────────────────────┘)

Here is the error.

thread '<unnamed>' panicked at 'range end index 131373 out of range for slice of length 301', C:\Users\username\.cargo\git\checkouts\arrow2-945af624853845da\baa2618\src\io\parquet\write\mod.rs:171:37
thread '<unnamed>' panicked at 'range end index 131373 out of range for slice of length 301', C:\Users\username\.cargo\git\checkouts\arrow2-945af624853845da\baa2618\src\io\parquet\write\mod.rs:171:37

Here is the code.

async {
    if main_vec.length() >= ROWS {
        let df = main_vec.to_df();

        let time = std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap()
            .as_secs();

        let file = std::fs::File::create(&(time.to_string().trim() + ".parquet")).unwrap();

        println!("Wrote parquet file: {}", &(time.to_string().trim().to_owned() + ".parquet"));

        //  ERROR OCCURS HERE

        //  ERROR OCCURS HERE

        //  ERROR OCCURS HERE

        ParquetWriter::new(file)
            .with_compression(ParquetCompression::Snappy)
            .with_statistics(true)
            .finish(&mut df.collect().unwrap())
            .expect("Failed to write parquet file");

        main_vec.clear();
    }
}.await;


Solution

  • This is patched in the latest update of Polars