I have the following async block of code that runs as part of a larger program. It runs successfully when the DataFrame has 10 or 30 rows, but when I raise the row threshold to a larger number like 300, the attempt to write the DataFrame to Parquet throws a runtime error in each async thread:
Here is an example of the df it tries to write but fails on.
df: Ok(shape: (300, 4)
┌───────────────┬─────────┬────────────────────────────────┬───────────────────────────────────────┐
│ timestamp ┆ ticker ┆ bids ┆ asks │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ list[list[f64]] ┆ list[list[f64]] │
╞═══════════════╪═════════╪════════════════════════════════╪═══════════════════════════════════════╡
│ 1674962575119 ┆ ETHUSDT ┆ [[1589.51, 4.731], [1589.31, ┆ [[1590.93, 39.234], [1592.1, 51.... │
│ ┆ ┆ 93.... ┆ │
│ 1674962575220 ┆ ETHUSDT ┆ [[1589.51, 22.094], [1589.31, ┆ [[1590.93, 39.234], [1592.1, 51.... │
│ ┆ ┆ 24... ┆ │
│ 1674962575319 ┆ ETHUSDT ┆ [[1589.51, 12.324], [1589.31, ┆ [[1590.93, 39.309], [1592.1, 52.... │
│ ┆ ┆ 24... ┆ │
│ 1674962575421 ┆ ETHUSDT ┆ [[1589.51, 0.0], [1589.31, ┆ [[1590.93, 26.735], [1592.1, 52.... │
│ ┆ ┆ 24.26... ┆ │
│ ... ┆ ... ┆ ... ┆ ... │
│ 1674962604998 ┆ ETHUSDT ┆ [[1440.0, 5138.446], [1558.38, ┆ [[1617.28, 40.969], [1593.72, 3.... │
│ ┆ ┆ 0... ┆ │
│ 1674962605101 ┆ ETHUSDT ┆ [[1440.0, 5138.446], [1558.38, ┆ [[1617.28, 40.969], [1593.72, 3.... │
│ ┆ ┆ 0... ┆ │
│ 1674962605201 ┆ ETHUSDT ┆ [[1440.0, 5138.446], [1558.38, ┆ [[1617.28, 40.969], [1593.72, 3.... │
│ ┆ ┆ 0... ┆ │
│ 1674962605301 ┆ ETHUSDT ┆ [[1440.0, 5138.446], [1558.38, ┆ [[1617.28, 40.969], [1593.72, 3.... │
│ ┆ ┆ 0... ┆ │
└───────────────┴─────────┴────────────────────────────────┴───────────────────────────────────────┘)
Here is the error.
thread '<unnamed>' panicked at 'range end index 131373 out of range for slice of length 301', C:\Users\username\.cargo\git\checkouts\arrow2-945af624853845da\baa2618\src\io\parquet\write\mod.rs:171:37
thread '<unnamed>' panicked at 'range end index 131373 out of range for slice of length 301', C:\Users\username\.cargo\git\checkouts\arrow2-945af624853845da\baa2618\src\io\parquet\write\mod.rs:171:37
Here is the code.
async {
    if main_vec.length() >= ROWS {
        let df = main_vec.to_df();
        let time = std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap()
            .as_secs();
        let filename = format!("{}.parquet", time);
        let file = std::fs::File::create(&filename).unwrap();
        // ERROR OCCURS HERE
        ParquetWriter::new(file)
            .with_compression(ParquetCompression::Snappy)
            .with_statistics(true)
            .finish(&mut df.collect().unwrap())
            .expect("Failed to write parquet file");
        println!("Wrote parquet file: {}", filename);
        main_vec.clear();
    }
}
.await;
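The panic comes from inside arrow2's Parquet writer (the path in the error is arrow2's src/io/parquet/write/mod.rs), and since the same write succeeds at 10 or 30 rows, one stopgap on the affected version is to split the collected DataFrame into small slices and write each slice to its own file. This is only a sketch under that assumption; write_in_chunks, CHUNK_ROWS, and the file-name scheme are made up for illustration:

use polars::prelude::*;

// Illustrative stopgap: write the frame in small slices, since small row
// counts are reported to write without panicking. CHUNK_ROWS is arbitrary.
const CHUNK_ROWS: usize = 30;

fn write_in_chunks(df: &DataFrame, stem: &str) -> PolarsResult<()> {
    let mut offset = 0usize;
    let mut part = 0usize;
    while offset < df.height() {
        // slice() clamps to the end of the frame, so the last chunk may be shorter
        let mut chunk = df.slice(offset as i64, CHUNK_ROWS);
        let file = std::fs::File::create(format!("{}_{}.parquet", stem, part))
            .expect("failed to create output file");
        ParquetWriter::new(file)
            .with_compression(ParquetCompression::Snappy)
            .finish(&mut chunk)?;
        offset += CHUNK_ROWS;
        part += 1;
    }
    Ok(())
}

It would stand in for the single ParquetWriter call above, e.g. write_in_chunks(&df.collect().unwrap(), &time.to_string()).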
This has been patched in the latest update of Polars.
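If upgrading is an option, bumping the dependency is the cleaner fix. A minimal sketch, assuming Polars is pinned in Cargo.toml (the version number and feature list shown are only illustrative, not the exact fixed release):

[dependencies]
polars = { version = "0.27", features = ["lazy", "parquet"] }

followed by cargo update so Cargo pulls in a newer arrow2 revision than the git checkout shown in the error path.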