Rust Polars DataFrame Split - ColumnNotFound Error


I am working with Rust Polars and trying to split a column in a DataFrame. However, I encounter a ColumnNotFound error when attempting to apply the split operation. Here is my code:

use polars::prelude::*;

fn main() {
    test_split()
}

fn test_split() {
    let df = df!(
        "c1" => &["v1,v2", "v3", "v4"],
        "c2" => &["v5", "v6,v7,v8", "v9"],
    ).unwrap();

    println!("{}", df);

    let lf = df
        .clone()
        .lazy()
        .with_columns([
            col("c1").str().split(",".into())
        ])
        .collect().unwrap();

    println!("{}", lf);
}

The error message is:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString(",\n\nError originated just after this operation:\nDF [\"c1\", \"c2\"]; PROJECT */2 COLUMNS; SELECTION: \"None\""))', src/main.rs:114:20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Before the panic, the program prints the original DataFrame:

shape: (3, 2)
┌───────┬──────────┐
│ c1    ┆ c2       │
│ ---   ┆ ---      │
│ str   ┆ str      │
╞═══════╪══════════╡
│ v1,v2 ┆ v5       │
│ v3    ┆ v6,v7,v8 │
│ v4    ┆ v9       │
└───────┴──────────┘

Could someone help me understand what is causing this error and how to split a column correctly in Rust Polars?

I expect the result of the split to look like this:

shape: (3, 2)
┌──────────┬──────────┐
│ c1       ┆ c2       │
│ ---      ┆ ---      │
│ str      ┆ str      │
╞══════════╪══════════╡
│ [v1, v2] ┆ v5       │
│ [v3]     ┆ v6,v7,v8 │
│ [v4]     ┆ v9       │
└──────────┴──────────┘

Additional information:

polars = { version = "0.33.2", features = ["dtype-categorical", "lazy", "strings"] }
rustc 1.72.0 (5680fa18f 2023-08-23)

Solution

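  • In this version, split takes an expression as its argument, and ",".into() converts the string into a column reference, equivalent to col(","), rather than a string literal. Polars then looks for a column named "," and fails with ColumnNotFound, which is why the error text begins with ",".

    Changing:

    col("c1").str().split(",".into())
    

    To:

    col("c1").str().split(lit(","))
    

    fixes the issue, because lit(",") builds a literal expression holding the separator.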

    Reference: https://github.com/pola-rs/polars/issues/11180