I'm trying to replicate one of the Polars Python examples in Rust but seem to have hit a wall. In the Python docs there is an example which creates a new column with the lengths of the strings from another column. So for example, column B will contain the lengths of all the strings in column A.
The example code looks like this:
import polars as pl
df = pl.DataFrame({"shakespeare": "All that glitters is not gold".split(" ")})
df = df.with_column(pl.col("shakespeare").str.lengths().alias("letter_count"))
As you can see, it uses the str namespace to access the lengths() function, but when trying the same in the Rust version this does not work:
use polars::prelude::*;
// This will throw the following error:
// no method named `lengths` found for struct `StringNameSpace` in the current scope
fn print_length_strings_in_column() {
    let df = generate_df().expect("error");
    let new_df = df
        .lazy()
        .with_column(col("vendor_id").str().lengths().alias("vendor_id_length"))
        .collect();
}
Cargo.toml:
[dependencies]
polars = {version = "0.22.8", features = ["strings", "lazy"]}
I checked the docs and it seems like the Rust version of Polars does not implement the lengths() function. There is the str_lengths function in the Utf8NameSpace but it's not entirely clear to me how to use this.
I feel like I'm missing something very simple here but I don't see it. How would I go about tackling this issue?
Thanks!
You have to use the apply function and cast the Series to a Utf8 ChunkedArray, which has a str_lengths() method:
https://docs.rs/polars/0.22.8/polars/chunked_array/struct.ChunkedArray.html
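use polars::prelude::*;

let s = Series::new("vendor_id", &["Ant", "no", "how", "Ant", "mans"]);
let df = DataFrame::new(vec![s]).unwrap();
let res = df
    .lazy()
    .with_column(
        col("vendor_id")
            // Downcast the Series to a Utf8 ChunkedArray, then compute the lengths.
            .apply(
                |srs| Ok(srs.utf8()?.str_lengths().into_series()),
                // str_lengths() yields unsigned 32-bit values, so declare UInt32 as the output type.
                GetOutput::from_type(DataType::UInt32),
            )
            .alias("vendor_id_length"),
    )
    .collect();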
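One thing to keep in mind, since the Python example names its column "letter_count": as far as I can tell, str_lengths() reports lengths in bytes rather than in characters, so the two only agree for ASCII data. That is my assumption about this Polars version and worth checking against the docs. The sketch below uses only the standard library to show the difference; byte_lengths and char_counts are hypothetical helper names for illustration, not Polars functions.

```rust
/// Byte length of each string (what `str::len` returns).
/// Assumption: this mirrors what a byte-based length kernel would report.
fn byte_lengths(values: &[&str]) -> Vec<u32> {
    values.iter().map(|s| s.len() as u32).collect()
}

/// Number of Unicode scalar values (`char`s) in each string.
fn char_counts(values: &[&str]) -> Vec<u32> {
    values.iter().map(|s| s.chars().count() as u32).collect()
}
```

For the sample data in the answer both helpers give the same numbers. If you need character counts for non-ASCII strings, you could do the counting yourself inside the apply closure instead of calling str_lengths().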