I'm very new to both Rust and Polars, so go easy on me. :)
I'm trying to establish a pattern using custom functions, based on this documentation: https://pola-rs.github.io/polars-book/user-guide/dsl/custom_functions.html, but so far I have been unsuccessful.
In my code, I have a function declared as follows:
pub fn convert_str_to_tb(value: &str) -> f64 {
    let value = value.replace(",", "");
    let mut parts = value.split_whitespace();
    let num = parts.next().unwrap().parse::<f64>().unwrap();
    let unit = parts.next().unwrap();
    match unit {
        "KB" => num / (1000.0 * 1000.0 * 1000.0),
        "MB" => num / (1000.0 * 1000.0),
        "GB" => num / 1000.0,
        "TB" => num,
        _ => panic!("Unsupported unit: {}", unit),
    }
}
I believe I should be able to call this function like so:
df.with_columns([
col("value").map(|s| Ok(convert_str_to_tb(s))).alias("value_tb");
])
My first issue was that the with_columns method doesn't seem to exist; I had to use with_column. If I use with_column, I receive the following error:
the trait bound `Expr: IntoSeries` is not satisfied
the following other types implement trait `IntoSeries`:
Arc<(dyn polars::prelude::SeriesTrait + 'static)>
ChunkedArray<T>
Logical<DateType, Int32Type>
Logical<DatetimeType, Int64Type>
Logical<DurationType, Int64Type>
Logical<TimeType, Int64Type>
polars::prelude::Series
The DataFrame I am trying to transform:
let mut df = df!("volume" => &["volume01", "volume02", "volume03"],
"value" => &["1,000 GB", "2,000,000 MB", "3 TB"]).unwrap();
Perhaps there is a way to do this without a custom function?
Problem 1, with_columns
One confusing point about the documentation: the df in the example is a lazy data frame. You can see they call .lazy() in the full code snippet where a custom function is used. .with_columns() is an available method on the lazy data frame.
Problem 2, custom function
The types in your custom function don't match what .map() expects. You defined it to take a &str and return an f64. However, as the error implies, the s parameter is actually a Series, and the expected return value is an Option<Series>.
So what's happening here? The .map() function provides you with a Series that your custom function needs to iterate over.
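As a rough illustration of that shape difference, here is a sketch that uses only the standard library rather than Polars. A slice of Option<&str> stands in for a string column (None modelling a null cell), and apply_to_column is a hypothetical helper name, not a Polars API. It shows how a per-element function with a &str -> f64 signature can be lifted to work on a whole column at once:

```rust
/// Applies a per-element function to every non-null cell of a "column".
/// The slice of `Option<&str>` is a stand-in for a string Series;
/// `None` models a null value and is passed through unchanged.
/// `apply_to_column` is a hypothetical helper, not part of Polars.
fn apply_to_column<F>(column: &[Option<&str>], f: F) -> Vec<Option<f64>>
where
    F: Fn(&str) -> f64,
{
    column
        .iter()
        .map(|&cell| cell.map(|s| f(s)))
        .collect()
}
```

The per-element logic stays untouched; only the wrapper knows about the column.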
Updating your custom function to have the appropriate arg and return type:
pub fn convert_str_to_tb(value: Series) -> Option<Series> {
    Some(value.iter().map(|v| {
        let value = v.get_str().unwrap().replace(",", "");
        let mut parts = value.split_whitespace();
        let num = parts.next().unwrap().parse::<f64>().unwrap();
        let unit = parts.next().unwrap();
        match unit {
            "KB" => num / (1000.0 * 1000.0 * 1000.0),
            "MB" => num / (1000.0 * 1000.0),
            "GB" => num / 1000.0,
            "TB" => num,
            _ => panic!("Unsupported unit: {}", unit),
        }
    }).collect())
}
And called using
df.lazy().with_columns([
col("value").map(|s| Ok(convert_str_to_tb(s)), GetOutput::default()).alias("value_tb")
]).collect().unwrap();
Gives the output:
shape: (3, 3)
┌──────────┬──────────────┬──────────┐
│ volume ┆ value ┆ value_tb │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 │
╞══════════╪══════════════╪══════════╡
│ volume01 ┆ 1,000 GB ┆ 1.0 │
│ volume02 ┆ 2,000,000 MB ┆ 2.0 │
│ volume03 ┆ 3 TB ┆ 3.0 │
└──────────┴──────────────┴──────────┘
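One caveat: the parsing logic above panics on a missing number, a missing unit, or an unknown unit. If the column might contain malformed values, a non-panicking variant of the per-value parsing could look like the sketch below. It is plain Rust with no Polars dependency, and try_convert_str_to_tb is a hypothetical name. It uses the same decimal unit factors as the original and returns None where the original would panic:

```rust
/// Parses a string such as "1,000 GB" into terabytes (decimal units).
/// Returns `None` when the number or unit is missing, the number does
/// not parse, or the unit is not one of KB, MB, GB, TB.
/// `try_convert_str_to_tb` is a hypothetical name used for illustration.
pub fn try_convert_str_to_tb(value: &str) -> Option<f64> {
    // Drop thousands separators so "2,000,000" parses as a float.
    let cleaned = value.replace(',', "");
    let mut parts = cleaned.split_whitespace();
    let num = parts.next()?.parse::<f64>().ok()?;
    // Divisor that converts the given unit to terabytes.
    let divisor = match parts.next()? {
        "KB" => 1000.0 * 1000.0 * 1000.0,
        "MB" => 1000.0 * 1000.0,
        "GB" => 1000.0,
        "TB" => 1.0,
        _ => return None,
    };
    Some(num / divisor)
}
```

Inside the custom function, a None result could then be turned into a null in the output column instead of aborting the whole query.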