# How can I create a new column in a dataframe by using a function in Rust (Polars)?

I have two functions below. The first is called `unlevered_beta_f`, and the second is called `industry_total_beta_f`. The second function uses Polars to manipulate a DataFrame that is read from a CSV file. I want to create a new column using the first function, but I am not sure how to do it.

```rust
pub fn unlevered_beta_f(
    levered_beta: f32,
    de_ratio: f32,
    marginal_tax_rate: Option<f32>,
    effective_tax_rate: f32,
    cash_firm_value: f32,
) -> Option<f32> {
    // Do you want to use marginal or effective tax rates in unlevering betas?
    // if marginal tax rate, enter the marginal tax rate to use
    let tax_rate = tax_rate_f(marginal_tax_rate, effective_tax_rate);
    let mut unlevered_beta = levered_beta / (1.0 + (1.0 - tax_rate) * de_ratio);
    unlevered_beta = unlevered_beta / (1.0 - cash_firm_value);
    return Some(unlevered_beta);
}
```

```rust
pub fn industry_total_beta_f(raw_data: DataFrame) -> DataFrame {
    let df = raw_data
        .clone()
        .lazy()
        .with_columns([unlevered_beta_f(
            col("Average of Beta"),
            col("Sum of Total Debt incl leases (in US $)") / col("Sum of Market Cap (in US $)"),
            marginal_tax_rate = marginal_tax_rate,
            col("Average of Effective Tax Rate"),
            col("Sum of Cash") / col("Sum of Firm Value (in US $)"),
        )
        .alias("Average Unlevered Beta")])
        .with_columns([
            (col("Average Unlevered Beta") / col("Average of Correlation with market"))
                .alias("Total Unlevered Beta"),
            (col("Average of Beta") / col("Average of Correlation with market"))
                .alias("Total Levered Beta"),
        ])
        .select([
            col("Industry Name"),
            col("Number of firms"),
            col("Average Unlevered Beta"),
            col("Average of Beta"),
            col("Average of Correlation with market"),
            col("Total Unlevered Beta"),
            col("Total Levered Beta"),
        ])
        .collect()
        .unwrap();
    return df;
}
```

I tried the code above, and everything works except for the following section:

```rust
.with_columns([unlevered_beta_f(
    col("Average of Beta"),
    col("Sum of Total Debt incl leases (in US $)") / col("Sum of Market Cap (in US $)"),
    marginal_tax_rate = marginal_tax_rate,
    col("Average of Effective Tax Rate"),
    col("Sum of Cash") / col("Sum of Firm Value (in US $)"),
)
.alias("Average Unlevered Beta")])
```

I want to create a column called "Average Unlevered Beta" that takes the columns above, read from a CSV file, as inputs. Elsewhere in the code I successfully created new columns, but I am not sure how to do it using a function.

## Solution

A general remark: if you can use the polars Expression system, do that instead. It results in much more readable code, and it is also slightly more performant for larger numbers of records (I did some quick benchmarks, see below).
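The reason the call in the question cannot work is that `col("...")` does not return numbers: it returns an `Expr`, a description of a computation that polars evaluates later over whole columns. `unlevered_beta_f` expects `f32` values, so it cannot take `Expr` arguments (and Rust has no keyword arguments, so `marginal_tax_rate = marginal_tax_rate` is not valid syntax either). The std-only toy model below is a simplification, not how polars is implemented internally; the `Expr`, `col`, `lit` and `eval` items in it are hypothetical stand-ins. It only illustrates the idea that building an expression records operations, and evaluation happens later, column at a time.

```rust
use std::collections::HashMap;
use std::ops::{Div, Sub};

// Toy expression tree (hypothetical, NOT the polars API).
#[derive(Clone, Debug)]
enum Expr {
    Col(String),
    Lit(f32),
    Minus(Box<Expr>, Box<Expr>),
    Divide(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Col(name.to_string())
}

fn lit(value: f32) -> Expr {
    Expr::Lit(value)
}

// Operators build a bigger tree instead of computing a number.
impl Sub for Expr {
    type Output = Expr;
    fn sub(self, rhs: Expr) -> Expr {
        Expr::Minus(Box::new(self), Box::new(rhs))
    }
}

impl Div for Expr {
    type Output = Expr;
    fn div(self, rhs: Expr) -> Expr {
        Expr::Divide(Box::new(self), Box::new(rhs))
    }
}

// A "DataFrame": column name -> values (all columns have the same length).
type Frame = HashMap<String, Vec<f32>>;

// Evaluate the tree over all rows at once. Panics if a column is missing.
fn eval(expr: &Expr, frame: &Frame, len: usize) -> Vec<f32> {
    match expr {
        Expr::Col(name) => frame[name].clone(),
        Expr::Lit(v) => vec![*v; len],
        Expr::Minus(a, b) => zip_with(eval(a, frame, len), eval(b, frame, len), |x, y| x - y),
        Expr::Divide(a, b) => zip_with(eval(a, frame, len), eval(b, frame, len), |x, y| x / y),
    }
}

// Element-wise combination of two equally long columns.
fn zip_with(a: Vec<f32>, b: Vec<f32>, f: impl Fn(f32, f32) -> f32) -> Vec<f32> {
    a.into_iter().zip(b).map(|(x, y)| f(x, y)).collect()
}

fn main() {
    let mut frame: Frame = HashMap::new();
    frame.insert("Sum of Cash".to_string(), vec![10.0, 30.0]);
    frame.insert("Sum of Firm Value (in US $)".to_string(), vec![100.0, 60.0]);

    // Like `col("Sum of Cash") / col(...)` in polars: only a description so far.
    let cash_firm_value = col("Sum of Cash") / col("Sum of Firm Value (in US $)");
    let expr = lit(1.0) - cash_firm_value;

    // Only now are numbers computed, for every row.
    let result = eval(&expr, &frame, 2);
    println!("{:?}", result);
}
```

This is also why the Expression-based fix further down rewrites the formula itself in terms of `Expr` (`lit(1.0) - ...`) rather than trying to pass `col(...)` into the `f32` function.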

If you can't (for example, because the `tax_rate_f` function in your example is not expressible as a polars Expression), you can apply a function to a subset of columns by combining `as_struct` with `map`, as explained in another SO question. Note that I'm using a third-party dependency here, `itertools`, to easily iterate over multiple zipped iterators.
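Stripped of the polars types, the closure passed to `map` in the code below performs a row-wise zip: it walks all input columns in lockstep, calls the scalar function when every value in the row is present, and yields a null otherwise. Here is a std-only sketch of that pattern, assuming `Vec<Option<f32>>` as a stand-in for a nullable `Float32Chunked` column and using a hypothetical scalar function `scaled_ratio` in place of `unlevered_beta_f`. Plain `zip` replaces `izip!` so the sketch needs no dependencies.

```rust
// Hypothetical scalar function standing in for `unlevered_beta_f`:
// any plain Rust function of f32 values can be plugged in.
fn scaled_ratio(beta: f32, debt: f32, market_cap: f32) -> Option<f32> {
    if market_cap == 0.0 {
        return None; // avoid dividing by zero
    }
    Some(beta / (1.0 + debt / market_cap))
}

// Apply `f` row by row over three nullable columns of equal length.
// A row yields None if any input is None (like the `if let` in the answer).
fn apply_rowwise<F>(
    a: &[Option<f32>],
    b: &[Option<f32>],
    c: &[Option<f32>],
    f: F,
) -> Vec<Option<f32>>
where
    F: Fn(f32, f32, f32) -> Option<f32>,
{
    a.iter()
        .zip(b)
        .zip(c)
        .map(|((x, y), z)| match (x, y, z) {
            (Some(x), Some(y), Some(z)) => f(*x, *y, *z),
            _ => None,
        })
        .collect()
}

fn main() {
    let beta: Vec<Option<f32>> = vec![Some(1.2), Some(0.8), None];
    let debt: Vec<Option<f32>> = vec![Some(50.0), Some(0.0), Some(10.0)];
    let market_cap: Vec<Option<f32>> = vec![Some(100.0), Some(40.0), Some(20.0)];

    let out = apply_rowwise(&beta, &debt, &market_cap, scaled_ratio);
    println!("{:?}", out);
}
```

In the polars version, `as_struct` plus `field_by_name` is what turns one struct-typed `Series` back into these separate columns before zipping.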

Based on the comments in your code, I assumed a very simple implementation of the `tax_rate_f` function. I then also implemented both `unlevered_beta_f` and `tax_rate_f` as polars Expression functions, to show the difference in complexity.

```rust
use itertools::izip;
use polars::{
    lazy::dsl::{as_struct, GetOutput},
    prelude::*,
};
use rand::{distributions::Uniform, Rng};

const NUMBER_OF_RECORDS: usize = 10000;

pub fn unlevered_beta_f(
    levered_beta: f32,
    de_ratio: f32,
    marginal_tax_rate: Option<f32>,
    effective_tax_rate: f32,
    cash_firm_value: f32,
) -> Option<f32> {
    // Do you want to use marginal or effective tax rates in unlevering betas?
    // if marginal tax rate, enter the marginal tax rate to use
    let tax_rate = tax_rate_f(marginal_tax_rate, effective_tax_rate);
    let mut unlevered_beta = levered_beta / (1.0 + (1.0 - tax_rate) * de_ratio);
    unlevered_beta /= 1.0 - cash_firm_value;
    Some(unlevered_beta)
}

// Assumed implementation: use the marginal tax rate if given, else the effective one.
pub fn tax_rate_f(marginal_tax_rate: Option<f32>, effective_tax_rate: f32) -> f32 {
    match marginal_tax_rate {
        Some(marginal_tax_rate) => marginal_tax_rate,
        None => effective_tax_rate,
    }
}

// Expression version of `tax_rate_f`.
pub fn tax_rate_f_expr(marginal_tax_rate: Expr, effective_tax_rate: Expr) -> Expr {
    when(marginal_tax_rate.clone().is_not_null())
        .then(marginal_tax_rate)
        .otherwise(effective_tax_rate)
}

// Expression version of `unlevered_beta_f`: builds an Expr instead of an f32.
pub fn unlevered_beta_f_expr(
    levered_beta: Expr,
    de_ratio: Expr,
    marginal_tax_rate: Expr,
    effective_tax_rate: Expr,
    cash_firm_value: Expr,
) -> Expr {
    let tax_rate = tax_rate_f_expr(marginal_tax_rate, effective_tax_rate);
    let unlevered_beta = levered_beta / (lit(1.0) + (lit(1.0) - tax_rate) * de_ratio);
    unlevered_beta / (lit(1.0) - cash_firm_value)
}

// Hypothetical helper (the answer calls `get_df` without showing it):
// builds a DataFrame of random f32 values using the column names below.
fn get_df() -> PolarsResult<DataFrame> {
    let mut rng = rand::thread_rng();
    let dist = Uniform::new(0.0f32, 1.0f32);
    let mut random_column = |name: &str| {
        let values: Vec<f32> = (0..NUMBER_OF_RECORDS).map(|_| rng.sample(&dist)).collect();
        Series::new(name, values)
    };
    DataFrame::new(vec![
        random_column("Average of Beta"),
        random_column("Sum of Total Debt incl leases (in US $)"),
        random_column("Sum of Market Cap (in US $)"),
        random_column("Average of Effective Tax Rate"),
        random_column("Sum of Cash"),
        random_column("Sum of Firm Value (in US $)"),
    ])
}

fn main() -> Result<(), PolarsError> {
    let df = get_df()?;

    // Approach 1: pack the input columns into a struct and map a Rust closure over it.
    let enriched_df = df.clone().lazy().with_column(
        as_struct(vec![
            col("Average of Beta"),
            col("Sum of Total Debt incl leases (in US $)"),
            col("Sum of Market Cap (in US $)"),
            col("Average of Effective Tax Rate"),
            col("Sum of Cash"),
            col("Sum of Firm Value (in US $)"),
        ])
        .map(
            |s| {
                // Unpack each struct field back into a typed f32 column.
                let cols = s.struct_()?;
                let avg_beta = cols.field_by_name("Average of Beta")?;
                let avg_beta = avg_beta.f32()?;
                let sum_debt = cols.field_by_name("Sum of Total Debt incl leases (in US $)")?;
                let sum_debt = sum_debt.f32()?;
                let sum_mkt_cap = cols.field_by_name("Sum of Market Cap (in US $)")?;
                let sum_mkt_cap = sum_mkt_cap.f32()?;
                let avg_tax_rate = cols.field_by_name("Average of Effective Tax Rate")?;
                let avg_tax_rate = avg_tax_rate.f32()?;
                let sum_cash = cols.field_by_name("Sum of Cash")?;
                let sum_cash = sum_cash.f32()?;
                let sum_firm_value = cols.field_by_name("Sum of Firm Value (in US $)")?;
                let sum_firm_value = sum_firm_value.f32()?;

                // Walk all six columns in lockstep, row by row.
                let zipped_iterables = izip!(
                    avg_beta,
                    sum_debt,
                    sum_mkt_cap,
                    avg_tax_rate,
                    sum_cash,
                    sum_firm_value
                );
                let x: ChunkedArray<Float32Type> = zipped_iterables
                    .map(
                        |(avg_beta, sum_debt, sum_mkt_cap, avg_tax_rate, sum_cash, sum_firm_value)| {
                            // Only call the scalar function when no input is null.
                            if let (
                                Some(avg_beta),
                                Some(sum_debt),
                                Some(sum_mkt_cap),
                                Some(avg_tax_rate),
                                Some(sum_cash),
                                Some(sum_firm_value),
                            ) = (
                                avg_beta,
                                sum_debt,
                                sum_mkt_cap,
                                avg_tax_rate,
                                sum_cash,
                                sum_firm_value,
                            ) {
                                unlevered_beta_f(
                                    avg_beta,
                                    sum_debt / sum_mkt_cap,
                                    None,
                                    avg_tax_rate,
                                    sum_cash / sum_firm_value,
                                )
                            } else {
                                None
                            }
                        },
                    )
                    .collect();
                Ok(Some(x.into_series()))
            },
            GetOutput::from_type(DataType::Float32),
        )
        .alias("Average Unlevered Beta"),
    );
    println!("{:?}", enriched_df.collect()?);

    // Approach 2: the same computation written with polars Expressions.
    let better_df = df
        .lazy()
        // All-null marginal tax rate column, so the effective rate is used.
        .with_column(lit(NULL).cast(DataType::Float32).alias("Marginal Tax Rate"))
        .with_column(
            unlevered_beta_f_expr(
                col("Average of Beta"),
                col("Sum of Total Debt incl leases (in US $)") / col("Sum of Market Cap (in US $)"),
                col("Marginal Tax Rate"),
                col("Average of Effective Tax Rate"),
                col("Sum of Cash") / col("Sum of Firm Value (in US $)"),
            )
            .alias("Average Unlevered Beta"),
        )
        .collect()?;
    println!("{:?}", better_df);
    Ok(())
}
```
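Both approaches compute the same formula, β_U = β_L / (1 + (1 − t)·D/E) / (1 − Cash/FirmValue), where t is the marginal tax rate if present and the effective rate otherwise. As a quick sanity check outside polars, here is a std-only sketch that evaluates the scalar `unlevered_beta_f` row by row and a column-at-a-time function, `unlevered_beta_columns` (a hypothetical name), that mirrors `unlevered_beta_f_expr`. It assumes `Vec<Option<f32>>` as the nullable column type: the `when/then/otherwise` becomes `Option::or`, and a null in any input makes the output null, as null propagation in polars arithmetic does. The input values are made up for illustration.

```rust
// Scalar versions, as in the answer.
fn tax_rate_f(marginal_tax_rate: Option<f32>, effective_tax_rate: f32) -> f32 {
    match marginal_tax_rate {
        Some(marginal_tax_rate) => marginal_tax_rate,
        None => effective_tax_rate,
    }
}

fn unlevered_beta_f(
    levered_beta: f32,
    de_ratio: f32,
    marginal_tax_rate: Option<f32>,
    effective_tax_rate: f32,
    cash_firm_value: f32,
) -> Option<f32> {
    let tax_rate = tax_rate_f(marginal_tax_rate, effective_tax_rate);
    let unlevered_beta = levered_beta / (1.0 + (1.0 - tax_rate) * de_ratio);
    Some(unlevered_beta / (1.0 - cash_firm_value))
}

// Column-at-a-time sketch mirroring `unlevered_beta_f_expr` (hypothetical helper).
fn unlevered_beta_columns(
    levered_beta: &[Option<f32>],
    de_ratio: &[Option<f32>],
    marginal_tax_rate: &[Option<f32>],
    effective_tax_rate: &[Option<f32>],
    cash_firm_value: &[Option<f32>],
) -> Vec<Option<f32>> {
    (0..levered_beta.len())
        .map(|i| -> Option<f32> {
            // when(marginal.is_not_null()).then(marginal).otherwise(effective)
            let tax_rate = marginal_tax_rate[i].or(effective_tax_rate[i])?;
            // Any null input makes the whole row null.
            let beta = levered_beta[i]?;
            let de = de_ratio[i]?;
            let cash = cash_firm_value[i]?;
            Some(beta / (1.0 + (1.0 - tax_rate) * de) / (1.0 - cash))
        })
        .collect()
}

fn main() {
    // Made-up rows: levered beta, D/E, effective tax rate, cash / firm value.
    let levered_beta: Vec<Option<f32>> = vec![Some(1.2), Some(0.9), None];
    let de_ratio: Vec<Option<f32>> = vec![Some(0.5), Some(0.2), Some(0.3)];
    let effective: Vec<Option<f32>> = vec![Some(0.25), Some(0.1), Some(0.2)];
    let cash_fv: Vec<Option<f32>> = vec![Some(0.1), Some(0.05), Some(0.0)];
    let marginal: Vec<Option<f32>> = vec![None; 3]; // the all-null "Marginal Tax Rate"

    let by_columns =
        unlevered_beta_columns(&levered_beta, &de_ratio, &marginal, &effective, &cash_fv);
    for i in 0..levered_beta.len() {
        // Row-wise path, like the `as_struct` + `map` closure (marginal rate = None).
        let by_rows = match (levered_beta[i], de_ratio[i], effective[i], cash_fv[i]) {
            (Some(b), Some(de), Some(t), Some(c)) => unlevered_beta_f(b, de, None, t, c),
            _ => None,
        };
        println!("row {}: row-wise = {:?}, column-wise = {:?}", i, by_rows, by_columns[i]);
    }
}
```

For the first row this works out to 1.2 / 1.375 / 0.9 ≈ 0.97 in both paths.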

I benchmarked both approaches using the `divan` crate and got the following result for 10M records:

As you can see, the approach using the polars Expression syntax is slightly faster. For smaller numbers of records, it's actually the other way around. I'm not familiar enough with the internals of polars to explain this observation. Take these benchmarks with a grain of salt: the random DataFrame generation is part of the benchmark, but I assume the time spent on it is similar for both approaches.
