I am trying to convert some pandas DataFrame operations to Polars in Python, but I am running into difficulties, particularly with row-wise operations and element-wise comparisons. Here is the pandas code I am working with:
import pandas as pd
df_a = pd.DataFrame({
    "feature1": [1, 2, 3],
    "feature2": [7, 8, 9],
})
df_b = pd.DataFrame({
    "feature1": [3, 8, 2],
    "feature2": [7, 4, 9],
})
if selection_mode == 'option1':
    max_values = df_a.max(axis=1)
    selected_features = df_a.eq(max_values, axis=0)
    final_result = selected_features.mul(df_b).sum(axis=1) / selected_features.sum(axis=1)
elif selection_mode == 'option2':
    above_avg = df_a.ge(df_a.mean(axis=1), axis=0)
    combined_df = above_avg.mul(df_a).mul(df_b)
    sum_combined = combined_df.sum(axis=1)
    sum_above_avg = above_avg.mul(df_a).sum(axis=1)
    final_result = sum_combined / sum_above_avg
Any guidance on translating this pandas code to Polars would be greatly appreciated!
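Polars has dedicated "horizontal" methods for row-wise operations (what pandas does with axis=1), such as max_horizontal, min_horizontal, sum_horizontal and mean_horizontal.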
df_a.max_horizontal()
shape: (3,)
Series: 'max' [i64]
[
7
8
9
]
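When the right-hand side of an operation on a DataFrame is a Series, Polars "broadcasts" the Series across all columns, so each column is compared element-wise against it. This takes the place of the axis=0 argument in pandas' df_a.eq(max_values, axis=0).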
df_a == df_a.max_horizontal()  # equivalent to: df_a.select(pl.all() == pl.Series([7, 8, 9]))
shape: (3, 2)
┌──────────┬──────────┐
│ feature1 ┆ feature2 │
│ --- ┆ --- │
│ bool ┆ bool │
╞══════════╪══════════╡
│ false ┆ true │
│ false ┆ true │
│ false ┆ true │
└──────────┴──────────┘
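Putting it together, the question's code translates almost line for line, assuming df_a and df_b are Polars DataFrames holding the same data:
if selection_mode == "option1":
    max_values = df_a.max_horizontal()
    # True where a column holds the row maximum
    selected_features = df_a == max_values
    final_result = (
        (selected_features * df_b).sum_horizontal() / selected_features.sum_horizontal()
    )
elif selection_mode == "option2":
    # True where a column is at or above the row mean
    above_avg = df_a >= df_a.mean_horizontal()
    combined_df = above_avg * df_a * df_b
    sum_combined = combined_df.sum_horizontal()
    sum_above_avg = (above_avg * df_a).sum_horizontal()
    final_result = sum_combined / sum_above_avg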