I have a column whose name changes its prefix and suffix depending on some function arguments, but one part of the name is always the same. I need to rename that column to something easier to reference in a different workflow, and I am looking for the quickest way to find it and rename it.
I am using a for loop to check whether the fixed part of the string is in each column name, but I don't think this is the most performant way to rename a column based on a substring or regex match.
This is what I have come up with:
import polars as pl

data = pl.DataFrame({
    "foo": [1, 2, 3, 4, 5],
    "bar": [5, 4, 3, 2, 1],
    "std_volatility_pct_21D": [0.1, 0.2, 0.15, 0.18, 0.16]
})

# Rename the column whose name contains the fixed fragment.
for col in data.columns:
    if "volatility_pct" in col:
        new_data = data.rename({col: "realized_volatility"})
import polars as pl
import polars.selectors as cs
data = pl.DataFrame(
    {
        "foo": [1, 2, 3, 4, 5],
        "bar": [5, 4, 3, 2, 1],
        "std_volatility_pct_21D": [0.1, 0.2, 0.15, 0.18, 0.16],
    }
)
# 1
def rename_volatility_column(data):
    for col in data.columns:
        if "volatility_pct" in col:
            return data.rename({col: "realized_volatility"})
    return data
%timeit rename_volatility_column(data)
# 2
def adjust_volatility_column(data):
    return data.select(
        ~cs.contains("volatility_pct"),
        cs.contains("volatility_pct").alias("realized_volatility"),
    )
%timeit adjust_volatility_column(data)
# 3
%timeit data.rename(lambda col: "realized_volatility" if "volatility_pct" in col else col)
# 1
18.8 µs ± 636 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 2
330 µs ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 3
133 µs ± 7.71 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
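In the timings above, the plain loop (# 1) was the fastest of the three on this small frame. Since the question also mentions regex filtering, here is a minimal sketch of the same idea with a regular expression instead of a substring test. `build_rename_map` is a hypothetical helper name, not part of Polars, and raising on more than one match is an assumption about the desired behaviour.

```python
import re


def build_rename_map(columns, pattern, new_name):
    """Map the single column matching `pattern` to `new_name`.

    Hypothetical helper: returns an empty dict when nothing matches and
    raises when the match is ambiguous, so two columns never share a name.
    """
    matches = [col for col in columns if re.search(pattern, col)]
    if len(matches) > 1:
        raise ValueError(f"pattern {pattern!r} matched several columns: {matches}")
    return {col: new_name for col in matches}


columns = ["foo", "bar", "std_volatility_pct_21D"]
mapping = build_rename_map(columns, r"volatility_pct", "realized_volatility")
print(mapping)
```

With the frame from the benchmark, the mapping could then be applied as `data.rename(build_rename_map(data.columns, r"volatility_pct", "realized_volatility"))`. The helper only looks at the list of column names, so its cost does not depend on the number of rows.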
You can use Polars' column selectors.
~cs.contains("volatility_pct")
selects all columns that do not contain volatility_pct, and
cs.contains("volatility_pct").alias("realized_volatility")
selects all columns that contain volatility_pct
and renames them to realized_volatility.
import polars.selectors as cs
(
    data
    .select(
        ~cs.contains("volatility_pct"),
        cs.contains("volatility_pct").alias("realized_volatility"),
    )
)