I have the following dataframe:
import polars as pl
import numpy as np
df = pl.DataFrame({
"nrs": [1, 2, 3, None, 5],
"names_A0": ["foo", "ham", "spam", "egg", None],
"random_A0": np.random.rand(5),
"A_A2": [True, True, False, False, False],
})
digit = 0
For each column X whose name ends with the string suf = f'_A{digit}', I want to add an identical column to df whose name is the same as X, but without suf.
In the example, I need to add columns names and random to the original dataframe df, whose content is identical to that of columns names_A0 and random_A0, respectively.
You can use Polars selectors along with some basic string operations to accomplish this. Depending on how you expect this problem to evolve, you can jump straight to regular expressions, or use polars.selectors.ends_with together with str.removesuffix.
This approach uses
- polars.selectors.ends_with # find columns ending with string
- str.removesuffix # remove suffix from end of string (Python 3.9+)
translating to
import polars as pl
from polars import selectors as cs
import numpy as np
df = pl.DataFrame(
{
"nrs": [1, 2, 3, None, 5],
"names_A0": ["foo", "ham", "spam", "egg", None],
"random_A0": np.random.rand(5),
"A_A2": [True, True, False, False, False],
}
)
digit = 0
suffix = f'_A{digit}'
print(
# keep original A0 columns
df.with_columns(
cs.ends_with(suffix).name.map(lambda s: s.removesuffix(suffix))
),
# shape: (5, 6)
# ┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
# │ nrs ┆ names_A0 ┆ random_A0 ┆ A_A2 ┆ names ┆ random │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ f64 ┆ bool ┆ str ┆ f64 │
# ╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
# │ 1 ┆ foo ┆ 0.713324 ┆ true ┆ foo ┆ 0.713324 │
# │ 2 ┆ ham ┆ 0.980031 ┆ true ┆ ham ┆ 0.980031 │
# │ 3 ┆ spam ┆ 0.242768 ┆ false ┆ spam ┆ 0.242768 │
# │ null ┆ egg ┆ 0.528783 ┆ false ┆ egg ┆ 0.528783 │
# │ 5 ┆ null ┆ 0.583206 ┆ false ┆ null ┆ 0.583206 │
# └──────┴──────────┴───────────┴───────┴───────┴──────────┘
# drop original A0 columns
df.select(
~cs.ends_with(suffix),
cs.ends_with(suffix).name.map(lambda s: s.removesuffix(suffix))
),
# shape: (5, 4)
# ┌──────┬───────┬───────┬──────────┐
# │ nrs ┆ A_A2 ┆ names ┆ random │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ bool ┆ str ┆ f64 │
# ╞══════╪═══════╪═══════╪══════════╡
# │ 1 ┆ true ┆ foo ┆ 0.713324 │
# │ 2 ┆ true ┆ ham ┆ 0.980031 │
# │ 3 ┆ false ┆ spam ┆ 0.242768 │
# │ null ┆ false ┆ egg ┆ 0.528783 │
# │ 5 ┆ false ┆ null ┆ 0.583206 │
# └──────┴───────┴───────┴──────────┘
sep='\n\n'
)
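To make the renaming step concrete, here is a minimal plain-Python sketch of what str.removesuffix does to each column name, independent of Polars. The extra names in example_names are hypothetical and only illustrate edge cases; the comparison with str.rstrip is included because it is a common mix-up, not because the code above uses it.

```python
# Plain-Python sketch (no Polars needed); str.removesuffix requires Python 3.9+.
suffix = "_A0"
# Hypothetical column names, chosen to show edge cases
example_names = ["names_A0", "A_A2", "x_A0_A0", "ALPHA_A0"]

# removesuffix drops the suffix once, only if the string ends with it
renamed = [name.removesuffix(suffix) for name in example_names]
print(renamed)  # ['names', 'A_A2', 'x_A0', 'ALPHA']

# str.rstrip treats its argument as a set of characters, not a suffix,
# so it can remove more than intended
over_stripped = "ALPHA_A0".rstrip(suffix)
print(over_stripped)  # ALPH
```

Names that do not end with the suffix come back unchanged, which is why the selector (cs.ends_with) is what decides which columns get copied, while removesuffix only computes the new name.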
Alternatively, you can use regular expressions to detect a range of suffix patterns. This approach uses
- polars.selectors.matches # find columns matching a pattern
- re.sub # substitute in string based on pattern
We will need to ensure our pattern ends with a '$' to anchor the pattern to the end of the string.
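As a small illustration of why the anchor matters, the sketch below compares an anchored and an unanchored pattern using only the standard re module. The names x_A0_y_A0 and names_A01 are made up for the demonstration and do not appear in the dataframe.

```python
import re

unanchored = r"_A0"
anchored = r"_A0$"

# Hypothetical name containing the suffix text twice
name = "x_A0_y_A0"
stripped_everywhere = re.sub(unanchored, "", name)
stripped_at_end = re.sub(anchored, "", name)
print(stripped_everywhere)  # x_y
print(stripped_at_end)      # x_A0_y

# Without the anchor, a name that merely contains "_A0" also matches
other = "names_A01"
print(bool(re.search(unanchored, other)))  # True
print(bool(re.search(anchored, other)))    # False
```

Without the '$', the pattern would both select columns that only contain the suffix somewhere in their name and remove every occurrence of it, not just the trailing one.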
import polars as pl
from polars import selectors as cs
import numpy as np
import re
df = pl.DataFrame(
{
"nrs": [1, 2, 3, None, 5],
"names_A0": ["foo", "ham", "spam", "egg", None],
"random_A0": np.random.rand(5),
"A_A2": [True, True, False, False, False],
}
)
digit = 0
# the trailing '$' anchors the pattern to the end of the column name
suffix = fr'_A{digit}$'
print(
# keep original A0 columns
df.with_columns(
cs.matches(suffix).name.map(lambda s: re.sub(suffix, '', s))
),
# shape: (5, 6)
# ┌──────┬──────────┬───────────┬───────┬───────┬──────────┐
# │ nrs ┆ names_A0 ┆ random_A0 ┆ A_A2 ┆ names ┆ random │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ f64 ┆ bool ┆ str ┆ f64 │
# ╞══════╪══════════╪═══════════╪═══════╪═══════╪══════════╡
# │ 1 ┆ foo ┆ 0.713324 ┆ true ┆ foo ┆ 0.713324 │
# │ 2 ┆ ham ┆ 0.980031 ┆ true ┆ ham ┆ 0.980031 │
# │ 3 ┆ spam ┆ 0.242768 ┆ false ┆ spam ┆ 0.242768 │
# │ null ┆ egg ┆ 0.528783 ┆ false ┆ egg ┆ 0.528783 │
# │ 5 ┆ null ┆ 0.583206 ┆ false ┆ null ┆ 0.583206 │
# └──────┴──────────┴───────────┴───────┴───────┴──────────┘
# drop original A0 columns
df.select(
~cs.matches(suffix),
cs.matches(suffix).name.map(lambda s: re.sub(suffix, '', s))
),
# shape: (5, 4)
# ┌──────┬───────┬───────┬──────────┐
# │ nrs ┆ A_A2 ┆ names ┆ random │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ bool ┆ str ┆ f64 │
# ╞══════╪═══════╪═══════╪══════════╡
# │ 1 ┆ true ┆ foo ┆ 0.713324 │
# │ 2 ┆ true ┆ ham ┆ 0.980031 │
# │ 3 ┆ false ┆ spam ┆ 0.242768 │
# │ null ┆ false ┆ egg ┆ 0.528783 │
# │ 5 ┆ false ┆ null ┆ 0.583206 │
# └──────┴───────┴───────┴──────────┘
sep='\n\n'
)
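To show how the regular-expression version extends to a range of suffixes, here is a hedged sketch that works on plain column names only. The helper stripped_names is hypothetical (it is not part of Polars), and the pattern _A[0-9]$ is just one possible way to cover several digits at once.

```python
import re

def stripped_names(columns, pattern):
    """Map each column name matching `pattern` to the name with the match removed.

    Hypothetical helper for illustration; not a Polars function.
    """
    regex = re.compile(pattern)
    return {c: regex.sub("", c) for c in columns if regex.search(c)}

columns = ["nrs", "names_A0", "random_A0", "A_A2"]

single = stripped_names(columns, r"_A0$")
print(single)  # {'names_A0': 'names', 'random_A0': 'random'}

# One pattern covering any single trailing digit
any_digit = stripped_names(columns, r"_A[0-9]$")
print(any_digit)  # {'names_A0': 'names', 'random_A0': 'random', 'A_A2': 'A'}
```

One caveat with broader patterns: two columns such as names_A0 and names_A1 would both map to names, so the new names would collide. It may be worth checking the mapping for duplicate values before applying it to the dataframe.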