I have a column of strings whose trailing portion holds information I need to parse into separate columns. pandas has str.rsplit, which splits a string from the right and does exactly what I need:
import polars as pl
df = pl.DataFrame(
    {
        "name": [
            "some_name_set_1",
            "some_name_set_1b",
            "some_other_name_set_2",
            "yet_another_name_set_2",
        ]
    }
)
df.to_pandas()["name"].str.rsplit("_", n=2, expand=True)
                  0    1   2
0         some_name  set   1
1         some_name  set  1b
2   some_other_name  set   2
3  yet_another_name  set   2
How can I mimic this behavior in Polars, which doesn't have an rsplit
expression right now?
A single-character separator can also be handled with a regex:
df.select(
    pl.col("name").str.extract_groups(r"(.+)_([^_]+)_([^_]+)$")
    .struct.unnest()
)
shape: (4, 3)
┌──────────────────┬─────┬─────┐
│ 1                ┆ 2   ┆ 3   │
│ ---              ┆ --- ┆ --- │
│ str              ┆ str ┆ str │
╞══════════════════╪═════╪═════╡
│ some_name        ┆ set ┆ 1   │
│ some_name        ┆ set ┆ 1b  │
│ some_other_name  ┆ set ┆ 2   │
│ yet_another_name ┆ set ┆ 2   │
└──────────────────┴─────┴─────┘
But this is not a general solution: the [^_] character class only works for a single-character separator, and such regexes can be awkward to re-read.