
How to right split n times in python polars dataframe (mimic pandas rsplit)


I have a column of strings whose trailing portion holds information I need to parse into separate columns. Pandas has the Series.str.rsplit method, which splits a string from the right and does exactly what I need:

import pandas as pd
pd.DataFrame(
    {
        "name": [
            "some_name_set_1",
            "some_name_set_1b",
            "some_other_name_set_2",
            "yet_another_name_set_2",
        ]
    }
)["name"].str.rsplit("_", n=2, expand=True)
                  0    1   2
0         some_name  set   1
1         some_name  set  1b
2   some_other_name  set   2
3  yet_another_name  set   2
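For reference, pandas' str.rsplit essentially applies Python's built-in str.rsplit to each element, so the target behavior can be checked in plain Python. This is a minimal sketch using the sample names above. Note that a string with fewer than n separators yields fewer than n + 1 pieces, and any Polars equivalent has to handle that case somehow.

```python
# Plain Python str.rsplit: split from the right, at most n times.
names = [
    "some_name_set_1",
    "some_name_set_1b",
    "some_other_name_set_2",
    "yet_another_name_set_2",
]

n = 2  # maximum number of splits, counted from the right
pieces = [s.rsplit("_", n) for s in names]
for p in pieces:
    print(p)  # e.g. ['some_name', 'set', '1'] for the first name

# A string with fewer than n separators yields fewer than n + 1 pieces.
short = "name_1".rsplit("_", n)
print(short)  # ['name', '1']
```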

How can I mimic this behavior in Polars, which doesn't have an rsplit expression right now?


Solution

  • With a single-character separator, the right split can be expressed as a regex and extracted with str.extract_groups:

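    import polars as pl
    df = pl.DataFrame({"name": ["some_name_set_1", "some_name_set_1b",
                                "some_other_name_set_2", "yet_another_name_set_2"]})
    df.select(
       # (.+) takes everything before the last two "_"-free pieces; groups become fields "1", "2", "3"
       pl.col("name").str.extract_groups(r"(.+)_([^_]+)_([^_]+)$")
         .struct.field("*")
    )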
    
    shape: (4, 3)
    ┌──────────────────┬─────┬─────┐
    │ 1                ┆ 2   ┆ 3   │
    │ ---              ┆ --- ┆ --- │
    │ str              ┆ str ┆ str │
    ╞══════════════════╪═════╪═════╡
    │ some_name        ┆ set ┆ 1   │
    │ some_name        ┆ set ┆ 1b  │
    │ some_other_name  ┆ set ┆ 2   │
    │ yet_another_name ┆ set ┆ 2   │
    └──────────────────┴─────┴─────┘
    

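    This works, but it is limited: the negated class [^_] only works for a single-character separator, the number of splits is hard-coded into the pattern, and the pattern itself is somewhat awkward to read and type.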