Sort categorical data in Polars by the number at the start of each string

Sample of the displayed data

│ 2-Informa localizacao e CPF     ┆ 229889           │
│ 1-Onboarding + Escolhe Segmento ┆ 383133           │
│ 6-Define metodo de pagamento    ┆ 37520            │
│ 3-Escolhe plano                 ┆ 95487            │
│ 4-Realiza cadastro              ┆ 46027            │

Sample for testing

import polars as pl

df = pl.DataFrame(
    {
        "Steps": ["2-Informa localizacao e CPF", "1-Onboarding + Escolhe Segmento", "6-Define metodo de pagamento"],
        "UserIds": [229889, 383133, 37520],
    },
    schema_overrides={"Steps": pl.Categorical, "UserIds": pl.UInt32},
)

I have the dataframe above in Polars.

Is there an easy way to sort the categorical column so that the string starting with 1 is the first row, the string starting with 2 is the second, and so on? It is worth noting that the numerical column is not always in perfect descending order.

Solution

It looks like you want to:

cast to string
split the string into 2 parts
extract the first part with indexing: .struct[0]
cast to int for a numerical sort

df = pl.from_repr("""
┌─────────────────────────────────┬─────────┐
│ Steps                           ┆ UserIds │
│ ---                             ┆ ---     │
│ cat                             ┆ i64     │
╞═════════════════════════════════╪═════════╡
│ 2-Informa localizacao e CPF     ┆ 229889  │
│ 1-Onboarding + Escolhe Segmento ┆ 383133  │
│ 6-Define metodo de pagamento    ┆ 37520   │
│ 10-Omg Hello                    ┆ 12345   │
│ 3-Escolhe plano                 ┆ 95487   │
│ 4-Realiza cadastro              ┆ 46027   │
└─────────────────────────────────┴─────────┘
""")

df.sort(
   pl.col("Steps").cast(str).str.splitn("-", 2).struct[0].cast(int)
)

shape: (6, 2)
┌─────────────────────────────────┬─────────┐
│ Steps                           ┆ UserIds │
│ ---                             ┆ ---     │
│ cat                             ┆ i64     │
╞═════════════════════════════════╪═════════╡
│ 1-Onboarding + Escolhe Segmento ┆ 383133  │
│ 2-Informa localizacao e CPF     ┆ 229889  │
│ 3-Escolhe plano                 ┆ 95487   │
│ 4-Realiza cadastro              ┆ 46027   │
│ 6-Define metodo de pagamento    ┆ 37520   │
│ 10-Omg Hello                    ┆ 12345   │
└─────────────────────────────────┴─────────┘