Sample of display
│ 2-Informa localizacao e CPF ┆ 229889 │
│ 1-Onboarding + Escolhe Segmento ┆ 383133 │
│ 6-Define metodo de pagamento ┆ 37520 │
│ 3-Escolhe plano ┆ 95487 │
│ 4-Realiza cadastro ┆ 46027 │
Sample for testing
import polars as pl

df = pl.DataFrame(
    {
        "Steps": [
            "2-Informa localizacao e CPF",
            "1-Onboarding + Escolhe Segmento",
            "6-Define metodo de pagamento",
        ],
        "UserIds": [229889, 383133, 37520],
    },
    schema_overrides={"Steps": pl.Categorical, "UserIds": pl.UInt32},
)
I have the dataframe shown above, in Polars.
Is there an easy way to sort the categorical column so that the row whose string starts with 1 comes first, the one starting with 2 comes second, and so on? It is worth noting that the numerical column is not always in perfect descending order.
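It looks like you want to cast the categorical column to a string, split it on the first "-" with .str.splitn(), take the first field with .struct[0], and cast that field to int for a numerical sort:
df = pl.from_repr("""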
┌─────────────────────────────────┬─────────┐
│ Steps ┆ UserIds │
│ --- ┆ --- │
│ cat ┆ i64 │
╞═════════════════════════════════╪═════════╡
│ 2-Informa localizacao e CPF ┆ 229889 │
│ 1-Onboarding + Escolhe Segmento ┆ 383133 │
│ 6-Define metodo de pagamento ┆ 37520 │
│ 10-Omg Hello ┆ 12345 │
│ 3-Escolhe plano ┆ 95487 │
│ 4-Realiza cadastro ┆ 46027 │
└─────────────────────────────────┴─────────┘
""")
df.sort(
    pl.col("Steps").cast(str).str.splitn("-", 2).struct[0].cast(int)
)
shape: (6, 2)
┌─────────────────────────────────┬─────────┐
│ Steps ┆ UserIds │
│ --- ┆ --- │
│ cat ┆ i64 │
╞═════════════════════════════════╪═════════╡
│ 1-Onboarding + Escolhe Segmento ┆ 383133 │
│ 2-Informa localizacao e CPF ┆ 229889 │
│ 3-Escolhe plano ┆ 95487 │
│ 4-Realiza cadastro ┆ 46027 │
│ 6-Define metodo de pagamento ┆ 37520 │
│ 10-Omg Hello ┆ 12345 │
└─────────────────────────────────┴─────────┘
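To see why the final cast to int matters, here is a small plain-Python sketch (no Polars involved) that mimics the same idea on the step labels from the example. The helper name `step_number` is made up for illustration. It compares a plain string sort with a sort keyed on the parsed numeric prefix.

```python
# Plain-Python illustration of the "split on '-', take the first part, cast to int" idea.
# This is not the Polars API; step_number is a hypothetical helper for this sketch.
steps = [
    "2-Informa localizacao e CPF",
    "1-Onboarding + Escolhe Segmento",
    "6-Define metodo de pagamento",
    "10-Omg Hello",
    "3-Escolhe plano",
    "4-Realiza cadastro",
]


def step_number(step: str) -> int:
    """Return the integer that precedes the first '-' in the label."""
    prefix, _rest = step.split("-", 1)
    return int(prefix)


# Sorting the raw strings compares them character by character,
# so "10-..." lands right after "1-..." instead of at the end.
lexical = sorted(steps)

# Sorting on the parsed integer gives the intended 1, 2, 3, 4, 6, 10 order.
numeric = sorted(steps, key=step_number)

for label in numeric:
    print(label)
```

Without the integer conversion, a string sort would place "10-Omg Hello" second, which is presumably why the expression above ends with .cast(int) rather than sorting on the split-off text directly.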