
Sort categorical data in Polars by the number at the start of each string


Sample of the displayed output:

│ 2-Informa localizacao e CPF     ┆ 229889           │
│ 1-Onboarding + Escolhe Segmento ┆ 383133           │
│ 6-Define metodo de pagamento    ┆ 37520            │
│ 3-Escolhe plano                 ┆ 95487            │
│ 4-Realiza cadastro              ┆ 46027            │

Sample data for testing:

import polars as pl
df = pl.DataFrame({"Steps":["2-Informa localizacao e CPF","1-Onboarding + Escolhe Segmento","6-Define metodo de pagamento"],"UserIds":[229889,383133,37520]},schema_overrides={"Steps":pl.Categorical,"UserIds":pl.UInt32})

I have the following DataFrame in Polars.

Is there an easy way to sort this categorical column so that the string starting with 1 is the first row, the one starting with 2 is the second, and so on? Note that the numeric column (UserIds) is not always in perfectly descending order, so I can't simply sort by it.


Solution

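  • It looks like you want to sort by the number before the first "-". Cast the categorical column to a string, split off that prefix, cast it to an integer, and sort by the result: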

    import polars as pl
    
    df = pl.from_repr("""
    ┌─────────────────────────────────┬─────────┐
    │ Steps                           ┆ UserIds │
    │ ---                             ┆ ---     │
    │ cat                             ┆ i64     │
    ╞═════════════════════════════════╪═════════╡
    │ 2-Informa localizacao e CPF     ┆ 229889  │
    │ 1-Onboarding + Escolhe Segmento ┆ 383133  │
    │ 6-Define metodo de pagamento    ┆ 37520   │
    │ 10-Omg Hello                    ┆ 12345   │
    │ 3-Escolhe plano                 ┆ 95487   │
    │ 4-Realiza cadastro              ┆ 46027   │
    └─────────────────────────────────┴─────────┘
    """)
    
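    # Cast the categorical to a string, take the part before the first "-",
    # convert it to an integer, and use that as the sort key.
    df.sort(
        pl.col("Steps").cast(str).str.splitn("-", 2).struct[0].cast(int)
    )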
    
    shape: (6, 2)
    ┌─────────────────────────────────┬─────────┐
    │ Steps                           ┆ UserIds │
    │ ---                             ┆ ---     │
    │ cat                             ┆ i64     │
    ╞═════════════════════════════════╪═════════╡
    │ 1-Onboarding + Escolhe Segmento ┆ 383133  │
    │ 2-Informa localizacao e CPF     ┆ 229889  │
    │ 3-Escolhe plano                 ┆ 95487   │
    │ 4-Realiza cadastro              ┆ 46027   │
    │ 6-Define metodo de pagamento    ┆ 37520   │
    │ 10-Omg Hello                    ┆ 12345   │
    └─────────────────────────────────┴─────────┘