python, python-polars

How to use polars cut method returning result to original df


Update: pl.cut has since been removed from Polars. Expression equivalents were added instead: Expr.cut() and Expr.qcut().


How can I use pl.cut in a select context, such as df.with_columns?

To be more specific: if I have a polars dataframe with a lot of columns and one of them is called x, how can I apply pl.cut to x and append the grouping result to the original dataframe?

Below is what I tried but it does not work:

df = pl.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6], "x": [1, 3, 5, 7, 9]})
df.with_columns(pl.cut(pl.col("x"), bins=[2, 4, 6]))

Thanks so much for your help.


Solution

  • According to the docs as of 2023-01-25, pl.cut takes a Series and returns a DataFrame. Unlike most Polars methods and functions, it does not accept an expression, so you can't use it in select or with_columns. To get your desired result, you have to join its output back to your original df.

    Additionally, it appears that cut doesn't necessarily preserve the dtype of the parent Series (this is almost certainly a bug), so you have to cast the x column back, in this case to Int64, before joining.

    You'd have:

    df = df.join(
        pl.cut(df.get_column('x'), bins=[2, 4, 6]).with_columns(pl.col('x').cast(pl.Int64())),
        on='x'
    )
    
    shape: (5, 5)
    ┌─────┬─────┬─────┬─────────────┬─────────────┐
    │ a   ┆ b   ┆ x   ┆ break_point ┆ category    │
    │ --- ┆ --- ┆ --- ┆ ---         ┆ ---         │
    │ i64 ┆ i64 ┆ i64 ┆ f64         ┆ cat         │
    ╞═════╪═════╪═════╪═════════════╪═════════════╡
    │ 1   ┆ 2   ┆ 1   ┆ 2.0         ┆ (-inf, 2.0] │
    │ 2   ┆ 3   ┆ 3   ┆ 4.0         ┆ (2.0, 4.0]  │
    │ 3   ┆ 4   ┆ 5   ┆ 6.0         ┆ (4.0, 6.0]  │
    │ 4   ┆ 5   ┆ 7   ┆ inf         ┆ (6.0, inf]  │
    │ 5   ┆ 6   ┆ 9   ┆ inf         ┆ (6.0, inf]  │
    └─────┴─────┴─────┴─────────────┴─────────────┘