Search code examples
pythondataframepython-polars

how to stop otherwise( ) in python polars when using a when( ) expression?


This is my dataframe:

┌─────────────────────┬──────────┐
│ date                ┆ price    │
│ ---                 ┆ ---      │
│ datetime[μs]        ┆ f64      │
╞═════════════════════╪══════════╡
│ 2023-12-20 14:10:00 ┆ 2039.105 │
│ 2023-12-21 14:45:00 ┆ 2045.795 │
│ 2023-12-22 15:10:00 ┆ 2069.708 │
│ 2023-12-26 06:45:00 ┆ 2064.885 │
│ 2023-12-27 18:00:00 ┆ 2083.865 │
│ 2023-12-28 03:05:00 ┆ 2088.224 │
│ 2023-12-28 15:00:00 ┆ 2080.245 │
│ 2023-12-29 07:10:00 ┆ 2074.485 │
└─────────────────────┴──────────┘

My main issue was to find prices close to each other and group them together in Polars, but I didn't find any useful code. So, I decided to do it outside of polars.

Now I have this problem with Polars that I want to categorize price based on a nested list I have separately. I am using the below code:

for i ,group in enumerate(resistance_groups):
            highs = highs.with_columns(
                pl.when(pl.col('price').is_in(group))
                    .then(i+1)
                    .otherwise(None)
                    .alias('groups')
            )

Which resistance_groups is like this:

[[2064.885, 2069.708, 2074.485], [2080.245, 2083.865, 2088.224]]

And highs is the above dataframe.

The result of above code in first loop is :

┌─────────────────────┬──────────┬────────┐
│ date                ┆ price    ┆ groups │
│ ---                 ┆ ---      ┆ ---    │
│ datetime[μs]        ┆ f64      ┆ i32    │
╞═════════════════════╪══════════╪════════╡
│ 2023-12-20 14:10:00 ┆ 2039.105 ┆ null   │
│ 2023-12-21 14:45:00 ┆ 2045.795 ┆ null   │
│ 2023-12-22 15:10:00 ┆ 2069.708 ┆ 1      │
│ 2023-12-26 06:45:00 ┆ 2064.885 ┆ 1      │
│ 2023-12-27 18:00:00 ┆ 2083.865 ┆ null   │
│ 2023-12-28 03:05:00 ┆ 2088.224 ┆ null   │
│ 2023-12-28 15:00:00 ┆ 2080.245 ┆ null   │
│ 2023-12-29 07:10:00 ┆ 2074.485 ┆ 1      │
└─────────────────────┴──────────┴────────┘

And in second loop, it is:

┌─────────────────────┬──────────┬────────┐
│ date                ┆ price    ┆ groups │
│ ---                 ┆ ---      ┆ ---    │
│ datetime[μs]        ┆ f64      ┆ i32    │
╞═════════════════════╪══════════╪════════╡
│ 2023-12-20 14:10:00 ┆ 2039.105 ┆ null   │
│ 2023-12-21 14:45:00 ┆ 2045.795 ┆ null   │
│ 2023-12-22 15:10:00 ┆ 2069.708 ┆ null   │
│ 2023-12-26 06:45:00 ┆ 2064.885 ┆ null   │
│ 2023-12-27 18:00:00 ┆ 2083.865 ┆ 2      │
│ 2023-12-28 03:05:00 ┆ 2088.224 ┆ 2      │
│ 2023-12-28 15:00:00 ┆ 2080.245 ┆ 2      │
│ 2023-12-29 07:10:00 ┆ 2074.485 ┆ null   │
└─────────────────────┴──────────┴────────┘

As you see the first loop results are removed from df.

Can anyone suggest a way to either stop the .otherwise() or any other way to categorize the price column?

I tried to use multiple when-then expressions too and it didn't work too, using another column wasn't that good either.

And just in case: removing .otherwise( ) means setting the values to null.


Solution

  • Alternatively to @DeanMacGregor's solution, you could use pl.coalesce instead. This leverages that pl.when().then() evaluates to None if the when case is not True.

    (
        df
        .with_columns(
            pl.coalesce(
                pl.when(pl.col("price").is_in(group)).then(index+1)
                for index, group in enumerate(resistance_groups)
            )
            .alias("groups")
        ) 
    )
    
    shape: (8, 3)
    ┌─────────────────────┬──────────┬────────┐
    │ date                ┆ price    ┆ groups │
    │ ---                 ┆ ---      ┆ ---    │
    │ datetime[μs]        ┆ f64      ┆ i32    │
    ╞═════════════════════╪══════════╪════════╡
    │ 2023-12-20 14:10:00 ┆ 2039.105 ┆ null   │
    │ 2023-12-21 14:45:00 ┆ 2045.795 ┆ null   │
    │ 2023-12-22 15:10:00 ┆ 2069.708 ┆ 1      │
    │ 2023-12-26 06:45:00 ┆ 2064.885 ┆ 1      │
    │ 2023-12-27 18:00:00 ┆ 2083.865 ┆ 2      │
    │ 2023-12-28 03:05:00 ┆ 2088.224 ┆ 2      │
    │ 2023-12-28 15:00:00 ┆ 2080.245 ┆ 2      │
    │ 2023-12-29 07:10:00 ┆ 2074.485 ┆ 1      │
    └─────────────────────┴──────────┴────────┘