This is my dataframe:
┌─────────────────────┬──────────┐
│ date ┆ price │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪══════════╡
│ 2023-12-20 14:10:00 ┆ 2039.105 │
│ 2023-12-21 14:45:00 ┆ 2045.795 │
│ 2023-12-22 15:10:00 ┆ 2069.708 │
│ 2023-12-26 06:45:00 ┆ 2064.885 │
│ 2023-12-27 18:00:00 ┆ 2083.865 │
│ 2023-12-28 03:05:00 ┆ 2088.224 │
│ 2023-12-28 15:00:00 ┆ 2080.245 │
│ 2023-12-29 07:10:00 ┆ 2074.485 │
└─────────────────────┴──────────┘
My main issue was to find prices that are close to each other and group them together in Polars, but I couldn't find any useful code for it, so I decided to do that step outside of Polars.
Now I have a problem in Polars: I want to categorize the price column based on a nested list I have separately. I am using the code below:
for i, group in enumerate(resistance_groups):
    highs = highs.with_columns(
        pl.when(pl.col('price').is_in(group))
        .then(i + 1)
        .otherwise(None)
        .alias('groups')
    )
where resistance_groups is:
[[2064.885, 2069.708, 2074.485], [2080.245, 2083.865, 2088.224]]
and highs is the dataframe above.
The result of this code after the first loop iteration is:
┌─────────────────────┬──────────┬────────┐
│ date ┆ price ┆ groups │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i32 │
╞═════════════════════╪══════════╪════════╡
│ 2023-12-20 14:10:00 ┆ 2039.105 ┆ null │
│ 2023-12-21 14:45:00 ┆ 2045.795 ┆ null │
│ 2023-12-22 15:10:00 ┆ 2069.708 ┆ 1 │
│ 2023-12-26 06:45:00 ┆ 2064.885 ┆ 1 │
│ 2023-12-27 18:00:00 ┆ 2083.865 ┆ null │
│ 2023-12-28 03:05:00 ┆ 2088.224 ┆ null │
│ 2023-12-28 15:00:00 ┆ 2080.245 ┆ null │
│ 2023-12-29 07:10:00 ┆ 2074.485 ┆ 1 │
└─────────────────────┴──────────┴────────┘
And after the second iteration, it is:
┌─────────────────────┬──────────┬────────┐
│ date ┆ price ┆ groups │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i32 │
╞═════════════════════╪══════════╪════════╡
│ 2023-12-20 14:10:00 ┆ 2039.105 ┆ null │
│ 2023-12-21 14:45:00 ┆ 2045.795 ┆ null │
│ 2023-12-22 15:10:00 ┆ 2069.708 ┆ null │
│ 2023-12-26 06:45:00 ┆ 2064.885 ┆ null │
│ 2023-12-27 18:00:00 ┆ 2083.865 ┆ 2 │
│ 2023-12-28 03:05:00 ┆ 2088.224 ┆ 2 │
│ 2023-12-28 15:00:00 ┆ 2080.245 ┆ 2 │
│ 2023-12-29 07:10:00 ┆ 2074.485 ┆ null │
└─────────────────────┴──────────┴────────┘
As you can see, the results of the first iteration are removed from the df, because each iteration's .otherwise(None) sets every non-matching row back to null.
Can anyone suggest a way to either avoid the .otherwise() overwrite or another way to categorize the price column?
I tried using multiple when-then expressions too, and that didn't work either; using another column wasn't much better.
And just in case: removing .otherwise() entirely also sets the values to null.
Alternatively to @DeanMacGregor's solution, you could use pl.coalesce instead. This leverages the fact that pl.when().then() evaluates to null if the when case is not True.
(
    df
    .with_columns(
        pl.coalesce(
            pl.when(pl.col("price").is_in(group)).then(index + 1)
            for index, group in enumerate(resistance_groups)
        )
        .alias("groups")
    )
)
shape: (8, 3)
┌─────────────────────┬──────────┬────────┐
│ date ┆ price ┆ groups │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i32 │
╞═════════════════════╪══════════╪════════╡
│ 2023-12-20 14:10:00 ┆ 2039.105 ┆ null │
│ 2023-12-21 14:45:00 ┆ 2045.795 ┆ null │
│ 2023-12-22 15:10:00 ┆ 2069.708 ┆ 1 │
│ 2023-12-26 06:45:00 ┆ 2064.885 ┆ 1 │
│ 2023-12-27 18:00:00 ┆ 2083.865 ┆ 2 │
│ 2023-12-28 03:05:00 ┆ 2088.224 ┆ 2 │
│ 2023-12-28 15:00:00 ┆ 2080.245 ┆ 2 │
│ 2023-12-29 07:10:00 ┆ 2074.485 ┆ 1 │
└─────────────────────┴──────────┴────────┘