I am using polars with python and have this sample dataset:
df = pl.from_repr("""
┌─────────────────────┬────────────┬────────┐
│ Timestamps ┆ Date ┆ Values │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ date ┆ i64 │
╞═════════════════════╪════════════╪════════╡
│ 2022-10-01 04:00:00 ┆ 2022-10-01 ┆ 1 │
│ 2022-10-01 06:00:00 ┆ 2022-10-01 ┆ 2 │
│ 2022-10-01 08:30:00 ┆ 2022-10-01 ┆ 2 │
│ 2022-12-02 09:00:00 ┆ 2022-12-02 ┆ 5 │
│ 2022-12-02 09:15:00 ┆ 2022-12-02 ┆ 4 │
│ 2022-12-02 09:30:00 ┆ 2022-12-02 ┆ 3 │
│ 2022-12-02 09:45:00 ┆ 2022-12-02 ┆ 2 │
│ 2022-12-02 10:00:00 ┆ 2022-12-02 ┆ 1 │
│ 2022-12-02 10:15:00 ┆ 2022-12-02 ┆ 3 │
└─────────────────────┴────────────┴────────┘
""")
I would like to get the max and min value for every single Date together with the corresponding timestamp. The result for e.g. the max value should look like this:
┌────────────┬────────┬─────────────────────┐
│ Date ┆ Values ┆ Timestamps │
│ --- ┆ --- ┆ --- │
│ date ┆ i64 ┆ datetime[μs] │
╞════════════╪════════╪═════════════════════╡
│ 2022-12-02 ┆ 5 ┆ 2022-12-02 09:00:00 │
│ 2022-10-01 ┆ 2 ┆ 2022-10-01 06:00:00 │
└────────────┴────────┴─────────────────────┘
Currently I've got this line but I can't figure out how to extract a value (timestamp) out of a column next to the aggregated value:
df.group_by('Date').agg(pl.max('Values', 'Timestamps')).sort("Date", descending=True)
Let's say we start with...
dataset = pl.DataFrame(
    {
        "Timestamps": [
            "2022-10-01 06:00:00", "2022-10-01 08:30:00",
            "2022-12-02 09:00:00", "2022-12-02 09:15:00",
            "2022-12-02 09:30:00", "2022-12-02 09:45:00",
            "2022-12-02 10:00:00", "2022-12-02 10:15:00",
        ],
        "Date": [
            "2022-10-01", "2022-10-01",
            "2022-12-02", "2022-12-02", "2022-12-02",
            "2022-12-02", "2022-12-02", "2022-12-02",
        ],
        "Values": [2, 2, 1, 4, 3, 2, 1, 3],
    }
)
qa = dataset.lazy().with_columns(pl.col("Timestamps").str.to_datetime(), pl.col("Date").str.to_date())
df1 = qa.collect()
shape: (8, 3)
┌─────────────────────┬────────────┬────────┐
│ Timestamps ┆ Date ┆ Values │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ date ┆ i64 │
╞═════════════════════╪════════════╪════════╡
│ 2022-10-01 06:00:00 ┆ 2022-10-01 ┆ 2 │
│ 2022-10-01 08:30:00 ┆ 2022-10-01 ┆ 2 │
│ 2022-12-02 09:00:00 ┆ 2022-12-02 ┆ 1 │
│ 2022-12-02 09:15:00 ┆ 2022-12-02 ┆ 4 │
│ 2022-12-02 09:30:00 ┆ 2022-12-02 ┆ 3 │
│ 2022-12-02 09:45:00 ┆ 2022-12-02 ┆ 2 │
│ 2022-12-02 10:00:00 ┆ 2022-12-02 ┆ 1 │
│ 2022-12-02 10:15:00 ┆ 2022-12-02 ┆ 3 │
└─────────────────────┴────────────┴────────┘
In this case we see that on 10-01 there's a tie for Values and on 12-02 the highest Values corresponds to 9:15.
You could try two approaches that start the same...
(
    df1
    .group_by('Date')
    .agg(pl.col('Values').max())
    .join(df1, on=['Date', 'Values'])
    .sort('Timestamps')
    .unique(subset=['Date', 'Values'], keep='first')  # keep='first' so the sort decides ties
    .sort('Date', descending=True)
)
or
(
    df1
    .group_by('Date')
    .agg(pl.col('Values').max())
    .join(df1, on=['Date', 'Values'])
    .group_by(['Date', 'Values'])
    .agg(pl.col('Timestamps').min())
    .sort('Date', descending=True)
)
I'm not sure which is more efficient between the sort/unique and the second group_by/agg, but in either case you'd get:
shape: (2, 3)
┌────────────┬────────┬─────────────────────┐
│ Date ┆ Values ┆ Timestamps │
│ --- ┆ --- ┆ --- │
│ date ┆ i64 ┆ datetime[μs] │
╞════════════╪════════╪═════════════════════╡
│ 2022-12-02 ┆ 4 ┆ 2022-12-02 09:15:00 │
│ 2022-10-01 ┆ 2 ┆ 2022-10-01 06:00:00 │
└────────────┴────────┴─────────────────────┘