Polars equivalent of pandas set_index() to_dict()

I have a polars dataframe:

import polars as pl
df = pl.DataFrame({'index': [1,2,3,2,1],
                   'object': [1, 1, 1, 2, 2],
                   'period': [1, 2, 4, 4, 23],
                   'value': [24, 67, 89, 5, 23]})

How do I do the following in Polars? It is easy enough in pandas:

In [2]: df.to_pandas().groupby("index").last().transpose().to_dict()
Out[2]: 
{1: {'object': 2, 'period': 23, 'value': 23},
 2: {'object': 2, 'period': 4, 'value': 5},
 3: {'object': 1, 'period': 4, 'value': 89}}

Solution

The Algorithm

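Polars has no concept of an index, but we can reach the same result with partition_by. First, unique(subset=['index'], keep='last') reduces the frame to the last row for each index value; partition_by then splits it into one sub-dataframe per index value. Note that unique does not preserve row order unless it is also given maintain_order=True, so the keys may come out in a different order than shown here.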

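{
    # each key is a one-element tuple, so index[0] extracts the index value
    index[0]: frame.select(pl.exclude('index')).to_dicts()[0]
    for index, frame in
        (
            df
            # keep only the last row for each index value
            .unique(subset=['index'], keep='last')
            # split into one single-row DataFrame per index value
            .partition_by(by=["index"],
                          as_dict=True,
                          maintain_order=True)
        ).items()
}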

{1: {'object': 2, 'period': 23, 'value': 23},
2: {'object': 2, 'period': 4, 'value': 5},
3: {'object': 1, 'period': 4, 'value': 89}}

In steps

The heart of the algorithm is partition_by, with as_dict=True.

(
    df
    .unique(subset=['index'], keep='last')
    .partition_by(by=["index"],
                  as_dict=True,
                  maintain_order=True)
)

{(1,): shape: (1, 4)
┌───────┬────────┬────────┬───────┐
│ index ┆ object ┆ period ┆ value │
│ ---   ┆ ---    ┆ ---    ┆ ---   │
│ i64   ┆ i64    ┆ i64    ┆ i64   │
╞═══════╪════════╪════════╪═══════╡
│ 1     ┆ 2      ┆ 23     ┆ 23    │
└───────┴────────┴────────┴───────┘,
(2,): shape: (1, 4)
┌───────┬────────┬────────┬───────┐
│ index ┆ object ┆ period ┆ value │
│ ---   ┆ ---    ┆ ---    ┆ ---   │
│ i64   ┆ i64    ┆ i64    ┆ i64   │
╞═══════╪════════╪════════╪═══════╡
│ 2     ┆ 2      ┆ 4      ┆ 5     │
└───────┴────────┴────────┴───────┘,
(3,): shape: (1, 4)
┌───────┬────────┬────────┬───────┐
│ index ┆ object ┆ period ┆ value │
│ ---   ┆ ---    ┆ ---    ┆ ---   │
│ i64   ┆ i64    ┆ i64    ┆ i64   │
╞═══════╪════════╪════════╪═══════╡
│ 3     ┆ 1      ┆ 4      ┆ 89    │
└───────┴────────┴────────┴───────┘}

This creates a dictionary whose keys are one-element tuples holding the index value, and whose values are the one-row sub-dataframes associated with each index value.

From this dictionary, we can build the nested dictionaries with a Python dictionary comprehension:

{
    index[0]: frame.to_dicts()
    for index, frame in
        (
            df
            .unique(subset=['index'], keep='last')
            .partition_by(by=["index"],
                          as_dict=True,
                          maintain_order=True)
        ).items()
}

{1: [{'index': 1, 'object': 2, 'period': 23, 'value': 23}],
2: [{'index': 2, 'object': 2, 'period': 4, 'value': 5}],
3: [{'index': 3, 'object': 1, 'period': 4, 'value': 89}]}

All that is left is tidying up the output: select(pl.exclude('index')) keeps index out of the nested dictionaries, and the trailing [0] gets rid of the unneeded one-element list.

{
    index[0]: frame.select(pl.exclude('index')).to_dicts()[0]
    for index, frame in
        (
            df
            .unique(subset=['index'], keep='last')
            .partition_by(by=["index"],
                          as_dict=True,
                          maintain_order=True)
        ).items()
}

{1: {'object': 2, 'period': 23, 'value': 23},
2: {'object': 2, 'period': 4, 'value': 5},
3: {'object': 1, 'period': 4, 'value': 89}}