python, python-polars

Maintaining order in polars data frame after `partition_by`


Does polars.DataFrame.partition_by preserve the order of rows within each group?

I understand that group_by does, even when maintain_order=False. From the documentation: "Within each group, the order of rows is always preserved, regardless of this argument."

But nothing is mentioned for the partition_by operation. I guess this means the order is not guaranteed to be preserved, but I'm looking for confirmation, since in the few tests I ran, the partitioned dataframes always respected the original order.

Here is the code I used for a toy experiment:

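import numpy as np
import polars as pl

n = 100_000_000
df = pl.DataFrame(
    {
        "a": np.arange(n),  # strictly increasing, so it records the original row order
        "b": np.random.randint(0, 50, n),  # random partition key in [0, 50)
    }
)
# One DataFrame per distinct value of "b"
all_dfs = df.partition_by("b", as_dict=True)
for key, part in all_dfs.items():
    # "a" stays sorted only if the rows kept their original relative order
    assert part["a"].is_sorted()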

Solution

  • partition_by is implemented by just doing a group_by and extracting the groups into separate DataFrames. I see no reason why we would change that, so I think it's safe to assume the order within each group is preserved, at least with the default arguments. I'll see if we can get the docs to match group_by's docs in that regard.