Does polars.DataFrame.partition_by preserve the order of rows within each group?
I understand that group_by does, even when maintain_order=False. From the documentation:
Within each group, the order of rows is always preserved, regardless of this argument.
But nothing is mentioned for the partition_by operation. I guess this means the order is not guaranteed to be preserved, but I'm looking for confirmation, since in the few tests I ran, the resulting partitioned DataFrames always respected the original order.
Here is the code I used for a toy experiment:
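import numpy as np
import polars as pl

# Toy experiment: "a" is an increasing row id, "b" a random key in [0, 50)
df = pl.DataFrame({
    "a": np.arange(100000000),
    "b": np.random.randint(0, 50, 100000000),
})
all_dfs = df.partition_by("b", as_dict=True)
# If row order is preserved within each partition, "a" stays sorted
for key, part in all_dfs.items():
    assert part["a"].is_sorted()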
partition_by is implemented by just doing a group_by and extracting the groups into separate DataFrames. I see no reason why we would change that, so I think it's safe to assume the order within each group is preserved, at least with the default arguments. I'll see if we can get the docs to match group_by's docs in that regard.