I am trying to join two big Parquet files using LazyFrames, but it's not possible due to their size (Polars crashes). I then thought about using `iter_slices`, but it doesn't work with LazyFrames. What would be a good solution for this? And what is the usefulness of `iter_slices` if it requires a DataFrame?
Thanks.
If your dataset doesn't fit into memory, you can try adding a `filter` or a `limit` after the `join` node and materializing via `collect(streaming=True)`. This will try to execute the query out of core.
If the result dataset doesn't fit into memory, you can stream it directly to disk using `sink_parquet` or `sink_ipc`.