
Python Polars: how to join with lazyframes?


I am trying to join two large Parquet files using LazyFrames, but it isn't possible because of their size (Polars crashes).

Then I thought about using iter_slices, but it doesn't work with LazyFrames. What would be a good solution here? And what is the point of iter_slices if it requires a DataFrame?

Thanks.


Solution

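  • If your dataset doesn't fit into memory, you can try adding a filter or a limit after the join node (downstream of it in the query plan) and materializing the result with collect(streaming=True). This makes Polars try to execute the query out of core, processing the data in batches instead of loading it all at once.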

    If the resulting dataset doesn't fit into memory either, you can stream it directly to disk with sink_parquet or sink_ipc.