lazy-loading · python-polars · contextmanager

Stream larger than memory API results to file with Polars


How would I lazily stream results to a file (e.g. from an API) with Polars? The goal is to concatenate the results vertically without blowing up memory as the file grows larger.

E.g. if I'm getting 50k results back at a time, I want to append the new results to the DataFrame/Parquet file without loading the entire file into memory, ideally using pl.scan_parquet().

I’d likely create the initial file, get results, save to file, re-scan that file, concat it with the new results, save to file again, etc.
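Roughly what I have in mind is something like the sketch below (fetch_batch() is a hypothetical stand-in for the API call, and the paths are placeholders): write each batch to its own Parquet file so only one batch is ever in memory, then lazily read the whole set back with a glob:

    from pathlib import Path
    
    import polars as pl
    
    out_dir = Path('batches')
    out_dir.mkdir(exist_ok=True)
    
    i = 0
    # fetch_batch() is hypothetical: returns up to 50k results as a
    # dict of lists, or None once the API is exhausted
    while (batch := fetch_batch()) is not None:
        # each batch becomes its own Parquet file, so memory use stays flat
        pl.DataFrame(batch).write_parquet(out_dir / f'part_{i:05}.parquet')
        i += 1
    
    # scan_parquet accepts a glob, so all parts are read back as one
    # lazy frame, concatenated vertically, without materializing anything
    lazy = pl.scan_parquet('batches/*.parquet')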


Solution

  • This worked better and was what I needed:

    import csv
    
    import polars as pl
    
    # newline='' stops the csv module from emitting blank rows on Windows
    with open('data.csv', 'w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['col1', 'col2'])
        for result in cur.fetchall(structure=pai.util.dict_of_lists):
            keys = sorted(result.keys())  # maintains a consistent column order
            # each result is a dict of lists, so zip the columns into rows
            writer.writerows(zip(*[result[key] for key in keys]))
    
    pl.scan_csv('data.csv').fetch()
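    
    Note that fetch() only materializes a limited number of rows and is meant for quick testing; use collect() to read everything. If Parquet is still the end goal, the CSV can be converted without loading it fully into memory, since sink_parquet() streams the query result to disk. A minimal sketch (the output filename is an assumption):
    
        import polars as pl
        
        # stream the scanned CSV straight to a Parquet file; Polars'
        # streaming engine processes it in chunks rather than all at once
        pl.scan_csv('data.csv').sink_parquet('data.parquet')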