lazy-loading · python-polars · contextmanager

Stream larger than memory API results to file with Polars


How would I lazily stream results to a file (e.g. from an API) with Polars? The goal is to concatenate the results vertically without blowing up memory as the file grows larger.

E.g. if I'm getting 50k results back at a time, I want to append the new results to the DataFrame/Parquet file without loading the entire file into memory, ideally using pl.scan_parquet().

I’d likely create the initial file, get results, save to file, re-scan that file, concat it with the new results, save to file again, etc.
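Roughly what I have in mind is something like the sketch below (fetch_batch() is a hypothetical stand-in for the API call, and the paths are placeholders): write each batch to its own Parquet file so only one batch is ever in memory, then lazily read the whole set back with a glob:

    from pathlib import Path
    
    import polars as pl
    
    out_dir = Path('batches')
    out_dir.mkdir(exist_ok=True)
    
    i = 0
    # fetch_batch() is hypothetical: returns up to 50k results as a
    # dict of lists, or None once the API is exhausted
    while (batch := fetch_batch()) is not None:
        # each batch becomes its own Parquet file, so memory use stays flat
        pl.DataFrame(batch).write_parquet(out_dir / f'part_{i:05}.parquet')
        i += 1
    
    # scan_parquet accepts a glob, so all parts are read back as one
    # lazy frame, concatenated vertically, without materializing anything
    lazy = pl.scan_parquet('batches/*.parquet')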


Solution

  • This worked better and was what I needed:

    import csv
    
    import polars as pl
    
    # newline='' stops the csv module from emitting blank rows on Windows
    with open('data.csv', 'w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['col1', 'col2'])
        for result in cur.fetchall(structure=pai.util.dict_of_lists):
            keys = sorted(result.keys())  # maintains a consistent column order
            # each result is a dict of lists, so zip the columns into rows
            writer.writerows(zip(*[result[key] for key in keys]))
    
    pl.scan_csv('data.csv').fetch()
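    
    Note that fetch() only materializes a limited number of rows and is meant for quick testing; use collect() to read everything. If Parquet is still the end goal, the CSV can be converted without loading it fully into memory, since sink_parquet() streams the query result to disk. A minimal sketch (the output filename is an assumption):
    
        import polars as pl
        
        # stream the scanned CSV straight to a Parquet file; Polars'
        # streaming engine processes it in chunks rather than all at once
        pl.scan_csv('data.csv').sink_parquet('data.parquet')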