python pandas python-polars rust-polars

Polars and pandas DataFrames consume almost the same memory. Where is the advantage of Polars?


I wanted to compare memory consumption for the same dataset, so I ran the same SQL query against an Oracle DB with both pandas and Polars. The memory usage is almost the same, and the pandas read was about 2 times faster than the Polars one. I expected Polars to be more memory efficient.

Can anyone explain this? And are there any suggestions for reducing memory usage for the same dataset?

Polars Read SQL: [screenshot not available]

Pandas Read SQL: [screenshot not available]

result (Polars) and data (pandas) shapes: [screenshot not available]

And lastly, memory usage: [screenshot not available]


Solution

  • One of the big advantages of Polars is query optimisation

    If you load all the data into memory with read_database and do nothing else, there will be little or no difference.

    On the other hand, if you make the DataFrame you read in lazy (DataFrame.lazy), perform some other operations, and then collect the results (LazyFrame.collect), that's where you'll see Polars shine.

    Note: usually you'll want to read the data in lazily from the start (e.g. scan_parquet instead of read_parquet), but for read_database there is no scan_ equivalent.