I wanted to compare memory consumption for same dataset. I read same SQL query with pandas and polars from an Oracle DB. Memory usage results are almost same. and execution time is 2 times faster than polars. I expect polars will be more memory efficient.
Is there anyone who can explain this? And any suggestion to reduce memory usage size for same dataset?
result(polars) and data(pandas) shapes:
and lastly memory usages:
One of the big advantages of Polars is query optimisation
If you're loading all data into memory with
read_database, and only doing that, then there will be no difference
On the other hand, if you make the dataframe you read in lazy (
DataFrame.lazy), then perform some other operations, and then collect the results (
LazyFrame.collect), then that's where you'll see the Polars shine
Note: usually you'll want to read the data in lazily directly (e.g.
scan_parquet instead of
read_parquet) but for
read_database there is no