
Dataframe conversion from pandas to polars -- difference in the final dimensions


I'm trying to convert a Pandas DataFrame to a Polars one.

I simply used result_polars = pl.from_pandas(result). The conversion runs without errors, but when I check the shapes of the two dataframes, the Polars one has about half as many rows as the original Pandas DataFrame.

I suspect that a length of 4172903059 is close to the maximum size a Polars dataframe allows.

Does anyone have suggestions?

Here is a screenshot of the shapes of the two dataframes.

Here is a minimal working example:

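import polars as pl
import pandas as pd
import numpy as np

# One uint8 column with 4,292,903,069 rows
df = pd.DataFrame(np.zeros((4292903069, 1), dtype=np.uint8))
df_polars = pl.from_pandas(df)
print(df.shape, df_polars.shape)  # compare row counts after conversion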

With these dimensions, the two dataframes have the same size. If instead I use the following:

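import polars as pl
import pandas as pd
import numpy as np

# Same single uint8 column, now with 4,392,903,069 rows
df = pd.DataFrame(np.zeros((4392903069, 1), dtype=np.uint8))
df_polars = pl.from_pandas(df)
print(df.shape, df_polars.shape)  # compare row counts after conversion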

the Polars dataframe has a much smaller height (97935773 rows).
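The reported height is not arbitrary: 4392903069 − 2^32 = 97935773 exactly, which is consistent with the row count overflowing a 32-bit unsigned index. The sketch below only reproduces that arithmetic; the wraparound model (`wrapped_height`, a hypothetical helper) is an inference from the numbers above, not taken from Polars itself.

```python
# Default Polars builds index rows with a 32-bit unsigned integer,
# so row counts at or above 2**32 cannot be represented.
U32_LIMIT = 2**32  # 4,294,967,296


def wrapped_height(n_rows: int, bits: int = 32) -> int:
    """Height you would observe if the row count wrapped modulo 2**bits.

    This is an assumed model of the failure, used only to explain the
    numbers seen in the question.
    """
    return n_rows % (2**bits)


for n in (4_292_903_069, 4_392_903_069):
    # Prints the requested rows, whether they fit, and the wrapped height
    print(n, n < U32_LIMIT, wrapped_height(n))
```

The first size fits below 2^32 and is unchanged; the second wraps around to 97935773, matching the observed Polars height.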


Solution

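  • The default polars wheel installed with pip install polars "only" allows up to 2^32 rows, i.e. ~4.29 billion (4,294,967,296). Larger inputs are not rejected; the row count silently wraps around, which is why 4392903069 rows become 97935773.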

    If you need more than that, uninstall the current package (pip uninstall polars) and install the 64-bit-index build instead (pip install polars-u64-idx).
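Because the overflow is silent, it can help to check the row count before converting. Below is a minimal sketch, assuming the 2^32 limit described above; `fits_polars_index` and `require_polars_capacity` are hypothetical helpers, not part of Polars, and the bound is treated conservatively as exclusive.

```python
def fits_polars_index(n_rows: int, index_bits: int = 32) -> bool:
    """Return True if n_rows fits in a row index of index_bits bits.

    Conservative assumption: 2**index_bits itself is treated as too large.
    """
    return 0 <= n_rows < 2**index_bits


def require_polars_capacity(n_rows: int, index_bits: int = 32) -> None:
    """Raise instead of letting the conversion silently truncate rows."""
    if not fits_polars_index(n_rows, index_bits):
        raise ValueError(
            f"{n_rows} rows exceed a {index_bits}-bit row index; "
            "consider the polars-u64-idx build"
        )


if __name__ == "__main__":
    for n in (4_292_903_069, 4_392_903_069):
        print(n, fits_polars_index(n), fits_polars_index(n, index_bits=64))
```

In a real script you would call require_polars_capacity(len(df)) right before pl.from_pandas(df). The index width could be taken from pl.get_index_type(), which reports UInt32 for the default build and UInt64 for the big-index build; deriving index_bits that way is left as an assumption here.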