I'm trying to convert a Pandas Dataframe to a Polar one.
I simply used the function result_polars = pl.from_pandas(result)
. Conversion proceeds well, but when I check the shape of the two dataframe I get that the Polars one has half the size of the original Pandas Dataframe.
I believe that 4172903059 in length is almost the maximum dimension that the polars dataframe allows.
Does anyone have suggestions?
Here a screenshot of the shape of the two dataframes.
Here a Minimum working example
import polars as pl
import pandas as pd
import numpy as np
df = pd.DataFrame(np.zeros((4292903069,1), dtype=np.uint8))
df_polars = pl.from_pandas(df)
Using these dimensions the two dataframes have the same size. If instead I put the following:
import polars as pl
import pandas as pd
import numpy as np
df = pd.DataFrame(np.zeros((4392903069,1), dtype=np.uint8))
df_polars = pl.from_pandas(df)
The Polars dataframe has much smaller dimension (97935773).
The default polars wheel retrieved with pip install polars
"only" allows for 2^32 e.g. ~4.2 billion rows.
Do you need more than that install pip install polars-u64-idx
and uninstall the previous installation.