Unable to concatenate polars dataframes when a column is f64 in one and i64 in the other
I have two pandas dataframes, df1 and df2, where column 'a' is float in df1 and int in df2. When I run pd.concat([df1, df2]), it works.
However, when I try the same operation on polars dataframes, it is throwing the following error:
exceptions.ShapeError: unable to vstack, dtypes for column "a" don't match: `f64` and `i64`
pandas code:
import pandas as pd
df1 = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [1, 2, 3]})
df2 = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
pd.concat() produces the following output:
pd.concat([df1, df2])
            a  b
0  1.00000000  1
1  2.00000000  2
2  3.00000000  3
0  1.00000000  1
1  2.00000000  2
2  3.00000000  3
polars code:
import polars as pl
df1 = pl.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [1, 2, 3]})
df2 = pl.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})
pl.concat(), unlike pd.concat(), raises an error:
pl.concat([df1, df2])
Traceback (most recent call last):
  File "C:\Users\user\.conda\envs\dev\lib\site-packages\IPython\core\interactiveshell.py", line 3362, in run_code
    async def run_code(self, code_obj, result=None, *, async_=False):
  File "<ipython-input-16-4301449ba376>", line 1, in <cell line: 1>
    pl.concat([df1, df2])
  File "C:\Users\user\.conda\envs\dev\lib\site-packages\polars\functions\eager.py", line 22, in concat
    def concat(
exceptions.ShapeError: unable to vstack, dtypes for column "a" don't match: `f64` and `i64`
In my case, I fetch data from several database tables and build a list of dataframes before concatenating them. I am looking for a solution that does not require hard-coding the column names.
You can use the vertical_relaxed strategy, which casts mismatched columns to their common supertype (here, i64 to f64) before stacking:
pl.concat([df1, df2], how="vertical_relaxed")
shape: (6, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 1   │
│ 2.0 ┆ 2   │
│ 3.0 ┆ 3   │
│ 1.0 ┆ 1   │
│ 2.0 ┆ 2   │
│ 3.0 ┆ 3   │
└─────┴─────┘