I have a list of pd.DataFrames, all_predictions, that I want to concatenate. As I am getting errors, I began to investigate.
From two of the DataFrames in the list, I took a subset of rows and columns and reset the index:
df1 = all_predictions[1]
df2 = all_predictions[2]
df1 = df1.head(3)
df2 = df2.head(3)
df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)
df1 = df1[["key_id", "prediction", "yrmon"]]
df2 = df2[["key_id", "prediction", "yrmon"]]
But I cannot concat them:
ValueError Traceback (most recent call last)
Cell In[78], line 1
----> 1 pd.concat([df1, df2], ignore_index=True)
File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
378 copy = False
380 op = _Concatenator(
381 objs,
382 axis=axis,
(...)
390 sort=sort,
391 )
--> 393 return op.get_result()
File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:680, in _Concatenator.get_result(self)
676 indexers[ax] = obj_labels.get_indexer(new_labels)
678 mgrs_indexers.append((obj._mgr, indexers))
--> 680 new_data = concatenate_managers(
681 mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
682 )
683 if not self.copy and not using_copy_on_write():
684 new_data._consolidate_inplace()
File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/internals/concat.py:199, in concatenate_managers(mgrs_indexers, axes, concat_axis, copy)
...
2116 if block_shape[0] == 0:
2117 raise ValueError("Empty data passed with indices specified.")
-> 2118 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (3, 3), indices imply (6, 3)
Then I recreated the DataFrames (I got the exact values from e.g. df1.to_dict()):
d1 = pd.DataFrame({
"key_id": ["99c5a5fef58b0e89d22f6e2d99a7cdf5", "b2074747c5bbcaddfc588339e75542f4", "bd7798b548f115440054473557ec90f7"],
"prediction": [57.9340063165089, -82.29114989923285, -141.58455971583805],
"yrmon": ["2021-09-30","2021-09-30","2021-09-30"],
})
d2 = pd.DataFrame({
"key_id": ["0873065925bca4e1cd2b5ef42ca979fa", "8c55ac2774db7b6c20a4f1b0cf80b2b5", "cdaa56f8ff1863040a1593086c8e61c6"],
"prediction": [-298.70691, -24907.70776706384, 1.2290192287788002],
"yrmon": ["2021-09-30","2021-09-30","2021-09-30"],
})
And the concat went fine:
Why can't I concat df1 and df2, and how can I solve this?
Thanks a lot!
Edit: @Corralien:
ValueError Traceback (most recent call last)
Cell In[84], line 3
1 df1 = all_predictions[1].head(3).reset_index(drop=True)
2 df2 = all_predictions[2].head(3).reset_index(drop=True)
----> 3 pd.concat([df1, df2], ignore_index=True)
File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
378 copy = False
380 op = _Concatenator(
381 objs,
382 axis=axis,
(...)
390 sort=sort,
391 )
--> 393 return op.get_result()
File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:680, in _Concatenator.get_result(self)
676 indexers[ax] = obj_labels.get_indexer(new_labels)
678 mgrs_indexers.append((obj._mgr, indexers))
--> 680 new_data = concatenate_managers(
681 mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
682 )
683 if not self.copy and not using_copy_on_write():
684 new_data._consolidate_inplace()
...
2116 if block_shape[0] == 0:
2117 raise ValueError("Empty data passed with indices specified.")
-> 2118 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (3, 195), indices imply (6, 195)
Given your examples, the error is not reproducible. However, supposing you're using pandas==2.1.0 and based on an open issue (see GH53640), it seems like the ValueError is triggered by a dtypes mismatch.
So you can try this:
out = pd.concat([df1, df2.astype(df1.dtypes)], ignore_index=True)
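To check whether a dtype mismatch is really present before casting, it may help to compare the dtypes of the two frames column by column. The following is only a minimal sketch: `dtype_mismatches` is a hypothetical helper, and `a` and `b` are made-up stand-ins for df1 and df2. The assumption that yrmon is datetime-like in one frame and plain strings in the other is for illustration only and is not confirmed by the question.

```python
import pandas as pd


def dtype_mismatches(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the shared columns whose dtypes differ between the two frames."""
    shared = left.columns.intersection(right.columns)
    table = pd.DataFrame({
        # Compare dtype names as strings so the result is easy to read.
        "left": left.dtypes[shared].astype(str),
        "right": right.dtypes[shared].astype(str),
    })
    return table[table["left"] != table["right"]]


# Hypothetical stand-ins for df1 and df2 (values are made up).
a = pd.DataFrame({
    "key_id": ["x", "y"],
    "prediction": [1.0, 2.0],
    "yrmon": pd.to_datetime(["2021-09-30", "2021-09-30"]),  # datetime-like
})
b = pd.DataFrame({
    "key_id": ["z", "w"],
    "prediction": [3.0, 4.0],
    "yrmon": ["2021-09-30", "2021-09-30"],  # plain strings
})

diff = dtype_mismatches(a, b)
print(diff)  # lists only the columns whose dtypes disagree
```

If this reports differences on the real data, that would also be consistent with the recreated frames concatenating without error: rebuilding a DataFrame from typed-in literals lets pandas infer fresh dtypes, so whatever dtypes the originals carried are not necessarily preserved.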