Python pandas: Why is concat not working?


I have a list of pd.DataFrames, all_predictions, that I want to concatenate. Because I was getting errors, I began to investigate:

From two of the DataFrames in the list, I took a subset of rows and columns and reset the index:

df1 = all_predictions[1]
df2 = all_predictions[2]
df1 = df1.head(3)
df2 = df2.head(3)
df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)
df1 = df1[["key_id", "prediction", "yrmon"]]
df2 = df2[["key_id", "prediction", "yrmon"]]

But I cannot concat them:

ValueError                                Traceback (most recent call last)
Cell In[78], line 1
----> 1 pd.concat([df1, df2], ignore_index=True)

File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    378     copy = False
    380 op = _Concatenator(
    381     objs,
    382     axis=axis,
   (...)
    390     sort=sort,
    391 )
--> 393 return op.get_result()

File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:680, in _Concatenator.get_result(self)
    676             indexers[ax] = obj_labels.get_indexer(new_labels)
    678     mgrs_indexers.append((obj._mgr, indexers))
--> 680 new_data = concatenate_managers(
    681     mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
    682 )
    683 if not self.copy and not using_copy_on_write():
    684     new_data._consolidate_inplace()

File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/internals/concat.py:199, in concatenate_managers(mgrs_indexers, axes, concat_axis, copy)
...
   2116 if block_shape[0] == 0:
   2117     raise ValueError("Empty data passed with indices specified.")
-> 2118 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (3, 3), indices imply (6, 3)

Then I recreated the DataFrames (I got the exact values from e.g. df1.to_dict()):

d1 = pd.DataFrame({
    "key_id": ["99c5a5fef58b0e89d22f6e2d99a7cdf5", "b2074747c5bbcaddfc588339e75542f4", "bd7798b548f115440054473557ec90f7"],
    "prediction": [57.9340063165089, -82.29114989923285, -141.58455971583805],
    "yrmon": ["2021-09-30","2021-09-30","2021-09-30"],
})
d2 = pd.DataFrame({
    "key_id": ["0873065925bca4e1cd2b5ef42ca979fa", "8c55ac2774db7b6c20a4f1b0cf80b2b5", "cdaa56f8ff1863040a1593086c8e61c6"],
    "prediction": [-298.70691, -24907.70776706384, 1.2290192287788002],
    "yrmon": ["2021-09-30","2021-09-30","2021-09-30"],
})

And the concat of d1 and d2 went fine.

Why can I not concat df1 and df2? How can I solve this?

Thanks a lot!


Edit: @Corralien:

ValueError                                Traceback (most recent call last)
Cell In[84], line 3
      1 df1 = all_predictions[1].head(3).reset_index(drop=True)
      2 df2 = all_predictions[2].head(3).reset_index(drop=True)
----> 3 pd.concat([df1, df2], ignore_index=True)

File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    378     copy = False
    380 op = _Concatenator(
    381     objs,
    382     axis=axis,
   (...)
    390     sort=sort,
    391 )
--> 393 return op.get_result()

File ~/projects/bcs-modeling/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:680, in _Concatenator.get_result(self)
    676             indexers[ax] = obj_labels.get_indexer(new_labels)
    678     mgrs_indexers.append((obj._mgr, indexers))
--> 680 new_data = concatenate_managers(
    681     mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
    682 )
    683 if not self.copy and not using_copy_on_write():
    684     new_data._consolidate_inplace()
...
   2116 if block_shape[0] == 0:
   2117     raise ValueError("Empty data passed with indices specified.")
-> 2118 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (3, 195), indices imply (6, 195)

Solution

  • Given your examples, the error is not reproducible. However, supposing you're using pandas==2.1.0 and based on an open issue (see GH53640), it seems like the ValueError is triggered due to a dtypes mismatch.

    So you can try this:

    out = pd.concat([df1, df2.astype(df1.dtypes)], ignore_index=True)
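
Before casting, it may help to see which columns actually disagree. The following is only a sketch: `dtype_mismatches` is a hypothetical helper (not part of pandas), and the frames `a` and `b` are made-up stand-ins for `df1` and `df2`, since the real data is not available. Whether aligning dtypes resolves the error in your environment is an assumption based on the linked issue.

```python
import pandas as pd


def dtype_mismatches(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the shared columns whose dtypes differ between the two frames."""
    shared = left.columns.intersection(right.columns)
    report = pd.DataFrame({
        "left": left.dtypes.loc[shared].astype(str),
        "right": right.dtypes.loc[shared].astype(str),
    })
    return report[report["left"] != report["right"]]


# Hypothetical stand-ins for df1 and df2 (illustration only).
a = pd.DataFrame({"key_id": ["x", "y"], "prediction": [1.5, 2.5]})
b = pd.DataFrame({"key_id": ["z", "w"], "prediction": [3, 4]})  # int64, not float64

# One row per column whose dtype differs between the two frames.
mismatches = dtype_mismatches(a, b)
print(mismatches)

# Cast only the mismatched columns of b to the dtypes used in a, then concat.
b_aligned = b.astype(a.dtypes.loc[mismatches.index].to_dict())
out = pd.concat([a, b_aligned], ignore_index=True)
print(out.dtypes)
```

With 195 columns, as in the second traceback, a report like this narrows the problem down to the columns that differ, and casting only those avoids touching the rest of the frame.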