Search code examples
pythonpandasdataframeconcatenation

Concat two dataframes with duplicated index that are in DateTime Format


I am trying to use pandas concat on two data frames that contain duplicate indices.

When I try to concat my two dataframes I get the following error

Shape of passed values is (12, 180054), indices imply (12, 10000).

To better understand the problem I created two dataframes:

df1 = pd.DataFrame([{'a':"2018-01-01",'b':2},{'a':"2018-01-01",'b':3},{'a':"2018-01-02",'b':4}],
                   columns = ['a','b']).set_index('a')
df1.index = pd.to_datetime(df1.index)

Which looks like:

            b
a   
2018-01-01  2
2018-01-01  3
2018-01-02  4

and

df2 = pd.DataFrame([{'a':"2018-01-01",'c':5},{'a':"2018-01-02",'c':6}],columns = ['a','c']).set_index('a')
df2.index = pd.to_datetime(df2.index)

Which looks like:

            c
a   
2018-01-01  5
2018-01-02  6

This has similar aspect too my original Dataframe. Indices are duplicated and are in datetime format.

However concat (with axis = 1) works just fine creating the following dataframe

            b   c
a       
2018-01-01  2   5
2018-01-01  3   5
2018-01-02  4   6

(which is what I would expect)

However if I use:

df3 = pd.DataFrame([{'a':"2018-01-01",'b':2},{'a':"2018-01-01",'b':3},{'a':"2018-01-03",'b':4}],
                   columns = ['a','b']).set_index('a')
df3.index = pd.to_datetime(df3.index)

Which looks like:

            b
a   
2018-01-01  2
2018-01-01  3
2018-01-03  4

instead of df1, it returns

Shape of passed values is (2, 6), indices imply (2, 4)

The only difference between the two is that df1's final date is 2018-01-02 and df3's is 2018-01-03.

Logically (for me at least) it should return the following:

            b   c
a       
2018-01-01  2   5
2018-01-01  3   5
2018-01-02  Nan 6
2018-01-03  4   Nan

I don't understand how it is able to correctly do one but not the other since if it can't handle duplicate indices it should fail equally on both.

Pandas concat: ValueError: Shape of passed values is blah, indices imply blah2 is bassically the same question however all the anaswers say that the issue is the duplicated indeices, however that cannot be the only reason since concat does actually work with duplicated indices.

I would like to actually understand what is going wrong and a way of getting round the issue.

Many thanks


Solution

  • ChuHo answered how to do it. I try to answer why it is not working: It should be this Bug.

    The Problem seems to occur, when there are duplicate rows and unique rows on both sides.

    import pandas as pd
    
    frame_a = pd.DataFrame({'a': ['a1']}, index = [1])
    frame_b = pd.DataFrame({'b': ['b1', 'b2', 'b2']}, index = [1,2,2])
    frame_c = pd.DataFrame({'c': ['c3', 'c3']}, index = [3,3])
    
    pd.concat([frame_a,frame_b], axis=1)  # works
         a   b
    1   a1  b1
    2  NaN  b2
    2  NaN  b2
    pd.concat([frame_a,frame_c], axis=1)  # fails
    ValueError: Shape of passed values is (5, 2), indices imply (3, 2)