Search code examples
pythonpandasconcatenation

Prevent pandas concat'ting my dataframes both vertically and horizontally


I am trying to concat two dataframes, horizontally. df2 contains 2 result variables for every observation in df1.

df1.shape 
(242583, 172)
df2.shape
(242583, 2)

My code is:

Fin = pd.concat([df1, df2], axis= 1)

But somehow the result is stacked in 2 dimensions:

Fin.shape
(485166, 174)

What am I missing here?


Solution

  • There are different index values, so indexes are not aligned and get NaNs:

    df1 = pd.DataFrame({
        'A': ['a','a','a'],
        'B': range(3)
    })
    print (df1)
       A  B
    0  a  0
    1  a  1
    2  a  2
    
    df2 = pd.DataFrame({
        'C': ['b','b','b'],
        'D': range(4,7)
    }, index=[5,7,8])
    print (df2)
       C  D
    5  b  4
    7  b  5
    8  b  6
    

    Fin = pd.concat([df1, df2], axis= 1)
    print (Fin)
         A    B    C    D
    0    a  0.0  NaN  NaN
    1    a  1.0  NaN  NaN
    2    a  2.0  NaN  NaN
    5  NaN  NaN    b  4.0
    7  NaN  NaN    b  5.0
    8  NaN  NaN    b  6.0
    

    One possible solution is create default indexes:

    Fin = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis= 1)
    print (Fin)
       A  B  C  D
    0  a  0  b  4
    1  a  1  b  5
    2  a  2  b  6
    

    Or assign:

    df2.index = df1.index
    Fin = pd.concat([df1, df2], axis= 1)
    print (Fin)
       A  B  C  D
    0  a  0  b  4
    1  a  1  b  5
    2  a  2  b  6
    
    df1.index = df2.index
    Fin = pd.concat([df1, df2], axis= 1)
    print (Fin)
       A  B  C  D
    5  a  0  b  4
    7  a  1  b  5
    8  a  2  b  6