python pandas dataframe merge concatenation

Combine data frame tables row by row


I have many long dataframes with the same structure (same column names).

Example with three dataframes:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['name1', 1, 2], ['name2', 3, 4], ['name3', 5, 6]]),
                   columns=['names', 'val1', 'val2'])

   names val1 val2
0  name1    1    2
1  name2    3    4
2  name3    5    6

df2 = pd.DataFrame(np.array([['name1', 11, 22], ['name2', 33, 44], ['name3', 55, 66]]),
                   columns=['names', 'val1', 'val2'])

   names val1 val2
0  name1   11   22
1  name2   33   44
2  name3   55   66

df3 = pd.DataFrame(np.array([['name1', 111, 222], ['name2', 333, 444], ['name3', 555, 666]]),
                   columns=['names', 'val1', 'val2'])

   names val1 val2
0  name1  111  222
1  name2  333  444
2  name3  555  666

How can I combine the dataframes row by row, so that the result is:

   names val1 val2
0  name1    1    2
1  name1   11   22
2  name1  111  222
3  name2    3    4
4  name2   33   44
5  name2  333  444
6  name3    5    6
7  name3   55   66
8  name3  555  666

I tried:

df = pd.concat([df1, df2, df3])
groups = df.groupby('names')

but it seems to be wrong.


Solution

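  • The attempt in the question was close. Use pandas.concat to stack the dataframes, then sort_values on 'names' so that rows with the same name end up next to each other, and finally reset_index to renumber the rows. Pass kind='stable' to sort_values: the default quicksort is not guaranteed to keep rows with equal names in their original order (df1, then df2, then df3), which matters for long dataframes.

    df_merge = pd.concat([df1, df2, df3], axis=0, ignore_index=True).sort_values('names', kind='stable').reset_index(drop=True)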
    
    
    [Out]:
    
       names val1 val2
    0  name1    1    2
    1  name1   11   22
    2  name1  111  222
    3  name2    3    4
    4  name2   33   44
    5  name2  333  444
    6  name3    5    6
    7  name3   55   66
    8  name3  555  666
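
For completeness, here is a self-contained sketch of the approach above, plus one variant that may fit the "row by row" wording more literally. The helper make_frame and the names by_name and by_position are made up for this illustration and are not part of pandas. The variant assumes every dataframe has the same default 0..n-1 index, so sorting the concatenated index interleaves the frames by row position without looking at the 'names' column at all.

```python
import numpy as np
import pandas as pd


def make_frame(values):
    # Hypothetical helper: builds a frame the same way the question does.
    # np.array on mixed strings and ints yields strings, so val1/val2
    # hold strings here, just like in the original example.
    return pd.DataFrame(np.array(values), columns=['names', 'val1', 'val2'])


df1 = make_frame([['name1', 1, 2], ['name2', 3, 4], ['name3', 5, 6]])
df2 = make_frame([['name1', 11, 22], ['name2', 33, 44], ['name3', 55, 66]])
df3 = make_frame([['name1', 111, 222], ['name2', 333, 444], ['name3', 555, 666]])
frames = [df1, df2, df3]

# Variant A: group rows by the 'names' column. The stable sort keeps
# df1 rows before df2 rows before df3 rows within each name.
by_name = (
    pd.concat(frames, ignore_index=True)
    .sort_values('names', kind='stable')
    .reset_index(drop=True)
)

# Variant B: interleave by original row position. concat without
# ignore_index keeps each frame's own index (0, 1, 2), so a stable
# sort on the index puts row 0 of every frame first, then row 1, etc.
by_position = (
    pd.concat(frames)
    .sort_index(kind='stable')
    .reset_index(drop=True)
)

print(by_name)
```

Both variants should give the same table for the example data. Variant A depends on the values in 'names' sorting into the order you want, while Variant B depends only on the row order inside each dataframe, so pick whichever assumption holds for your data.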