Search code examples
pythonpandasmergeconcatenation

Merge or concat df's with uneven rows - python


I have three separate data frames. I'm hoping to merge or concat these together. I have a reference value in each data frame. I have labelled them ValueX, ValueY, ValueZ. But they don't have unique values to merge on. They almost always contain the same number of corresponding values though.

Using the df below, I'm hoping to align the data frames on ValueX, ValueY, ValueZ. The values in each are largely aligned if you concat. but sometimes there are 4 or 6 values instead of 5. So the corresponding data frames aren't aligned.

import pandas as pd

df1 = pd.DataFrame({
    'ValueX' : [0,0,0,0,0,0,2,2,2,2,2,4,4,4,4,4],                   
    'Item3' : ['a', 'e', 'd', 'g', 'f', 'c', 'c', 'f', 'b', 'c', 'j', 'g', 'm', 's', 'c', 'k'], 
    'Item4' : ['f', 's', 'h', 'k', 'f', 'd', 'c', 'g', 'b', 'k', 'j', 'n', 'r', 'd', 'x', 'd'],                         
        })

df2 = pd.DataFrame({
    'ValueY' : [1,1,1,1,1,3,3,3,3,3,3,5,5,5,5,5,5],                   
    'Item1' : ['a', 'c', 'c', 'g', 'a', 'c', 'c', 'g', 'a', 'c', 'c', 'g', 'a', 'c', 'c', 'g',  'k'], 
    'Item2' : ['a', 'e', 'd', 'g', 'f', 'c', 'c', 'f', 'b', 'c', 'j', 'g', 'm', 's', 'c', 'k',  'k'],                         
        })

df3 = pd.DataFrame({
    'ValueZ' : [6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8],                  
    'Item5' : ['a', 'e', 'd', 'g', 'f', 'c', 'c', 'f', 'b', 'c', 'j', 'g', 'm', 's', 'c', 'k'], 
    'Item6' : ['f', 's', 'h', 'k', 'f', 'd', 'c', 'g', 'b', 'k', 'j', 'n', 'r', 'd', 'x', 'd'],                         
        })

final_df = pd.concat([df1, df2, df3], axis = 1)

Intended output:

    ValueX Item3 Item4  ValueY Item1 Item2  ValueZ Item5 Item6
0      0.0     a     f     1.0     a     a     6.0     a     f
1      0.0     e     s     1.0     c     e     6.0     e     s
2      0.0     d     h     1.0     c     d     6.0     d     h
3      0.0     g     k     1.0     g     g     6.0     g     k
4      0.0     f     f     1.0     a     f     6.0     f     f
5      0.0     c     d     NaN   NaN   NaN     NaN   NaN   NaN
6      2.0     c     c     3.0     c     c     7.0     c     d
7      2.0     f     g     3.0     c     c     7.0     c     c
8      2.0     b     b     3.0     g     f     7.0     f     g
9      2.0     c     k     3.0     a     b     7.0     b     b
10     2.0     j     j     3.0     c     c     7.0     c     k
11     NaN   NaN   NaN     3.0     c     j     7.0     j     j
12     4.0     g     n     5.0     g     g     8.0     g     n
13     4.0     m     r     5.0     a     m     8.0     m     r
14     4.0     s     d     5.0     c     s     8.0     s     d
15     4.0     c     x     5.0     c     c     8.0     c     x
16     4.0     k     d     5.0     g     k     8.0     k     d
17     NaN   NaN   NaN     5.0     k     k     NaN   NaN   NaN

Solution

  • You can use itertools.zip_longest to align the groups:

    from itertools import zip_longest
    
    g1 = df1.groupby('ValueX')
    g2 = df2.groupby('ValueY')
    g3 = df3.groupby('ValueZ')
    
    dfs = []
    for (_, a), (_, b), (_, c) in zip_longest(g1, g2, g3, fillvalue=('', pd.DataFrame())):
        dfs.append( 
            pd.concat([a.reset_index(drop=True), 
                       b.reset_index(drop=True), 
                       c.reset_index(drop=True)], axis=1) )
    
    final = pd.concat(dfs).reset_index(drop=True)
    print(final)
    

    Prints:

        ValueX Item3 Item4  ValueY Item1 Item2  ValueZ Item5 Item6
    0      0.0     a     f     1.0     a     a     6.0     a     f
    1      0.0     e     s     1.0     c     e     6.0     e     s
    2      0.0     d     h     1.0     c     d     6.0     d     h
    3      0.0     g     k     1.0     g     g     6.0     g     k
    4      0.0     f     f     1.0     a     f     6.0     f     f
    5      0.0     c     d     NaN   NaN   NaN     NaN   NaN   NaN
    6      2.0     c     c     3.0     c     c     7.0     c     d
    7      2.0     f     g     3.0     c     c     7.0     c     c
    8      2.0     b     b     3.0     g     f     7.0     f     g
    9      2.0     c     k     3.0     a     b     7.0     b     b
    10     2.0     j     j     3.0     c     c     7.0     c     k
    11     NaN   NaN   NaN     3.0     c     j     7.0     j     j
    12     4.0     g     n     5.0     g     g     8.0     g     n
    13     4.0     m     r     5.0     a     m     8.0     m     r
    14     4.0     s     d     5.0     c     s     8.0     s     d
    15     4.0     c     x     5.0     c     c     8.0     c     x
    16     4.0     k     d     5.0     g     k     8.0     k     d
    17     NaN   NaN   NaN     5.0     k     k     NaN   NaN   NaN