I have three separate data frames. I'm hoping to merge or concat these together. I have a reference value in each data frame. I have labelled them ValueX, ValueY, ValueZ
. But they don't have unique values to merge on. They almost always contain the same number of corresponding values though.
Using the df below, I'm hoping to align the data frames on ValueX, ValueY, ValueZ
. The values in each are largely aligned if you concat. but sometimes there are 4 or 6 values instead of 5. So the corresponding data frames aren't aligned.
import pandas as pd
df1 = pd.DataFrame({
'ValueX' : [0,0,0,0,0,0,2,2,2,2,2,4,4,4,4,4],
'Item3' : ['a', 'e', 'd', 'g', 'f', 'c', 'c', 'f', 'b', 'c', 'j', 'g', 'm', 's', 'c', 'k'],
'Item4' : ['f', 's', 'h', 'k', 'f', 'd', 'c', 'g', 'b', 'k', 'j', 'n', 'r', 'd', 'x', 'd'],
})
df2 = pd.DataFrame({
'ValueY' : [1,1,1,1,1,3,3,3,3,3,3,5,5,5,5,5,5],
'Item1' : ['a', 'c', 'c', 'g', 'a', 'c', 'c', 'g', 'a', 'c', 'c', 'g', 'a', 'c', 'c', 'g', 'k'],
'Item2' : ['a', 'e', 'd', 'g', 'f', 'c', 'c', 'f', 'b', 'c', 'j', 'g', 'm', 's', 'c', 'k', 'k'],
})
df3 = pd.DataFrame({
'ValueZ' : [6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8],
'Item5' : ['a', 'e', 'd', 'g', 'f', 'c', 'c', 'f', 'b', 'c', 'j', 'g', 'm', 's', 'c', 'k'],
'Item6' : ['f', 's', 'h', 'k', 'f', 'd', 'c', 'g', 'b', 'k', 'j', 'n', 'r', 'd', 'x', 'd'],
})
final_df = pd.concat([df1, df2, df3], axis = 1)
Intended output:
ValueX Item3 Item4 ValueY Item1 Item2 ValueZ Item5 Item6
0 0.0 a f 1.0 a a 6.0 a f
1 0.0 e s 1.0 c e 6.0 e s
2 0.0 d h 1.0 c d 6.0 d h
3 0.0 g k 1.0 g g 6.0 g k
4 0.0 f f 1.0 a f 6.0 f f
5 0.0 c d NaN NaN NaN NaN NaN NaN
6 2.0 c c 3.0 c c 7.0 c d
7 2.0 f g 3.0 c c 7.0 c c
8 2.0 b b 3.0 g f 7.0 f g
9 2.0 c k 3.0 a b 7.0 b b
10 2.0 j j 3.0 c c 7.0 c k
11 NaN NaN NaN 3.0 c j 7.0 j j
12 4.0 g n 5.0 g g 8.0 g n
13 4.0 m r 5.0 a m 8.0 m r
14 4.0 s d 5.0 c s 8.0 s d
15 4.0 c x 5.0 c c 8.0 c x
16 4.0 k d 5.0 g k 8.0 k d
17 NaN NaN NaN 5.0 k k NaN NaN NaN
You can use itertools.zip_longest
to align the groups:
from itertools import zip_longest
g1 = df1.groupby('ValueX')
g2 = df2.groupby('ValueY')
g3 = df3.groupby('ValueZ')
dfs = []
for (_, a), (_, b), (_, c) in zip_longest(g1, g2, g3, fillvalue=('', pd.DataFrame())):
dfs.append(
pd.concat([a.reset_index(drop=True),
b.reset_index(drop=True),
c.reset_index(drop=True)], axis=1) )
final = pd.concat(dfs).reset_index(drop=True)
print(final)
Prints:
ValueX Item3 Item4 ValueY Item1 Item2 ValueZ Item5 Item6
0 0.0 a f 1.0 a a 6.0 a f
1 0.0 e s 1.0 c e 6.0 e s
2 0.0 d h 1.0 c d 6.0 d h
3 0.0 g k 1.0 g g 6.0 g k
4 0.0 f f 1.0 a f 6.0 f f
5 0.0 c d NaN NaN NaN NaN NaN NaN
6 2.0 c c 3.0 c c 7.0 c d
7 2.0 f g 3.0 c c 7.0 c c
8 2.0 b b 3.0 g f 7.0 f g
9 2.0 c k 3.0 a b 7.0 b b
10 2.0 j j 3.0 c c 7.0 c k
11 NaN NaN NaN 3.0 c j 7.0 j j
12 4.0 g n 5.0 g g 8.0 g n
13 4.0 m r 5.0 a m 8.0 m r
14 4.0 s d 5.0 c s 8.0 s d
15 4.0 c x 5.0 c c 8.0 c x
16 4.0 k d 5.0 g k 8.0 k d
17 NaN NaN NaN 5.0 k k NaN NaN NaN