I have the Following two MultiIndex dataframes with different shapes: Pandas dataframe 'a'
col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])
With output:
Set 1 Set 2
paa paa jaa paa jaa jaa jaa paa paa paa
0 3 0 2 1 2 0 3 5 4 3
1 2 1 2 1 0 5 5 5 3 4
2 5 2 1 2 5 1 5 5 0 2
And dataframe 'b'
col0 = ['Set 1','Set 1','Set 2','Set 2']
col1 = ['P1_1','P1_2','P2_1','P2_2']
b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])
with output:
Set 1 Set 2
P1_1 P1_2 P2_1 P2_2
0 2 1 1 2
1 0 0 2 2
2 0 0 1 0
Now I want to combine the two. Keeping the MultiIndex of pandas 'a, and the values of pandas'b'.
Desired output of pandas 'c':
Set 1 Set 2
P1_1 P1_2 P1_1 P1_2 P1_1 P1_2 P1_1 P1_2 P1_1 P1_2
0 2 1 2 1 1 2 1 2 1 2
1 0 0 0 0 2 2 2 2 2 2
2 0 0 0 0 1 0 1 0 1 0
Level_0 of pandas 'c' coincides with level_0 of pandas 'b'. Level_1 in 'c' alternates with the columns of pandas 'b'..
You might have to combine the following in some way:
temp=b.reindex(columns=map(lambda x:(x[0],'P1_1') ,a.columns))
a.groupby(level=0, axis=1)
Anything will help!
Idea is match levels a
and b
and repeat columns second level used for DataFrame.reindex
:
np.random.seed(123)
col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])
col0 = ['Set 1','Set 1','Set 2','Set 2']
col1 = ['P1_1','P1_2','P2_1','P2_2']
b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])
print (a)
Set 1 Set 2
paa paa jaa paa jaa jaa jaa paa paa paa
0 5 2 4 2 1 3 2 3 1 1
1 0 1 1 0 0 1 3 5 4 0
2 0 4 1 3 2 4 2 4 0 5
print (b)
Set 1 Set 2
P1_1 P1_2 P2_1 P2_2
0 0 1 0 0
1 0 2 1 1
2 2 2 2 1
#https://stackoverflow.com/a/3391105
def repeat_to_length(s, wanted):
return (s * (wanted//len(s) + 1))[:wanted]
out = []
for lvl in a.columns.levels[0]:
colsa = a.xs(lvl, axis=1, level=0).columns.tolist()
colsb = b.xs(lvl, axis=1, level=0).columns.tolist()
lvl1 = repeat_to_length(colsb, len(colsa))
out.extend(list(zip([lvl] * len(lvl1), lvl1)))
print (out)
[('Set 1', 'P1_1'), ('Set 1', 'P1_2'), ('Set 1', 'P1_1'),
('Set 1', 'P1_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2'),
('Set 2', 'P2_1'), ('Set 2', 'P2_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2')]
mux = pd.MultiIndex.from_tuples(out)
print (mux)
MultiIndex([('Set 1', 'P1_1'),
('Set 1', 'P1_2'),
('Set 1', 'P1_1'),
('Set 1', 'P1_2'),
('Set 2', 'P2_1'),
('Set 2', 'P2_2'),
('Set 2', 'P2_1'),
('Set 2', 'P2_2'),
('Set 2', 'P2_1'),
('Set 2', 'P2_2')],
)
c = b.reindex(mux, axis=1)
print (c)
Set 1 Set 2
P1_1 P1_2 P1_1 P1_2 P2_1 P2_2 P2_1 P2_2 P2_1 P2_2
0 0 1 0 1 0 0 0 0 0 0
1 0 2 0 2 1 1 1 1 1 1
2 2 2 2 2 2 1 2 1 2 1