I have multiple named Pandas Series datasets with matching multilevel indexes
SeriesA =
L1 L2 L3 value_a1
L2 L3 value_a2
L3 value_a3
SeriesA.name = First_Name
SeriesB =
L1 L2 L3 Value_b1
L2 L3 Value_b2
L3 Value_b3
SeriesB.name = Second_Name
What I am trying to do is add the series name to the existing series index before concatenating them. The expected output should be
SeriesAB =
L1 L2 L3 First_name value_a1
L2 L3 First_name value_a2
L3 First_name value_a3
L1 L2 L3 Second_name value_b1
L2 L3 Second_name value_b2
L3 Second_name value_b3
I've tried various approaches with pd.join,pd.merge, pd.concat, but the name appears to be the sticking point.
The purpose of this to incorporate the name value before pivoting it into a dataframe so that the data looks like this. I'm guessing unstack would be the way to go with this.
final_data =
First_name Second_name
L1 L2 L3 value_a1 value_b1
L2 L3 value_a2 value_b2
L3 value_a3 value_b3
Also I want to avoid this output
Unwanted =
First_Name Second_Name
L1 L2 L3 value_a1 NaN
L2 L3 value_a2 NaN
L3 value_a3 NaN
L1 L2 L3 NaN value_b1
L2 L3 NaN value_b2
L3 NaN value_b3
Use concat
with Series.reorder_levels
and Series.unstack
.
There are duplicates in levels values after concat
, so solution is a bit complicated - is necessary helper levels with GroupBy.cumcount
:
df = (pd.concat([SeriesA, SeriesB], keys=('First_Name','Second_Name'))
.reorder_levels([1,2,3,0])
.to_frame('a'))
print (df)
a
L1 L2 L3 First_Name value_a1
First_Name value_a2
First_Name value_a3
Second_Name value_b1
Second_Name value_b2
Second_Name value_b3
df = (df.set_index(df.groupby(df.index).cumcount(), append=True)['a']
.unstack([3])
.reset_index(level=3, drop=True))
print (df)
First_Name Second_Name
L1 L2 L3 value_a1 value_b1
L3 value_a2 value_b2
L3 value_a3 value_b3
If not duplicates after concat
solution is simplier:
print (SeriesA)
L1 L2 L3 value_a1
L2 L2 L3 value_a2
L3 L2 L3 value_a3
Name: a, dtype: object
print (SeriesB)
L1 L2 L3 value_b1
L2 L2 L3 value_b2
L3 L2 L3 value_b3
Name: a, dtype: object
df = (pd.concat([SeriesA, SeriesB], keys=('First_Name','Second_Name'))
.reorder_levels([1,2,3,0])
.unstack())
print (df)
First_Name Second_Name
L1 L2 L3 value_a1 value_b1
L2 L2 L3 value_a2 value_b2
L3 L2 L3 value_a3 value_b3