Search code examples
pythonpandasdataframeconcatenationseries

Concatenate Pandas series and add series name to multilevel index


I have multiple named Pandas Series datasets with matching multilevel indexes

SeriesA = 

L1 L2 L3    value_a1
   L2 L3    value_a2
      L3    value_a3

SeriesA.name = First_Name


SeriesB =

L1 L2 L3    Value_b1
   L2 L3    Value_b2
      L3    Value_b3

SeriesB.name = Second_Name

What I am trying to do is add the series name to the existing series index before concatenating them. The expected output should be

SeriesAB =

L1  L2  L3 First_name    value_a1
    L2  L3 First_name    value_a2
        L3 First_name    value_a3
L1  L2  L3 Second_name   value_b1
    L2  L3 Second_name   value_b2
        L3 Second_name   value_b3

I've tried various approaches with pd.join,pd.merge, pd.concat, but the name appears to be the sticking point.

The purpose of this to incorporate the name value before pivoting it into a dataframe so that the data looks like this. I'm guessing unstack would be the way to go with this.

final_data =
            First_name  Second_name
L1  L2  L3  value_a1    value_b1
    L2  L3  value_a2    value_b2
        L3  value_a3    value_b3

Also I want to avoid this output

Unwanted = 
               First_Name   Second_Name
L1   L2   L3   value_a1     NaN
     L2   L3   value_a2     NaN
          L3   value_a3     NaN
L1   L2   L3   NaN          value_b1
     L2   L3   NaN          value_b2
          L3   NaN          value_b3 


Solution

  • Use concat with Series.reorder_levels and Series.unstack.

    There are duplicates in levels values after concat, so solution is a bit complicated - is necessary helper levels with GroupBy.cumcount:

    df = (pd.concat([SeriesA, SeriesB], keys=('First_Name','Second_Name'))
            .reorder_levels([1,2,3,0])
            .to_frame('a'))
    print (df)
                                 a
    L1 L2 L3 First_Name   value_a1
             First_Name   value_a2
             First_Name   value_a3
             Second_Name  value_b1
             Second_Name  value_b2
             Second_Name  value_b3
    
    df = (df.set_index(df.groupby(df.index).cumcount(), append=True)['a']
            .unstack([3])
            .reset_index(level=3, drop=True))
    print (df)
             First_Name Second_Name
    L1 L2 L3   value_a1    value_b1
          L3   value_a2    value_b2
          L3   value_a3    value_b3
    

    If not duplicates after concat solution is simplier:

    print (SeriesA)
    L1  L2  L3    value_a1
    L2  L2  L3    value_a2
    L3  L2  L3    value_a3
    Name: a, dtype: object
    
    print (SeriesB)
    L1  L2  L3    value_b1
    L2  L2  L3    value_b2
    L3  L2  L3    value_b3
    Name: a, dtype: object
    
    df = (pd.concat([SeriesA, SeriesB], keys=('First_Name','Second_Name'))
            .reorder_levels([1,2,3,0])
            .unstack())
    print (df)
             First_Name Second_Name
    L1 L2 L3   value_a1    value_b1
    L2 L2 L3   value_a2    value_b2
    L3 L2 L3   value_a3    value_b3