Search code examples
pythonpandasstackmelt

Converting dictionary to dataframe, and then melting / stacking columns to rows


I am a bit confused about how to use melt or stack correctly in order to transform my dataframe.

Currently have a dataframe which looks as follows:

enter image description here

My objective is to transform this data frame, (probably using melt or stack), and I'm having some trouble doing so. My objective is to have my dataframe look like this:

enter image description here

My data is currently stored in a dict :

{'K_T1': [84.51865079365093,
  84.51865079365093,
  83.80912698412698,
  83.80912698412698,
  86.56468253968255,
  86.56468253968255],
 'K_T2': [77.26309523809515,
  77.26309523809515,
  77.57777777777771,
  77.57777777777771,
  77.09999999999992,
  77.09999999999992],
 'Z_T1': [83.80912698412698,
  83.80912698412698,
  86.61111111111107,
  86.61111111111107,
  87.81746031746032,
  87.81746031746032],
 'Z_T2': [77.57777777777771,
  77.57777777777771,
  75.8619047619048,
  75.8619047619048,
  74.79007936507927,
  74.79007936507927]}

Pseudocode:

  • Convert my dict to a dataframe: K_Z_Averages_df = pd.DataFrame.from_dict(K_Z_Averages)
  • Add a column, for Part Name.
  • Identify if the initial column name has T1 or T2 listed and place it into the correct column, T1 or T2 using melt or stack.

Thanks for any support which can be offered!


Solution

  • Use str.split first by _ in columns, then reshape by DataFrame.stack with some data cleaning:

    df.columns = df.columns.str.split('_', expand=True)
    df = (df.stack(0)
            .sort_index(level=1)
            .rename_axis([None, 'Part Name'])
            .reset_index(level=1)
            .reset_index(drop=True))
    print (df)
       Part Name         T1         T2
    0          K  84.518651  77.263095
    1          K  84.518651  77.263095
    2          K  83.809127  77.577778
    3          K  83.809127  77.577778
    4          K  86.564683  77.100000
    5          K  86.564683  77.100000
    6          Z  83.809127  77.577778
    7          Z  83.809127  77.577778
    8          Z  86.611111  75.861905
    9          Z  86.611111  75.861905
    10         Z  87.817460  74.790079
    11         Z  87.817460  74.790079
    

    Another idea with wide_to_long and rename columns with change order before and after _:

    df.columns = [f'{b}_{a}' for a, b in df.columns.str.split('_')]
    
    df = (pd.wide_to_long(df.reset_index(),
                          stubnames=['T1','T2'],
                          i='index',
                          j='Part Name',
                          sep='_',
                          suffix='\w+')
           .reset_index(level=1).reset_index(drop=True))
    print (df)  
       Part Name         T1         T2
    0          K  84.518651  77.263095
    1          K  84.518651  77.263095
    2          K  83.809127  77.577778
    3          K  83.809127  77.577778
    4          K  86.564683  77.100000
    5          K  86.564683  77.100000
    6          Z  83.809127  77.577778
    7          Z  83.809127  77.577778
    8          Z  86.611111  75.861905
    9          Z  86.611111  75.861905
    10         Z  87.817460  74.790079
    11         Z  87.817460  74.790079