Search code examples
pandasmelt

Pandas melt automatically sorts the results


I have a dataframe that looks like this:

col1    col2    col3    targ1    targ3    targ6
a        b      c        1        2       3
a        c      d        2        4       9

I am using melt to pivot the last three columns, and the result looks like this:

pivot_df = (data.melt(id_vars=data.columns.difference(['targ1','targ3','targ6']),var_name='m_val', value_name='value')
                .assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d+)').unstack().fillna('').sum(axis=1).astype(int))
            )

col1 col2 col3 m_val value
a     b    c    1     1
a     c    d    1     2
a     b    c    3     2
a     c    d    3     4
a     b    c    6     3
a     c    d    6     9 

Why is the melt sorts the m_val, I want the results to be like this:

col1 col2 col3 m_val value
a     b    c    1     1
a     c    d    3     2
a     b    c    6     3
a     c    d    1     2
a     b    c    3     4
a     c    d    6     9 

If I try to use ignore_index=False in melt() then I get this error:

ValueError: Index contains duplicate entries, cannot reshape

Solution

  • If use ignore_index=False then sort index values and set default index by DataFrame.sort_index:

    df = (data.melt(id_vars=data.columns.difference(['targ1','targ3','targ6']),var_name='m_val', value_name='value',ignore_index=False)
                    .sort_index(ignore_index=True)
                    .assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d+)').unstack().fillna('').sum(axis=1).astype(int))
                )
    print (df)
      col1 col2 col3  m_val  value  month
    0    a    b    c  targ1      1      1
    1    a    b    c  targ3      2      3
    2    a    b    c  targ6      3      6
    3    a    c    d  targ1      2      1
    4    a    c    d  targ3      4      3
    5    a    c    d  targ6      9      6
    

    Or use melt alternative with DataFrame.stack:

    df = (data.set_index(data.columns.difference(['targ1','targ3','targ6']).tolist())
              .rename_axis('m_val', axis=1)
              .stack()
              .reset_index(name='value')
              .assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d+)').unstack().fillna('').sum(axis=1).astype(int)))
    
    print (df)
      col1 col2 col3  m_val  value  month
    0    a    b    c  targ1      1      1
    1    a    b    c  targ3      2      3
    2    a    b    c  targ6      3      6
    3    a    c    d  targ1      2      1
    4    a    c    d  targ3      4      3
    5    a    c    d  targ6      9      6