Search code examples
pythonpython-3.xdataframepandas-melt

How can we melt a dataframe and list words under columns?


I have a dataframe that looks like this.

import pandas as pd

data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
        'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
        'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
        'transaction':[1,1,1,1,1,2,2,2,2]}

df = pd.DataFrame(data)
df

enter image description here

If I try a basic melt, like so...

df_melted = df.pivot_table(index='clean_words', columns='transaction')
df_melted.tail()

I get this...

enter image description here

What I really want is the transaction number as columns and the words listed down. So, if transaction1 was the column, these words would be listed in rows, under that column:

`'good','evening','how','are','you'`

Under transaction2, these words would be listed in rows, under that column:

'how','can','i','help'

How can I do that? The start_time and end_time are kind of superfluous here.


Solution

  • import pandas as pd
    import numpy as np
    
    data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
            'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
            'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
            'transaction':[1,1,1,1,1,2,2,2,2]}
    
    df = pd.DataFrame(data)
    
    df_melted = df.groupby('transaction')['clean_words'].apply(np.array).reset_index()
    
    print(df_melted)
    
    transaction                     clean_words
    0            1  [good, evening, how, are, you]
    1            2             [how, can, i, help]