I have a dataframe that looks like this.
import pandas as pd
data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
'transaction':[1,1,1,1,1,2,2,2,2]}
df = pd.DataFrame(data)
df
If I try a basic melt, like so...
df_melted = df.pivot_table(index='clean_words', columns='transaction')
df_melted.tail()
I get this...
What I really want is the transaction number as columns and the words listed down. So, if transaction1 was the column, these words would be listed in rows, under that column:
`'good','evening','how','are','you'`
Under transaction2, these words would be listed in rows, under that column:
'how','can','i','help'
How can I do that? The start_time and end_time are kind of superfluous here.
import pandas as pd
import numpy as np
data = {'clean_words':['good','evening','how','are','you','how','can','i','help'],
'start_time':[1900,2100,2500,2750,2900,1500,1650,1770,1800],
'end_time':[2100,2500,2750,2900,3000,1650,1770,1800,1950],
'transaction':[1,1,1,1,1,2,2,2,2]}
df = pd.DataFrame(data)
df_melted = df.groupby('transaction')['clean_words'].apply(np.array).reset_index()
print(df_melted)
transaction clean_words
0 1 [good, evening, how, are, you]
1 2 [how, can, i, help]