I was doing a project in nlp. My input is:
index name lst
0 a c
0 d
0 e
1 f
1 b g
I need output like this:
index name lst combine
0 a c a c
0 d a d
0 e a e
1 f b f
1 b g b g
How can I achieve this?
You can use groupby
+transform('max')
to replace the empty cells with the letter per group as the letters have precedence over space. The rest is a simple string concatenation per column:
df['combine'] = df.groupby('index')['name'].transform('max') + ' ' + df['lst']
Used input:
df = pd.DataFrame({'index': [0,0,0,1,1],
'name': ['a','','','','b'],
'lst': list('cdefg'),
})
NB. I considered "index" to be a column here, if this is the index you should use df.index
in the groupby
Output:
index name lst combine
0 0 a c a c
1 0 d a d
2 0 e a e
3 1 f b f
4 1 b g b g