I have a dataframe and I would like to groupby and join the strings of a column together. So something like the below.
df = pd.DataFrame({
'id': [1, 1, 2, 2, 3, 3],
'txt': ['sth', 'sth else', 'sth', 'one more thing', 'sth else', 'sth else'],
'status': ['open', 'open', 'closed', 'open', 'open', 'open']})
df.assign(output=
df.where(df.status=='open')
.groupby(df.id)
.txt.transform(lambda col: ', '.join(col.fillna(''))))
which gives me this
id txt status output
0 1 sth open sth, sth else
1 1 sth else open sth, sth else
2 2 sth closed , one more thing
3 2 one more thing open , one more thing
4 3 sth else open sth else, sth else
5 3 sth else open sth else, sth else
is there a way to
id txt status output
0 1 sth open sth, sth else
1 1 sth else open sth, sth else
2 2 sth closed one more thing
3 2 one more thing open one more thing
4 3 sth else open sth else
5 3 sth else open sth else
Instead of fillna
, use dropna
and combine with drop_duplicates
:
df.assign(output=
df.where(df['status'].eq('open'))
.groupby(df['id'])['txt']
.transform(lambda col: ', '.join(col.dropna().drop_duplicates()))
)
Output:
id txt status output
0 1 sth open sth, sth else
1 1 sth else open sth, sth else
2 2 sth closed one more thing
3 2 one more thing open one more thing
4 3 sth else open sth else
5 3 sth else open sth else