Suppose I have a dataframe like this:
name | type |
---|---|
John | 1 |
Joe | 0 |
Joe | 1 |
Marcus | 0 |
John | 2 |
John | 0 |
How can I use Pandas to have the same result as:
SELECT name, GROUP_CONCAT(type)
FROM df
GROUP BY name
HAVING COUNT(*)>1;
name | type |
---|---|
john | 0,1,2 |
joe | 0,1 |
You can use a custom groupby.agg
with something like:
out = (df
.groupby('name', as_index=False, sort=False)['type']
.agg(lambda s: ','.join(s.sort_values().astype(str)) if len(s)>1 else None)
.dropna(subset='type')
)
Or:
out = (df
.groupby('name', as_index=False, sort=False)
.agg(**{'type': ('type', lambda s: ','.join(s.sort_values().astype(str))),
'count': ('type', 'count')
})
.loc[lambda d: d.pop('count').gt(1)]
)
Or:
out = (df
.groupby('name', as_index=False, sort=False)['type']
.agg(lambda s: ','.join(s.sort_values().astype(str)))
.loc[lambda d: d['type'].str.contains(',')]
)
Output:
name type
0 John 0,1,2
1 Joe 0,1