Search code examples
pythonpandasdataframegroup-by

How to use GROUP_CONCAT with HAVING clause in Pandas?


Suppose I have a dataframe like this:

name type
John 1
Joe 0
Joe 1
Marcus 0
John 2
John 0

How can I use Pandas to have the same result as:

SELECT name, GROUP_CONCAT(type)
FROM df
GROUP BY name
HAVING COUNT(*)>1;
name type
john 0,1,2
joe 0,1

Solution

  • You can use a custom groupby.agg with something like:

    out = (df
       .groupby('name', as_index=False, sort=False)['type']
       .agg(lambda s: ','.join(s.sort_values().astype(str)) if len(s)>1 else None)
       .dropna(subset='type')
    )
    

    Or:

    out = (df
       .groupby('name', as_index=False, sort=False)
       .agg(**{'type': ('type', lambda s: ','.join(s.sort_values().astype(str))),
               'count': ('type', 'count')
              })
       .loc[lambda d: d.pop('count').gt(1)]
    )
    

    Or:

    out = (df
       .groupby('name', as_index=False, sort=False)['type']
       .agg(lambda s: ','.join(s.sort_values().astype(str)))
       .loc[lambda d: d['type'].str.contains(',')]
    )
    

    Output:

       name   type
    0  John  0,1,2
    1   Joe    0,1