Search code examples
pythonpandasgroup-byaggregate

Is there pandas aggregate function that combines features of 'any' and 'unique'?


I have a large dataset with similar data:

>>> df = pd.DataFrame(
...     {'A': ['one', 'two', 'two', 'one', 'one', 'three'],
...      'B': ['a', 'b', 'c', 'a', 'a', np.nan]})          
>>> df
       A    B
0    one    a
1    two    b
2    two    c
3    one    a
4    one    a
5  three  NaN

There are two aggregation functions 'any' and 'unique':

>>> df.groupby('A')['B'].any()
A
one       True
three    False
two       True
Name: B, dtype: bool

>>> df.groupby('A')['B'].unique()
A
one         [a]
three     [nan]
two      [b, c]
Name: B, dtype: object

but I want to get the folowing result (or something close to it):

A
one           a
three     False
two        True

I can do it with some complex code, but it is better for me to find appropriate function in python packages or the easiest way to solve problem. I'd be grateful if you could help me with that.


Solution

  • You can aggregate Series.nunique for first column and unique values with remove possible missing values for another columns:

    df1 = df.groupby('A').agg(count=('B','nunique'), 
                              uniq_without_NaNs = ('B', lambda x: x.dropna().unique()))
    print (df1)
           count uniq_without_NaNs
    A                             
    one        1               [a]
    three      0                []
    two        2            [b, c]
    

    Then create mask if greater column count by 1 and replace values by uniq_without_NaNs if equal count with 1:

    out = df1['count'].gt(1).mask(df1['count'].eq(1), df1['uniq_without_NaNs'].str[0])
    print (out)
    A
    one          a
    three    False
    two       True
    Name: count, dtype: object