I have a large dataset with similar data:
>>> df = pd.DataFrame(
... {'A': ['one', 'two', 'two', 'one', 'one', 'three'],
... 'B': ['a', 'b', 'c', 'a', 'a', np.nan]})
>>> df
A B
0 one a
1 two b
2 two c
3 one a
4 one a
5 three NaN
There are two aggregation functions 'any' and 'unique':
>>> df.groupby('A')['B'].any()
A
one True
three False
two True
Name: B, dtype: bool
>>> df.groupby('A')['B'].unique()
A
one [a]
three [nan]
two [b, c]
Name: B, dtype: object
but I want to get the folowing result (or something close to it):
A
one a
three False
two True
I can do it with some complex code, but it is better for me to find appropriate function in python packages or the easiest way to solve problem. I'd be grateful if you could help me with that.
You can aggregate Series.nunique
for first column and unique values with remove possible missing values for another columns:
df1 = df.groupby('A').agg(count=('B','nunique'),
uniq_without_NaNs = ('B', lambda x: x.dropna().unique()))
print (df1)
count uniq_without_NaNs
A
one 1 [a]
three 0 []
two 2 [b, c]
Then create mask if greater column count
by 1
and replace values by uniq_without_NaNs
if equal count
with 1
:
out = df1['count'].gt(1).mask(df1['count'].eq(1), df1['uniq_without_NaNs'].str[0])
print (out)
A
one a
three False
two True
Name: count, dtype: object