pandas lambda group-by transform calculated-columns

Creating a new column based on group by and logical filtering

I have a DataFrame with columns 'a' and 'b'. I want to create a new column which is the result of this SQL statement.

df['c'] = SELECT a, COUNT(CASE WHEN b = 1 THEN a ELSE NULL END) FROM df GROUP BY a

How can I achieve the same thing using Pandas?

import pandas as pd

df = pd.DataFrame({'a': ['a', 'a', 'b', 'a', 'b'], 'b': [1, 0, 0, 1, 1]})

   a  b
0  a  1
1  a  0
2  b  0
3  a  1
4  b  1

I tried the following:

df['c'] = df.groupby('a').apply(lambda x : x[x['b']==1]['a'].count()).reset_index().iloc[:,-1]

which generated these incorrect results:

    a   b   c
0   a   1   2
1   a   0   1
2   b   0   NaN
3   a   1   NaN
4   b   1   NaN

I was expecting [2, NaN, NaN, 2, 1] instead.

Appreciate all the help and guidance in advance!

Solution

I would create a dummy column to do this:

In [11]: df["c"] = df["b"] == 1

In [12]: df["c"]
Out[12]:
0     True
1    False
2    False
3     True
4     True
Name: c, dtype: bool

In [13]: df.groupby("a")["c"].sum()
Out[13]:
a
a    2
b    1
Name: c, dtype: int64

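This avoids apply, so it should be more efficient: the sum runs as a built-in grouped aggregation instead of calling a Python function once per group.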

Note: checking equality and summing the booleans is the same as counting the number of entries which are equal.
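
As a quick check of that note, here is a minimal sketch that rebuilds the frame from the question and compares the boolean sum against an explicit filter-and-count per group. The names by_sum and by_count are only illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "a", "b", "a", "b"], "b": [1, 0, 0, 1, 1]})

# Sum of the boolean mask per group: True counts as 1, False as 0.
by_sum = (df["b"] == 1).groupby(df["a"]).sum()

# Explicit version: keep only rows where b == 1, then count rows per group.
by_count = df[df["b"] == 1].groupby("a")["b"].count()

print(by_sum.to_dict())
print(by_count.to_dict())
```

One difference to keep in mind: a group with no matching rows is absent from the filtered count, whereas the boolean sum reports it as 0.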

If you want this new c column to propagate over df, so that every row carries its group's count, you can use transform:

In [14]: df.groupby("a")["c"].transform("sum")
Out[14]:
0    2
1    2
2    1
3    2
4    1
Name: c, dtype: int64
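
The question expected NaN on the rows where b is not 1. A possible way to get there, sketched under the assumption that the mask should stay out of df, is to blank out the transformed counts with where. The names is_one and counts are hypothetical helpers, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "a", "b", "a", "b"], "b": [1, 0, 0, 1, 1]})

# Boolean mask for the condition; kept as a separate Series, not a column.
is_one = df["b"] == 1

# Per-group count of rows where b == 1, broadcast back to every row.
counts = is_one.astype(int).groupby(df["a"]).transform("sum")

# Keep the count only on rows where b == 1; the other rows become NaN.
df["c"] = counts.where(is_one)

print(df)
```

Because of the NaN entries, c ends up as a float column, so the counts show as 2.0 and 1.0 rather than integers.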