I have the following datadrame
user_id var qualified_date loyal_date
1 1 2017-01-17 2017-02-03
2 1 2017-01-03 2017-01-13
3 1 2017-01-11 NaT
4 1 NaT NaT
5 1 NaT NaT
6 2 2017-01-15 2017-02-14
7 2 2017-01-07 NaT
8 2 2017-01-23 2017-02-18
9 2 2017-01-25 NaT
10 2 2017-01-11 2017-03-01
I need to groupby this dataframe by values in 'Var' and count the number of non missing values then for each of 'qualified_date' and 'engaged_date' columns. I can do it for each column seperately and put them manually in a dataframe, but i am looking for a groupby approach or soething similar where i can automatically come to a new DF than has values in 'var' as index and for for two columns that show count of non missing values for every group.
Like this
var qualified_count loyal_count
1 xx xx
2 xx xx
You can use DF.GroupBy.count
which includes only Non-NaN
entries while counting. So, you can let var
be the grouped key and then aggregate the counts respectively for the two selected columns of the DF
as shown:
cols = ['qualified_date', 'loyal_date']
df.groupby('var')[cols].agg('count').add_suffix("_count").reset_index()