Search code examples
pythonpython-3.xpandasdataframepandasql

How to group by and count number of none missing values for each column in group in pandas


I have the following datadrame

user_id  var  qualified_date    loyal_date
  1       1      2017-01-17     2017-02-03
  2       1      2017-01-03     2017-01-13
  3       1      2017-01-11      NaT
  4       1       NaT            NaT
  5       1       NaT            NaT
  6       2      2017-01-15      2017-02-14
  7       2      2017-01-07      NaT
  8       2      2017-01-23      2017-02-18
  9       2      2017-01-25      NaT
  10      2      2017-01-11      2017-03-01

I need to groupby this dataframe by values in 'Var' and count the number of non missing values then for each of 'qualified_date' and 'engaged_date' columns. I can do it for each column seperately and put them manually in a dataframe, but i am looking for a groupby approach or soething similar where i can automatically come to a new DF than has values in 'var' as index and for for two columns that show count of non missing values for every group.

Like this

var  qualified_count loyal_count
 1       xx            xx
 2       xx            xx

Solution

  • You can use DF.GroupBy.count which includes only Non-NaN entries while counting. So, you can let var be the grouped key and then aggregate the counts respectively for the two selected columns of the DF as shown:

    cols = ['qualified_date', 'loyal_date']
    df.groupby('var')[cols].agg('count').add_suffix("_count").reset_index()
    

    enter image description here