python pandas filtering aggregate-functions

Python filtering pandas


Question: Output each user's ID and the total number of actions they performed for tasks they completed (action_name column = "CompleteTask"). If a user from this company (ClassPass) did not complete any tasks in the given period, you should still output their ID, with 0 in the second column.

dataset:

enter image description here

expected result:

enter image description here


Solution

  • Assuming your initial dataframe is named df, you can try this:

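    # For each user, keep only the 'CompleteTask' rows and sum num_actions;
    # a user with no such rows sums to 0, so they still appear in the output.
    out = (df.groupby(['user_id'], as_index=False)
           .apply(lambda x: x[x['action_name'] == 'CompleteTask']['num_actions'].sum())
           # the aggregated column comes back unnamed (None), so rename it
           .rename(columns={None: 'total_actions'})
          )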
    

    >>> print(out)

    enter image description here
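
As an alternative to the groupby/apply answer above, here is a minimal sketch that avoids apply: it zeroes out every row that is not a completed task and then sums per user, so users without any CompleteTask rows still appear with 0. The column names (user_id, action_name, num_actions) are taken from the answer; the sample values are made up for illustration, since the original dataset is only available as an image.

```python
import pandas as pd

# Hypothetical sample data: column names follow the answer above,
# the values are invented for illustration only.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "action_name": ["CompleteTask", "CreateTask", "CreateTask",
                    "CompleteTask", "CompleteTask"],
    "num_actions": [3, 5, 2, 4, 1],
})

# Replace num_actions with 0 on rows that are not completed tasks,
# so every user keeps at least one row to be summed.
completed = df["num_actions"].where(df["action_name"] == "CompleteTask", 0)

# Sum per user and turn the result back into a two-column dataframe.
out = (completed.groupby(df["user_id"])
       .sum()
       .reset_index(name="total_actions"))

print(out)
```

In this sample, user 2 has no CompleteTask rows and still shows up in the result with a total of 0. Filtering the dataframe first and grouping afterwards would drop that user entirely, which is why the condition is applied with where rather than as a row filter.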