python pandas pandas-apply pandas-resample

Pandas resample and aggregate with condition

I have a DataFrame which have a column with statuses like this:

datetime               |    session    |    try       |    status
2020-09-17 10:00:01    |    '1a'       |    '1a_1'    |    'success'
2020-09-17 10:00:02    |    '2a'       |    '2a_1'    |    'fail'
2020-09-17 10:00:03    |    '2a'       |    '2a_2'    |    'success'
2020-09-17 10:00:03    |    '3a'       |    '3a_1'    |    'interrupted'
2020-09-18 13:00:04    |    '4a'       |    '4a_1'    |    'fail'

I want to resample data by day with counting status types by condition in sessions (not tries).

I can resample it by tries easyly like this:

df['date'] = df['datetime'].dt.date
df['ones'] = np.ones(df.shape[0])
piv = df.pivot_table(index='date', columns='status', values='ones', aggfunc=len).fillna(0)

And have:

day           |    success    |    fail    |    interrupted
2020-09-17    |    2          |    2       |    1
2020-09-18    |    0          |    1       |    0

But I want to aggregate it by session with condition no matter how much tries in session.:

if 'success' status in session tries, then success +1, fail +0, interrupted +0;
if 'interrupted' status in session AND NO 'success' status in session, then success +0, fail +0, interrupted +1;
if NO 'interrupted' AND 'success' statuses in session, then then success +0, fail +1, interrupted +0.

So I shoul get something like this:

day           |    success    |    fail    |    interrupted
2020-09-17    |    2          |    0       |    1
2020-09-18    |    0          |    1       |    0

I stuck with function and all I come up with ends with "ValueError: The truth value of a Series is ambiguous". I will be very greatfull for any ideas.

Solution

My idea is convert values of statust to ordered categories, sorting and get row only with most important value which is defined in list passed to parameter categories:

print (df)
             datetime session   try       status
0 2020-09-17 10:00:01      1a  1a_1      success
1 2020-09-17 10:00:02      2a  2a_1         fail
2 2020-09-17 10:00:03      2a  2a_2      success
3 2020-09-17 10:00:03      3a  3a_1  interrupted
4 2020-09-18 13:00:04      4a  4a_1         fail
5 2020-09-19 10:00:01      1a  1a_1  interrupted
6 2020-09-19 10:00:02      1a  2a_1         fail
7 2020-09-19 10:00:03      2a  2a_2      success
8 2020-09-19 10:00:03      2a  3a_1  interrupted

df['status'] = pd.Categorical(df['status'], 
                              ordered=True, 
                              categories=['success','interrupted','fail'])
df['date'] = df['datetime'].dt.date

df1 = df.sort_values(['date','status']).drop_duplicates(['date','session'])
print (df1)
             datetime session   try       status        date
0 2020-09-17 10:00:01      1a  1a_1      success  2020-09-17
2 2020-09-17 10:00:03      2a  2a_2      success  2020-09-17
3 2020-09-17 10:00:03      3a  3a_1  interrupted  2020-09-17
4 2020-09-18 13:00:04      4a  4a_1         fail  2020-09-18
7 2020-09-19 10:00:03      2a  2a_2      success  2020-09-19
5 2020-09-19 10:00:01      1a  1a_1  interrupted  2020-09-19

piv = pd.crosstab(df1['date'], df1['status'])
print (piv)
status      success  interrupted  fail
date                                  
2020-09-17        2            1     0
2020-09-18        0            0     1
2020-09-19        1            1     0