Search code examples
python-3.xpandasdataframepandas-groupbypython-datetime

Conditionals and techniques other than iteration for python 3 pandas' dataframes


I am continuing to work on my pandas skills and have run up against some difficult problems. This problem involves two dataframes df1 and df2. df1 contains event times and the corresponding details of each event. df2 contains a time period established by a start time and a stop time.

GOAL:

  1. group all events by time period.
  2. check if the count is increasing within the time period AND if all the codes are the same for the time period
  3. create a new column in df2 and return True to that column if BOTH pieces from part 2 are True and False if they are not or there are no events during the period.

code:

import pandas as pd
    
df1 = {'Event':  ['2020-12-01 00:10:22', '2020-12-01 00:15:11','2020-12-01 00:18:00',
                  '2020-12-01 00:31:00', '2020-12-01 00:54:00' , '2020-12-01 01:01:00' ,
                  '2020-12-01 01:19:00' , '2020-12-01 01:23:00' , '2020-12-01 01:24:00' ,
                  '2020-12-01 01:56:00' , '2020-12-01 21:02:00', '2020-12-01 02:41:00', 
                  '2020-12-01 02:44:00' , '2020-12-01 03:19:00' ,'2020-12-01 03:22:00' , 
                  '2020-12-01 03:49:00' , '2020-12-01 05:24:00' ,'2020-12-01 05:56:00' , 
                  '2020-12-01 08:02:00'
                   ] , 
       'Count' : [1 , 2 , 4 , 2 , 5 , 
                  3 , 7 , 9 , 10 , 1 , 
                  2 , 5 , 6 , 10 , 5 , 
                  6 ,7 , 8 , 3] ,
      'Code' : ['A' , 'A' , 'A' , 'A' , 'B' , 
                'B' , 'B' , 'B' , 'B' , 'B' , 
                'C' , 'C' , 'C' , 'C' , 'C' , 
                'D' , 'D' , 'D' , 'D']
        }

df1 = pd.DataFrame(df1 , columns = ['Event' , 'Count' , 'Code'])

df1['Event'] = pd.to_datetime(df1['Event'])

df

    Event   Count   Code
0   2020-12-01 00:10:22     1   A
1   2020-12-01 00:15:11     2   A
2   2020-12-01 00:18:00     4   A
3   2020-12-01 00:31:00     2   A
4   2020-12-01 00:54:00     5   B
5   2020-12-01 01:01:00     3   B
6   2020-12-01 01:19:00     7   B
7   2020-12-01 01:23:00     9   B
8   2020-12-01 01:24:00     10  B
9   2020-12-01 01:56:00     1   B
10  2020-12-01 21:02:00     2   C
11  2020-12-01 02:41:00     5   C
12  2020-12-01 02:44:00     6   C
13  2020-12-01 03:19:00     10  C
14  2020-12-01 03:22:00     5   C
15  2020-12-01 03:49:00     6   D
16  2020-12-01 05:24:00     7   D
17  2020-12-01 05:56:00     8   D
18  2020-12-01 08:02:00     3   D

creating df2 code :

df2 = {'Start Time' : ['2020-12-01 00:00:00', '2020-12-01 00:30:00','2020-12-01 01:30:00',
                    '2020-12-01 02:30:00', '2020-12-01 03:30:00' , '2020-12-01 04:30:00' ,
                    '2020-12-01 05:30:00' , '2020-12-01 07:30:00' , '2020-12-01 10:30:00' ,
                    '2020-12-01 15:00:00' , '2020-12-02 21:00:00'] ,
       'End Time' : ['2020-12-01 00:30:00', '2020-12-01 01:30:00','2020-12-01 02:30:00',
                    '2020-12-01 03:30:00', '2020-12-01 04:30:00' , '2020-12-01 05:30:00' ,
                    '2020-12-01 07:30:00' , '2020-12-01 10:30:00' , '2020-12-01 15:00:00' ,
                    '2020-12-01 21:00:00' , '2020-12-02 00:00:00']
       
        }

df2 = pd.DataFrame(df2 , columns = ['Start Time' , 'End Time'])

df2['Start Time'] = pd.to_datetime(df2['Start Time'])
df2['End Time'] = pd.to_datetime(df2['End Time'])

df2

    Start Time  End Time
0   2020-12-01 00:00:00     2020-12-01 00:30:00
1   2020-12-01 00:30:00     2020-12-01 01:30:00
2   2020-12-01 01:30:00     2020-12-01 02:30:00
3   2020-12-01 02:30:00     2020-12-01 03:30:00
4   2020-12-01 03:30:00     2020-12-01 04:30:00
5   2020-12-01 04:30:00     2020-12-01 05:30:00
6   2020-12-01 05:30:00     2020-12-01 07:30:00
7   2020-12-01 07:30:00     2020-12-01 10:30:00
8   2020-12-01 10:30:00     2020-12-01 15:00:00
9   2020-12-01 15:00:00     2020-12-01 21:00:00
10  2020-12-01 21:00:00     2020-12-02 00:00:00

Strategy:

My strategy is to use pd.Dataframe.between_time and then a lambda function for the conditional statement checks but I cannot seem to make it work.

Below is the desired output:

    Start Time  End Time    Test
0   2020-12-01 00:00:00     2020-12-01 00:30:00     True
1   2020-12-01 00:30:00     2020-12-01 01:30:00     False
2   2020-12-01 01:30:00     2020-12-01 02:30:00     True
3   2020-12-01 02:30:00     2020-12-01 03:30:00     False
4   2020-12-01 03:30:00     2020-12-01 04:30:00     True
5   2020-12-01 04:30:00     2020-12-01 05:30:00     True
6   2020-12-01 05:30:00     2020-12-01 07:30:00     True
7   2020-12-01 07:30:00     2020-12-01 10:30:00     True
8   2020-12-01 10:30:00     2020-12-01 15:00:00     False
9   2020-12-01 15:00:00     2020-12-01 21:00:00     False
10  2020-12-02 21:00:00     2020-12-02 00:00:00     False

Solution

  • You can use a custom function that yields a boolean value based on the specified conditions. Here we have used Series.is_monotonic_increasing to check if Count in increasing and Series.nunique to check if all codes are same for the given time period:

    def f():
        for x, y in df2.to_numpy():
            s = df1[df1['Event'].between(x, y)]
            yield s['Count'].is_monotonic_increasing & s['Code'].nunique() == 1
    
    df2['Test'] = list(f())
    

    Result:

                Start Time            End Time   Test
    0  2020-12-01 00:00:00 2020-12-01 00:30:00   True
    1  2020-12-01 00:30:00 2020-12-01 01:30:00  False
    2  2020-12-01 01:30:00 2020-12-01 02:30:00   True
    3  2020-12-01 02:30:00 2020-12-01 03:30:00  False
    4  2020-12-01 03:30:00 2020-12-01 04:30:00   True
    5  2020-12-01 04:30:00 2020-12-01 05:30:00   True
    6  2020-12-01 05:30:00 2020-12-01 07:30:00   True
    7  2020-12-01 07:30:00 2020-12-01 10:30:00   True
    8  2020-12-01 10:30:00 2020-12-01 15:00:00  False
    9  2020-12-01 15:00:00 2020-12-01 21:00:00  False
    10 2020-12-02 21:00:00 2020-12-02 00:00:00  False