I have a Pandas dataframe of power enable/disable commands vs time. The index is not in-use at the moment. You can create it on your own with this:
pd.DataFrame(
{'command_timestamp': {
0: pd.Timestamp('2023-08-01 15:39:42'),
1: pd.Timestamp('2023-08-02 03:30:39'),
2: pd.Timestamp('2023-08-02 16:09:35'),
4: pd.Timestamp('2023-08-02 17:30:16'),
5: pd.Timestamp('2023-08-02 17:32:05'),
6: pd.Timestamp('2023-08-02 17:45:43'),
7: pd.Timestamp('2023-08-03 17:48:01'),
8: pd.Timestamp('2023-08-03 18:20:11'),
9: pd.Timestamp('2023-08-04 18:49:37'),
10: pd.Timestamp('2023-08-07 21:13:05')},
'command': {
0: 'enable',
1: 'disable',
2: 'enable',
4: 'enable',
5: 'enable',
6: 'disable',
7: 'enable',
8: 'disable',
9: 'enable',
10: 'disable'}})
I need to calculate the "ON time" of the device by day. Imagine that the actual data set is much larger than this example set. I cannot seem to come up with good solution, much less one that doesn't involve iterating through the dataframe and checking a load of if
statements.
Some assumptions and things to consider:
For a little context, this is for representing "daily utilization" of the device rather than counting "consecutive on-time" etc..
Here is a manual calculation of the example dataset:
results = {
'2023-08-01': (
pd.Timestamp('2023-08-02 00:00:00') -
pd.Timestamp('2023-08-01 15:39:42')),
'2023-08-02': (
pd.Timestamp('2023-08-02 03:30:39') -
pd.Timestamp('2023-08-02 00:00:00')
) + (
pd.Timestamp('2023-08-02 17:45:43') -
pd.Timestamp('2023-08-02 16:09:35')
),
'2023-08-03': (
pd.Timestamp('2023-08-03 18:20:11') -
pd.Timestamp('2023-08-03 17:48:01')
),
'2023-08-04': (
pd.Timestamp('2023-08-05 00:00:00') -
pd.Timestamp('2023-08-04 18:49:37')
),
'2023-08-05': (
pd.Timestamp('2023-08-06 00:00:00') -
pd.Timestamp('2023-08-05 00:00:00')
),
'2023-08-06': (
pd.Timestamp('2023-08-07 00:00:00') -
pd.Timestamp('2023-08-06 00:00:00')
),
'2023-08-07': (
pd.Timestamp('2023-08-07 21:13:05') -
pd.Timestamp('2023-08-07 00:00:00')
)
}
Okay, I now think I understand your question. First, create fictitious entries to mark your day boundaries.
df = pd.concat([
df.set_index('command_timestamp'),
df.reindex(pd.date_range(
start=df['command_timestamp'].min().date(),
end=df['command_timestamp'].max().date(),
freq='D'))
]).sort_index()
(I assume start is disable
.) Then assume that the last state carries forward to the fictitious day boundary. Then do the differencing. You must pass to_series
to access the right function.
df['command'] = df['command'].ffill().fillna('disable')
df['diff'] = df.index.to_series().diff().shift(-1)
A group by the date along with a filtered summation then yields your desired outcome:
>>> df.groupby(df.index.to_series().dt.date) \
... .apply(lambda d: d.loc[d['command'] == 'enable',
... 'diff'].sum())
2023-08-01 0 days 08:20:18
2023-08-02 0 days 05:06:47
2023-08-03 0 days 00:32:10
2023-08-04 0 days 05:10:23
2023-08-05 1 days 00:00:00
2023-08-06 1 days 00:00:00
2023-08-07 0 days 21:13:05
dtype: timedelta64[ns]
Explanatory note. Prior to the group by summation, the relevant columns of the data frame look like this:
command diff
2023-08-01 00:00:00 disable 0 days 15:39:42
2023-08-01 15:39:42 enable 0 days 08:20:18
2023-08-02 00:00:00 enable 0 days 03:30:39
2023-08-02 03:30:39 disable 0 days 12:38:56
2023-08-02 16:09:35 enable 0 days 01:20:41
2023-08-02 17:30:16 enable 0 days 00:01:49
2023-08-02 17:32:05 enable 0 days 00:13:38
2023-08-02 17:45:43 disable 0 days 06:14:17
2023-08-03 00:00:00 disable 0 days 17:48:01
2023-08-03 17:48:01 enable 0 days 00:32:10
2023-08-03 18:20:11 disable 0 days 05:39:49
2023-08-04 00:00:00 disable 0 days 18:49:37
2023-08-04 18:49:37 enable 0 days 05:10:23
2023-08-05 00:00:00 enable 1 days 00:00:00
2023-08-06 00:00:00 enable 1 days 00:00:00
2023-08-07 00:00:00 enable 0 days 21:13:05
2023-08-07 21:13:05 disable NaT
It may be more didactic to read diff
instead as time_elapsed
.