If I make up some time series data:
import pandas as pd
import numpy as np
from numpy.random import seed
# seed random number generator
seed(1)
time = pd.date_range('6/28/2021', periods=100, freq='1min')
df = pd.DataFrame(np.random.randint(100, size=100), index=time,columns=['data'])
df.plot(figsize=(25,8))
This will plot:
And then filter the data for when data
is above 50:
df = df.loc[df['data'] > 50]
How do I compute the time delta for when the data is above the value of 50? ONLY above the value of 50. For example if I do this:
# Compute delta time
df['time_delta'] = df.index.to_series().diff().astype('timedelta64[m]')
df.time_delta.sum()
I dont think the sum is correct as this will include a time delta for when the data was below the value of 50, hope fully that makes sense to ONLY retrieve a time delta for when the value was above 50.
IIUC, you want:
df["timedelta"] = df.index.to_series().diff().where(df["data"].gt(50))
>>> df["timedelta"].sum()
Timedelta('0 days 00:44:00')
Which should be correct because there are exactly 44 rows where "data" is above 50 and each of these corresponds to a 1 minute time difference:
>>> df["data"].gt(50).sum()
44