Search code examples
pythonpandasdatedatetimeisin

how to use date and value of one dataframe and search with this conditions in another dataframe


I want to search for a value (from one dataframe) in another dataframe in dependency of the date.

I have a Dataframe with DatetimeIndex based on a frequency of 1 minute. I resampled the Dataframe to a frequency of 5min and daily. This is the code and the output:

agg_dict = {'open': 'first','high': 'max','low': 'min','cls': 'last','vol': 'sum'}
data_5min = data_rth.resample('5min').agg(agg_dict).dropna().round(2).sort_index(ascending=False)
data_daily = data_rth.resample('D').agg(agg_dict).dropna().round(2).sort_index(ascending=False)
data_weekly= data_rth.resample('W').agg(agg_dict).dropna().round(2).sort_index(ascending=False)
data_monthly= data_rth.resample('M').agg(agg_dict).dropna().round(2).sort_index(ascending=False)

print('data_daily','\n',data_daily['high'].head())
print('data_5min','\n',data_5min['high'].head(24))

output:

data_daily 
 time
2021-08-05    441.85
2021-08-04    441.12
2021-08-03    441.28
2021-08-02    440.93
2021-07-30    440.06
Name: high, dtype: float64

data_5min 
 time
2021-08-05 16:00:00    441.85
2021-08-05 15:55:00    441.65
2021-08-05 15:50:00    441.39
2021-08-05 15:45:00    441.23
2021-08-05 15:40:00    441.24
2021-08-05 15:35:00    441.11
2021-08-05 15:30:00    440.90
2021-08-05 15:25:00    440.83
2021-08-05 15:20:00    440.78
2021-08-05 15:15:00    440.86
2021-08-05 15:10:00    440.94
2021-08-05 15:05:00    440.96
2021-08-05 15:00:00    440.89
2021-08-05 14:55:00    440.83
2021-08-05 14:50:00    440.87
2021-08-05 14:45:00    440.88
2021-08-05 14:40:00    440.96
2021-08-05 14:35:00    440.88
2021-08-05 14:30:00    440.86
2021-08-05 14:25:00    440.91
2021-08-05 14:20:00    440.96
2021-08-05 14:15:00    440.96
2021-08-05 14:10:00    440.98
2021-08-05 14:05:00    441.12
Name: high, dtype: float64

I want to look now where the high of each day shows in the 5min Frame. I tried

data_5min['high'].isin(data_daily['high'])

what gives me this output:

time
2021-08-05 16:00:00     True
2021-08-05 15:55:00    False
2021-08-05 15:50:00    False
2021-08-05 15:45:00    False
2021-08-05 15:40:00    False
2021-08-05 15:35:00    False
2021-08-05 15:30:00    False
2021-08-05 15:25:00    False
2021-08-05 15:20:00    False
2021-08-05 15:15:00    False
2021-08-05 15:10:00    False
2021-08-05 15:05:00    False
2021-08-05 15:00:00    False
2021-08-05 14:55:00    False
2021-08-05 14:50:00    False
2021-08-05 14:45:00    False
2021-08-05 14:40:00    False
2021-08-05 14:35:00    False
2021-08-05 14:30:00    False
2021-08-05 14:25:00    False
2021-08-05 14:20:00    False
2021-08-05 14:15:00    False
2021-08-05 14:10:00    False
2021-08-05 14:05:00     True

The True in the last line I dont want. It seems that this is the value at data_daily index 2021-08-04. What I want is to search every value from data_daily in the data_5min but depending on the dates. I tried

data_5min['high'].isin(data_daily['high']) & data_5min.index.isin(data_daily.index.date)

But I don't get it to work.

Any help would be nice.


Solution

  • You can use only data_5m to find the peak of each day using groupby and the .date part of DatetimeIndex:

    >>> data_5min.groupby(data_5min.index.date)['high'].idxmax()
    
    time
    2021-08-05   2021-08-05 16:00:00
    Freq: D, Name: high, dtype: datetime64[ns]