How to find the maximum date value with conditions in python?

I have a three columns dataframe as follows. I want to calculate the returns in three months per day for every funds, so I need to get the date with recorded NAV data three months ago. Should I use the max() function with filter() function to deal this problem? If so, how? If not, could you please help me figure out a better way to do this?

fund code	date	NAV
fund 1	2021-01-04	1.0000
fund 1	2021-01-05	1.0001
fund 1	2021-01-06	1.0023
...	...	...
fund 2	2020-02-08	1.0000
fund 2	2020-02-09	0.9998
fund 2	2020-02-10	1.0001
...	...	...
fund 3	2022-05-04	2.0021
fund 3	2022-05-05	2.0044
fund 3	2022-05-06	2.0305

I tried to combined the max() function with filter() as follows:

max(filter(lambda x: x<=df['date']-timedelta(days=91))) But it didn't work.

Were this in excel, I know I could use the following functions to solve this problem: {max(if(B:B<=B2-91,B:B))}

{max(if(B:B<=B3-91,B:B))}

{max(if(B:B<=B4-91,B:B))}

....

But with python, I don't know what I could do. I just learnt it three days ago. Please help me.

This picture is what I want if it was in excel. The yellow area is the original data. The white part is the procedure I need for the calculation and the red part is the result I want. To get this result, I need to divide the 3rd column by the 5th column.

I know that I could use pct_change(period=7) function to get the same results in this picture. But here is the tricky part: the line 7 rows before is not necessarily the data 7 days before, and not all the funds are recorded daily. Some funds are recorded weekly, some monthly. So I need to check if the data used for division exists first.

Solution

what you need is an implementation of the maximum in sliding window (for your example 1 week, 7days).

I could recreated you example as follow (to create the data frame you have):

import pandas as pd
import datetime
from random import randint
df = pd.DataFrame(columns=["fund code", "date", "NAV"])
date = datetime.datetime.strptime("2021-01-04", '%Y-%m-%d')
for i in range(10):
    df = df.append({"fund code": 'fund 1', "date": date + datetime.timedelta(i), "NAV":randint(0,10)}, ignore_index=True)
    
for i in range(20, 25):
    df = df.append({"fund code": 'fund 1', "date": date + datetime.timedelta(i), "NAV":randint(0,10)}, ignore_index=True)
    
for i in range(20, 25):
    df = df.append({"fund code": 'fund 2', "date": date + datetime.timedelta(i), "NAV":randint(0,10)}, ignore_index=True)

this will look like your example, with not continuous dates and two different funds.

The maximum sliding window (for variable days length look like this)

import queue
class max_queue:
    def __init__(self, win=7):
        self.win = win
        self.queue = queue.deque()
        self.date = None
    def append(self, date, value):
        
        while self.queue and value > self.queue[-1][1]:
            self.queue.pop()
            
        while self.queue and date - self.queue[0][0] >= datetime.timedelta(self.win):
            self.queue.popleft()
            
        self.queue.append((date, value))
        self.date = date
        
    def get_max(self):
        return self.queue[0][1]

now you could simply iterate over rows and get the max value in the timeframe you are interested.

mq = max_queue(7)
pre_code = ''
for idx, row in df.iterrows():
    code, date, nav,*_ = row
    if code != pre_code:
        mq = max_queue(7)
    pre_code = code
    mq.append(date, nav)
    df.at[idx, 'max'] = mq.get_max()

results will look like this, with added max column. This assumes that funds data are continuous, but you could as well modify to have seperate max_queue for each funds as well.

using max queue to only keep track of the max in the window would be the correct complexity O(n) for a solution. important if you are dealing with huge datasets and especially bigger date ranges (instead of week).