I am looking for an iterative way of creating a subset of my pandas DataFrame based on date and product. I would like to keep only the first row for each product within each two-week window.
So for df A:
Date,Product,Return
1/1/2020,ABC,0.00993
1/2/2020,ABC,0.04231
1/4/2020,ABC,0.04231
1/30/2020,ABC,0.04231
2/20/2020,ABC,0.01408
6/15/2020,XYZ,0.04868
6/16/2020,XYZ,0.05284
6/19/2020,XYZ,0.05284
6/25/2020,XYZ,0.01578
8/25/2020,XYZ,0.03248
9/25/2020,XYZ,0.03248
10/12/2020,XYZ,0.0375
12/2/2020,XYZ,0.02589
6/11/2020,EFG,0.02589
7/13/2020,EFG,0.02589
7/17/2020,EFG,0.02859
7/21/2020,EFG,0.02084
7/27/2020,EFG,0.05154
7/29/2020,EFG,0.05154
9/8/2020,EFG,0.0616
9/14/2020,EFG,0.04092
9/18/2020,EFG,0.01578
9/22/2020,EFG,0.03248
6/9/2020,ASD,0.03248
I want DF B returned:
Date,Product,Return
1/1/2020,ABC,0.00993
1/30/2020,ABC,0.04231
2/20/2020,ABC,0.01408
6/15/2020,XYZ,0.04868
8/25/2020,XYZ,0.03248
9/25/2020,XYZ,0.03248
10/12/2020,XYZ,0.0375
12/2/2020,XYZ,0.02589
6/11/2020,EFG,0.02589
7/13/2020,EFG,0.02589
7/27/2020,EFG,0.05154
9/8/2020,EFG,0.0616
6/9/2020,ASD,0.03248
My full DataFrame has 10k products. I tried using .loc to build a cutoff variable from a datetime/timedelta, but that cutoff could end up based on dates from a prior product's rows.
You need some way to group the rows by their week difference. I suggest converting the date to ISO week of year (52-week format), grouping by product, and taking the diff() between the week numbers of consecutive rows for each product. From that we can flag which diffs are greater than one and use cumsum() to increment a counter, so rows separated by a gap land in different groups. The final column 'c' is your extra grouping column. Group on Product and c and use .head(1) to get the first row per group. (If your data spans more than one calendar year, note that week-of-year wraps back to 1 at the year boundary, so you would want to group on year as well as week.)
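To see what the diff()/cumsum() trick does, here it is applied in isolation to the ISO week numbers of the EFG rows (24, 29, 29, 30, 31, 31, 37 are the week numbers of 6/11, 7/13, 7/17, 7/21, 7/27, 7/29, 9/8):

```python
import pandas as pd

# ISO week numbers of the EFG dates 6/11, 7/13, 7/17, 7/21, 7/27, 7/29, 9/8
weeks = pd.Series([24, 29, 29, 30, 31, 31, 37])

new_block = weeks.diff().fillna(0).gt(1)  # True where the gap to the previous row exceeds one week
c = new_block.astype(int).cumsum()        # running counter: each True starts a new group label

print(c.tolist())  # [0, 1, 1, 1, 1, 1, 2] -> three groups; head(1) keeps 6/11, 7/13, 9/8
```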
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/1/2020', '1/2/2020', '1/4/2020', '1/30/2020', '2/20/2020',
             '6/15/2020', '6/16/2020', '6/19/2020', '6/25/2020', '8/25/2020',
             '9/25/2020', '10/12/2020', '12/2/2020', '6/11/2020', '7/13/2020',
             '7/17/2020', '7/21/2020', '7/27/2020', '7/29/2020', '9/8/2020',
             '9/14/2020', '9/18/2020', '9/22/2020', '6/9/2020'],
    'Product': ['ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'XYZ', 'XYZ', 'XYZ', 'XYZ',
                'XYZ', 'XYZ', 'XYZ', 'XYZ', 'EFG', 'EFG', 'EFG', 'EFG', 'EFG',
                'EFG', 'EFG', 'EFG', 'EFG', 'EFG', 'ASD'],
    'Return': [0.00993, 0.04231, 0.04231, 0.04231, 0.01408, 0.04868, 0.05284,
               0.05284, 0.01578, 0.03248, 0.03248, 0.0375, 0.02589, 0.02589,
               0.02589, 0.02859, 0.02084, 0.05154, 0.05154, 0.0616, 0.04092,
               0.01578, 0.03248, 0.03248]})
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(by='Date').reset_index(drop=True)
df['week'] = df['Date'].dt.isocalendar().week
# flag rows whose week gap to the previous row of the same product exceeds 1,
# then cumsum the flags into a running group label
df['c'] = df.groupby('Product')['week'].diff().fillna(0).gt(1).astype(int).cumsum()
df = df.groupby(['Product', 'c']).head(1)  # first row of each group
df.drop(columns=['week', 'c'], inplace=True)
Output
Date Product Return
0 2020-01-01 ABC 0.00993
3 2020-01-30 ABC 0.04231
4 2020-02-20 ABC 0.01408
5 2020-06-09 ASD 0.03248
6 2020-06-11 EFG 0.02589
7 2020-06-15 XYZ 0.04868
11 2020-07-13 EFG 0.02589
16 2020-08-25 XYZ 0.03248
17 2020-09-08 EFG 0.06160
21 2020-09-25 XYZ 0.03248
22 2020-10-12 XYZ 0.03750
23 2020-12-02 XYZ 0.02589
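One caveat: the consecutive-week diff() chains runs of rows together, so a run like EFG's 7/13–7/29 (which spans 16 days) collapses to a single row. That is why 7/27 from your DF B does not appear above. If you want a strict rolling 14-day window per product instead, a small explicit loop does it. A sketch under my assumptions: `first_in_window` is a name I made up, and whether a gap of exactly 14 days starts a new window is ambiguous in your DF B (it keeps 7/27, exactly 14 days after 7/13, but drops 9/22, exactly 14 days after 9/8), so adjust the `>=` to taste:

```python
import pandas as pd

def first_in_window(group, window=pd.Timedelta(days=14)):
    """Keep a row only if it falls at least `window` after the last kept row."""
    keep, last_kept = [], None
    for idx, date in group['Date'].items():
        if last_kept is None or date - last_kept >= window:
            keep.append(idx)
            last_kept = date
    return group.loc[keep]

# a small subset of the data to illustrate
df = pd.DataFrame({
    'Date': pd.to_datetime(['1/1/2020', '1/2/2020', '1/4/2020', '1/30/2020',
                            '7/13/2020', '7/17/2020', '7/27/2020']),
    'Product': ['ABC'] * 4 + ['EFG'] * 3,
    'Return': [0.00993, 0.04231, 0.04231, 0.04231, 0.02589, 0.02859, 0.05154],
})

out = pd.concat(first_in_window(g)
                for _, g in df.sort_values('Date').groupby('Product'))
# keeps 1/1 and 1/30 for ABC, 7/13 and 7/27 for EFG
```

The loop is O(n) per product after the sort, so it should still be workable across 10k products.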