Select IDs with an increment of 7 days

I am very new to python and pandas and am working on a pandas data frame which is a 3 month data and looks like:

Date     ID   product
Jul-1     1     A
Jul-1     2     B
Jul-1     3     D
Jul-4     2     F
Jul-5     1     G
Jul-9     1     K
Jul-9     2     L
Jul-9     3     X
Jul-11    1     N
Jul-13    3     M
Jul-17    1     Z
Jul-17    2     O
Jul-17    3     T

What I am trying to do is to select the ID's only at the increment of 7 days, such that it does not select any ID if it is present for the next seven days from the last date it was picked. The resulting data frame would look like this

Date     ID   product
Jul-1     1     A
Jul-1     2     B
Jul-1     3     D
Jul-9     1     K
Jul-9     2     L
Jul-9     3     X
Jul-17    1     Z
Jul-17    2     O
Jul-17    3     T

Have been trying since morning and any help would be highly appreciated

Thanks a lot in advance!!

Solution

Use the data from @suicidalteddy, with minor edits

import numpy as np
import panas as pd

dat = pd.DataFrame([{'Date': '07/01/2016', 'ID': 1, 'product':'A'},
                {'Date': '07/01/2016', 'ID': 2, 'product':'B'},
                {'Date': '07/01/2016', 'ID': 3, 'product':'D'},
                {'Date': '07/04/2016', 'ID': 2, 'product':'F'},
                {'Date': '07/05/2016', 'ID': 1, 'product':'G'},
                {'Date': '07/09/2016', 'ID': 1, 'product':'K'},
                {'Date': '07/09/2016', 'ID': 2, 'product':'L'},
                {'Date': '07/09/2016', 'ID': 3, 'product':'X'},
                {'Date': '07/11/2016', 'ID': 1, 'product':'N'},
                {'Date': '07/13/2016', 'ID': 3, 'product':'M'},
                {'Date': '07/17/2016', 'ID': 1, 'product':'Z'},
                {'Date': '07/17/2016', 'ID': 2, 'product':'O'},
                {'Date': '07/17/2016', 'ID': 3, 'product':'T'},])
dat.Date = pd.to_datetime(dat.Date, format="%m/%d/%Y")

print(dat)
         Date  ID product
0  2016-07-01   1       A
1  2016-07-01   2       B
2  2016-07-01   3       D
3  2016-07-04   2       F
4  2016-07-05   1       G
5  2016-07-09   1       K
6  2016-07-09   2       L
7  2016-07-09   3       X
8  2016-07-11   1       N
9  2016-07-13   3       M
10 2016-07-17   1       Z
11 2016-07-17   2       O
12 2016-07-17   3       T

Now, use a helper function to

calculate the time difference between Dates (per ID)
calculate the week (0-7 days is week 1, 7-14 days is week 2, etc.)

Then, within each group (use groupby), use these 2 helper functions to calculate the time and week. Since you require only the first entry of each week (7-days), do a second groupby and retrieve the first record.

Here is the code to calculate the time and week as separate columns

code to calculate num_weeks is from this SO post

dat = dat.sort_values(by=['ID','Date'])
dat['time'] = pd.Series()
dat['week'] = pd.Series([1]*len(dat))

def calc_time(df):
    df.loc[:,'time'] = df['Date'] - df.loc[df.index.min(), 'Date']
    num_weeks = (
        int(np.ceil((dat['Date'].max() - \
            dat['Date'].min())/np.timedelta64(1, 'W')))
                )
    df = calc_week(df, num_weeks) # num_weeks = 3
    return df

def calc_week(df, num_weeks):
    for week in range(1, num_weeks+1):
        # print(str(week*7), str((week+1)*7))
        df.loc[\
            (df['time'] > str(week*7)+' days') & \
            (df['time'] <= str((week+1)*7)+' days'), \
        'week'] = week+1
    return df

dat = dat.groupby(['ID']).apply(calc_time)

Here is the output of the above step

print(dat)
         Date  ID product    time  week
0  2016-07-01   1       A  0 days     1
4  2016-07-05   1       G  4 days     1
5  2016-07-09   1       K  8 days     2
8  2016-07-11   1       N 10 days     2
10 2016-07-17   1       Z 16 days     3
1  2016-07-01   2       B  0 days     1
3  2016-07-04   2       F  3 days     1
6  2016-07-09   2       L  8 days     2
11 2016-07-17   2       O 16 days     3
2  2016-07-01   3       D  0 days     1
7  2016-07-09   3       X  8 days     2
9  2016-07-13   3       M 12 days     2
12 2016-07-17   3       T 16 days     3

Now, the second groupby and sort to get your final desired output

dat = dat.groupby(['ID','week']).first().reset_index(drop=False)
dff = (
    dat[['Date','ID','product']].sort_values(by=['Date','ID'])
                                .reset_index(drop=True)
    )

print(dff)
        Date  ID product
0 2016-07-01   1       A
1 2016-07-01   2       B
2 2016-07-01   3       D
3 2016-07-09   1       K
4 2016-07-09   2       L
5 2016-07-09   3       X
6 2016-07-17   1       Z
7 2016-07-17   2       O
8 2016-07-17   3       T

Original Attempt

Generate some data in same format as OP

idx = pd.date_range('2018-04-01', '2018-05-01', freq='1D') + \
      pd.DateOffset(days=16)
df = pd.DataFrame(idx, columns=['Date'])
df.set_index('Date', inplace=True)
df['ID'] = range(len(idx))
df['product'] = range(12,12+len(idx))

print(df)
            ID  product
Date                   
2018-04-17   0       12
2018-04-18   1       13
...
2018-05-16  29       41
2018-05-17  30       42

Use pd.data_range to pick out only the required intervals

idx = pd.date_range(df.index[0], df.index[-1], freq='7D')
df_seven_days = df.loc[idx].reset_index(drop=False)
df_seven_days.rename(columns={'index': 'Date'}, inplace=True)

print(df_seven_days)
      Date  ID  product
2018-04-17   0       12
2018-04-24   7       19
2018-05-01  14       26
2018-05-08  21       33
2018-05-15  28       40