Search code examples
listpandasperiod

Date periods based on first occurence


I have a pandas data frame of orders:

OrderID OrderDate   Value   CustomerID
1       2017-11-01  12.56   23
2       2017-11-06  1.56    23
3       2017-11-08  2.67    23
4       2017-11-12  5.67    99
5       2017-11-13  7.88    23
6       2017-11-19  3.78    99

Let's look at customer with ID 23. His first order in the history was 2017-11-01. This date is a start date for his first week. It means that all his orders between 2017-11-01 and 2017-11-07 are assigned to his week number 1 (It IS NOT a calendar week like Monday to Sunday). For customer with ID 99 first week starts 2017-11-12 of course as it is a date of his first order (OrderId 6).

I need to assign every order of the table to the respective index of the common table Periods. Periods[0] will contain orders from customer's weeks number 1, Periods[1] from customer's weeks number 2 etc. OrderId 1 nad OrderId 6 will be in the same index of Periods table as both orders were created in first week of their customers.

Period table containig orders IDs has to look like this: Periods=[[1,2,4],[3,5,6]]


Solution

  • Is this what you want ?

    df['New']=df.groupby('CustomerID').OrderDate.apply(lambda x : (x-x.iloc[0]).dt.days//7)
    df.groupby('New').OrderID.apply(list)
    Out[1079]: 
    New
    0    [1, 2, 4]
    1    [3, 5, 6]
    Name: OrderID, dtype: object
    

    To get your period table

    df.groupby('New').OrderID.apply(list).tolist()
    Out[1080]: [[1, 2, 4], [3, 5, 6]]
    

    More info

    df
    Out[1081]: 
       OrderID  OrderDate  Value  CustomerID  New
    0        1 2017-11-01  12.56          23    0
    1        2 2017-11-06   1.56          23    0
    2        3 2017-11-08   2.67          23    1
    3        4 2017-11-12   5.67          99    0
    4        5 2017-11-13   7.88          23    1
    5        6 2017-11-19   3.78          99    1