Search code examples
pythonpandasdataframenumpypandas-groupby

Python Pandas - Difference between groupby keys with repeated valyes


I have some data with dates of sales to my clients. The data looks like this:

Cod client Items Date
0 100 1 2022/01/01
1 100 7 2022/01/01
2 100 2 2022/02/01
3 101 5 2022/01/01
4 101 8 2022/02/01
5 101 10 2022/02/01
6 101 2 2022/04/01
7 101 2 2022/04/01
8 102 4 2022/02/01
9 102 10 2022/03/01

What I'm trying to acomplish is to calculate the differences beetween dates for each client: grouped first by "Cod client" and after by "Date" (because of the duplicates)

The expected result is like:

Cod client Items Date Date diff Explain
0 100 1 2022/01/01 NaT First date for client 100
1 100 7 2022/01/01 NaT ...repeat above
2 100 2 2022/02/01 31 Diff from first date 2022/01/01
3 101 5 2022/01/01 NaT Fist date for client 101
4 101 8 2022/02/01 31 Diff from first date 2022/01/01
5 101 10 2022/02/01 31 ...repeat above
6 101 2 2022/04/01 59 Diff from previous date 2022/02/01
7 101 2 2022/04/01 59 ...repeat above
8 102 4 2022/02/01 NaT First date for client 102
9 102 10 2022/03/01 28 Diff from first date 2022/02/01

I already tried doing df["Date diff"] = df.groupby("Cod client")["Date"].diff() but it considers the repeated dates and return zeroes for then

I appreciate for help!


Solution

  • IIUC you can combine several groupby operations:

    # ensure datetime
    df['Date'] = pd.to_datetime(df['Date'])
    
    # set up group
    g = df.groupby('Cod client')
    
    # identify duplicated dates per group
    m = g['Date'].apply(pd.Series.duplicated)
    
    # compute the diff, mask and ffill
    df['Date diff'] = g['Date'].diff().mask(m).groupby(df['Cod client']).ffill()
    

    output:

       Cod client  Items       Date Date diff
    0         100      1 2022-01-01       NaT
    1         100      7 2022-01-01       NaT
    2         100      2 2022-02-01   31 days
    3         101      5 2022-01-01       NaT
    4         101      8 2022-02-01   31 days
    5         101     10 2022-02-01   31 days
    6         101      2 2022-04-01   59 days
    7         101      2 2022-04-01   59 days
    8         102      4 2022-02-01       NaT
    9         102     10 2022-03-01   28 days