Search code examples
pythonpandasdatetimepython-datetime

Compute daily frequency on a time series


Task:

Calculate the frequency of each ID for each month of 2021

Frequency formula: Activity period (Length of time between last activity and first activity) / (Number of activity Days - 1)

e.g. ID 1 - Month 2: Activity Period (2021-02-23 - 2021-02-18 = 5 days) / (3 active days - 1) == Frequency = 2,5

Sample:

times = [
    '2021-02-18',
    '2021-02-22',
    '2021-02-23',
    '2021-04-23',
    '2021-01-18',
    '2021-01-19',
    '2021-01-20',
    '2021-01-03',
    '2021-02-04',
    '2021-02-04'
] 

id = [1, 1, 1, 1, 44, 44, 44, 46, 46, 46]

df = pd.DataFrame({'ID':id, 'Date': pd.to_datetime(times)})

df = df.reset_index(drop=True)

print(df)

       ID       Date
0   1 2021-02-18
1   1 2021-02-22
2   1 2021-02-23
3   1 2021-04-23
4  44 2021-01-18
5  44 2021-01-19
6  44 2021-01-20
7  46 2021-01-03
8  46 2021-02-04
9  46 2021-02-04

Desired Output:

If frequency negative == 0

  id  01_2021  02_2021  03_2021  04_2021
0   1        0        2        0        0
1  44        1        0        0        0
2  46        0        0        0        0 

Solution

  • Try a pivot_table with a custom aggfunc:

    # Create Columns For Later
    dr = pd.date_range(start=df['Date'].min(),
                       end=df['Date'].max() + pd.offsets.MonthBegin(1), freq='M') \
        .map(lambda dt: dt.strftime('%m_%Y'))
    
    new_df = (
        df.pivot_table(
            index='ID',
            # Columns are dates in MM_YYYY format
            columns=df['Date'].dt.strftime('%m_%Y'),
            # Custom Agg Function
            aggfunc=lambda x: (x.max() - x.min()) /
                              pd.offsets.Day(max(1, len(x) - 1))
            # max(1, len(x) -1) to prevent divide by 0
        )
            # Fix Axis Names and Column Levels
            .droplevel(0, axis=1)
            .rename_axis(None, axis=1)
            # Reindex  to include every month from min to max date
            .reindex(dr, axis=1)
            # Clip to exclude negatives
            .clip(lower=0)
            # Fillna with 0
            .fillna(0)
            # Reset index
            .reset_index()
    )
    
    print(new_df)
    

    new_df:

       ID  01_2021  02_2021  03_2021  04_2021
    0   1      0.0      2.5      0.0      0.0
    1  44      1.0      0.0      0.0      0.0
    2  46      0.0      0.0      0.0      0.0