Search code examples
pythonpandasdataframedatetimegroup-by

What is the best way to find frequency of data receival?


I'm trying to determine reporting period from a dataframe of reports. Options for reporting period would be:

  • Monthly
  • Quarterly
  • Other

The dataframe looks like this:

data = [['F000003Y6H', '2018-07-31'], ['F000003Y6H', '2018-08-31'], 
        ['F000003Y6H', '2018-09-30'], ['F000003Y6H', '2018-10-31'], 
        ['FOUSA06C1Y', '2021-07-31'], ['FOUSA06C1Y', '2021-10-31'], 
        ['FOUSA06C1Y', '2022-1-31'], ['FOUSA06C1Y', '2022-04-30'], 
        ['FOUSA06C1Y', '2022-7-31'], ['FO0005TYH7', '2019-5-31'], 
        ['FO0005TYH7', '2019-6-30'], ['FO0005TYH7', '2019-7-31'], 
        ['FO0005TYH7', '2019-10-31'], ['FO0005TYH7', '2020-1-31']]

df = pd.DataFrame(data, columns=['class_id', 'report_date'])

To determine frequency of reports(for each class_id), I have this:

df['report_date'] = pd.to_datetime(df['report_date'])
df['datediff_days'] = df.groupby(['class_id'])['report_date'].diff()

class_ids = df['class_id'].unique()

def determine_period(class_id):
    new_df = df.groupby('class_id')['datediff_days'].describe()
    mean_td = new_df.at[class_id, 'mean']
    mean_days = mean_td.days
    if 28<mean_days<32:
        #monthly
        #period is number of months
        period = 1
    elif 85<mean_days<95:
        #quarterly
        period = 3
    else:
        period = 0

    return period

for x in class_ids:
   period = determine_period(x)
   print("For class: " + str(x))
   print("Period is " + str(period))

Is there a better/simpler way to do this?


Solution

  • You can try to use pandas.DatetimeIndex.inferred_freq:

    df["report_date"] = pd.to_datetime(df["report_date"])
    
    df["inferred_freq"] = df.groupby("class_id")["report_date"].transform(
        lambda x: pd.DatetimeIndex(x).inferred_freq
    )
    print(df)
    

    Prints:

          class_id report_date inferred_freq
    0   F000003Y6H  2018-07-31             M
    1   F000003Y6H  2018-08-31             M
    2   F000003Y6H  2018-09-30             M
    3   F000003Y6H  2018-10-31             M
    4   FOUSA06C1Y  2021-07-31         Q-OCT
    5   FOUSA06C1Y  2021-10-31         Q-OCT
    6   FOUSA06C1Y  2022-01-31         Q-OCT
    7   FOUSA06C1Y  2022-04-30         Q-OCT
    8   FOUSA06C1Y  2022-07-31         Q-OCT
    9   FO0005TYH7  2019-05-31           NaN
    10  FO0005TYH7  2019-06-30           NaN
    11  FO0005TYH7  2019-07-31           NaN
    12  FO0005TYH7  2019-10-31           NaN
    13  FO0005TYH7  2020-01-31           NaN