pandas dataframe timestamp data-analysis catboost

Assigning higher weight to observations from recent months for an ML model

I have a highly imbalanced dataset and I want to assign weights to my observations by month.
For instance, if my observation is in January 2022 I'll give it 1/5, if it's in March 2022 I'll give it 1/3, and so on.

feature_1  date        weights
117        2016-11-12  0.015
...        ...         ...
123        2022-01-01  0.2
234        2022-01-02  0.2
...        ...         ...
345        2022-05-31  1.0

I'm using CatBoostClassifier, and I believe I can pass a list of weights for all my data to the weight parameter, so it would look something like this:

model.fit(Pool(X_train,y_train,weight=train_weight))

The problem is that I can't think of an elegant way to build the weights column/list.
For now, I split my dataframe by month like this:

g = X_train.groupby(pd.Grouper(key='date', freq='M'))
dfs = [group for _,group in g]

and built the weights column like this:

for i, df in enumerate(dfs):
    weight = []
    for val in dfs[i].iterrows():
        weight.append(1 / (len(dfs)+2 - i))
    dfs[i]['weight'] = weight

Solution

Given the following toy dataframe:

from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "feature_1": [117, 123, 234, 345],
        "date": ["2016-11-12", "2022-01-01", "2022-01-02", "2022-05-31"],
    }
)

df["date"] = pd.to_datetime(df["date"])

Define a helper function to calculate weights:

def weight(current_date, previous_date):
    # Whole calendar months between the two dates (0 when they share a month).
    months_ago = (
        (current_date.year - previous_date.year) * 12
        + current_date.month
        - previous_date.month
    )
    # Adding 1 gives the most recent month a weight of 1, the month before
    # 1/2, and so on, so January 2022 gets 1/5 relative to May 2022.
    return round(1 / (months_ago + 1), 3)

And so, assuming the most recent date is 31 May 2022:

df["weight"] = df["date"].apply(lambda x: weight(datetime(2022, 5, 31), x))

print(df)
# Output
   feature_1       date  weight
0        117 2016-11-12   0.015
1        123 2022-01-01   0.200
2        234 2022-01-02   0.200
3        345 2022-05-31   1.000
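
On larger frames the weights can also be computed without a row-wise apply. The following is a minimal sketch, assuming the scheme described in the question (the most recent month gets 1, the month before 1/2, and so on) and assuming the newest observation in the data defines the reference month instead of a hard-coded date. The names month_index and months_ago are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "feature_1": [117, 123, 234, 345],
        "date": pd.to_datetime(
            ["2016-11-12", "2022-01-01", "2022-01-02", "2022-05-31"]
        ),
    }
)

# Number every calendar month consecutively (year * 12 + month).
month_index = df["date"].dt.year * 12 + df["date"].dt.month

# Months elapsed since the most recent month present in the data
# (assumption: the newest observation defines the reference month).
months_ago = month_index.max() - month_index

# 1 for the latest month, 1/2 for the month before, 1/3 before that, ...
df["weight"] = 1 / (months_ago + 1)
```

The resulting df["weight"] column can then be passed as the weight argument of Pool, as in the question. The date and weight columns themselves would presumably need to be dropped from the feature matrix first.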