Search code examples
pythonpandasdataframedatenormalize

Pandas group by selected dates


I have a dataframe that is very similar to this dataframe:

index date month
0 2019-12-1 12
1 2020-03-1 3
2 2020-07-1 7
3 2021-02-1 2
4 2021-09-1 9

And i want to combine all dates that are closest to a set of months. The months need to be normalized like this:

Months Normalized month
3, 4, 5 4
6, 7, 8, 9 8
1, 2, 10, 11, 12 12

So the output will be:

index date month
0 2019-12-1 12
1 2020-04-1 4
2 2020-08-1 8
3 2020-12-1 12
4 2021-08-1 8

Solution

  • You can iterate through the DataFrame and use replace to change the dates.

    import pandas as pd 
    
    df = pd.DataFrame(data={'date': ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"], 
                            'month': [12,3,7,2,9]})
    for index, row in df.iterrows():
        if (row['month'] in [3,4,5]):
            df['month'][index] = 4
            df["date"][index]  = df["date"][0].replace(df["date"][0][5:7],"04")
        elif (row['month'] in [6,7,8,9]):
            df['month'][index] = 8
            df["date"][index]  = df["date"][0].replace(df["date"][0][5:7],"08")
        else:
            df['month'][index] = 12
            df["date"][index]  = df["date"][0].replace(df["date"][0][5:7],"12")