Search code examples
pythonpandasdataframedatetimetime-series

How to label the date based on seasons in time-series pandas dataframe?


I have a pandas dataframe with datetime and values column as below and the third column I would like to label as indicted in the df below:

|      datetime     |  |values| | Label |
| ----------------- |  | ---- | | ----- |
|2019-09-01 00:30:00|  |0.43  | |Autumn1|
|2019-09-01 01:00:00|  |0.44  | |Autumn1|
|2019-09-01 01:30:00|  |0.49  | |Autumn1|
|       :           |  |  :   | |   :   |
|2019-12-01 00:30:00|  |0.57  | |Winter1|
|2019-12-01 01:00:00|  |0.59  | |Winter1|
|       :           |  |   :  | |   :   |
|2020-01-01 00:30:00|  |0.66  | |Winter1|
|2020-01-01 01:00:00|  |0.61  | |Winter1|
|       :           |  |   :  | |   :   |
|2020-03-01 00:30:00|  |0.45  | |Spring1|
|       :           |  |  :   | |   :   |
|2020-06-01 00:30:00|  |0.32  | |Summer1|
|       :           |  |   :  | |   :   |
|2020-09-01 00:30:00|  |0.44  | |Autumn2|
|       :           |  |  :   | |   :   |
|2020-12-01 00:30:00|  |0.51  | |Winter2|
|       :           |  |  :   | |   :   |

The seasons are winter (December2019, January2020, February2020), spring (March2020, April2020, May2020), summer (June2020, July2020, August2020) and autumn (September2020, October2020, November2020).

I tried different solutions but they are not working with winter case as it was taking January2020, February2020 and December2020 for the winter. I have a dataset from September 2019 to August 2021. I want to achieve the resulting dataframe as above.

Code Tried:

# mapping months to seasons: 1- winter, 2-spring, 3-summer, 4-autumn 
    if seasons:
        seasons = {1: 1, 2: 1, 3: 2, 4: 2, 5:2, 6:3, 7:3, 8:3, 9:4, 10:4, 11:4, 12:1} # maps months to seasons: 
        df['season'] = df['month'].map(seasons, na_action=None)
        df['winter'] = np.where(df['season'] == 1, True, False)
        df['spring'] = np.where(df['season'] == 2, True, False)
        df['summer'] = np.where(df['season'] == 3, True, False)
        df['autumn'] = np.where(df['season'] == 4, True, False)
        df['transitionperiod'] = np.where((df['season'] == 2) | (df['season'] == 4), True, False)


Solution

  • Your approach makes it very hard to have consistency. What I would suggest is to create a function that will map months to seasons, apply it and then use iterrows() to add the year-labels (1,2,3,4.....) to the seasons. Here is how you would do it (Note that my sample dataframe uses Weeks as periods, but you can change it to H (hours), D (Days), M (Months):

    import pandas as pd
    import numpy as np
    
    data = {
        'datetime': pd.date_range(start='2019-09-01 00:30:00', periods=120, freq='W'),
        'values': np.random.rand(120)
    }
    df = pd.DataFrame(data)
    
    df['datetime'] = pd.to_datetime(df['datetime'])
    
    def get_season(row):
        month = row['datetime'].month
        year = row['datetime'].year
        if month in [12, 1, 2]:
            season = 'Winter'
            if month == 12:
                season_year = year + 1  
            else:
                season_year = year
        elif month in [3, 4, 5]:
            season = 'Spring'
            season_year = year
        elif month in [6, 7, 8]:
            season = 'Summer'
            season_year = year
        elif month in [9, 10, 11]:
            season = 'Autumn'
            season_year = year
    
        return f"{season}{season_year}"
    
    df['Label'] = df.apply(get_season, axis=1)
    
    season_dict = {}
    for index, row in df.iterrows():
        season_name = ''.join([i for i in row['Label'] if not i.isdigit()])
        season_year = int(''.join([i for i in row['Label'] if i.isdigit()]))
        if season_name not in season_dict:
            season_dict[season_name] = {}
        if season_year not in season_dict[season_name]:
            season_dict[season_name][season_year] = len(season_dict[season_name]) + 1
        df.at[index, 'Label'] = f"{season_name}{season_dict[season_name][season_year]}"
    
    print(df.head(20))  
    
    

    This will give you

                  datetime    values    Label
    0  2019-09-01 00:30:00  0.729403  Autumn1
    1  2019-09-08 00:30:00  0.110374  Autumn1
    2  2019-09-15 00:30:00  0.436931  Autumn1
    3  2019-09-22 00:30:00  0.763790  Autumn1
    4  2019-09-29 00:30:00  0.477826  Autumn1
    5  2019-10-06 00:30:00  0.504394  Autumn1
    6  2019-10-13 00:30:00  0.011849  Autumn1
    7  2019-10-20 00:30:00  0.636985  Autumn1
    8  2019-10-27 00:30:00  0.075680  Autumn1
    9  2019-11-03 00:30:00  0.761398  Autumn1
    10 2019-11-10 00:30:00  0.557303  Autumn1
    11 2019-11-17 00:30:00  0.483444  Autumn1
    12 2019-11-24 00:30:00  0.380750  Autumn1
    13 2019-12-01 00:30:00  0.744743  Winter1
    14 2019-12-08 00:30:00  0.973454  Winter1
    15 2019-12-15 00:30:00  0.556385  Winter1
    16 2019-12-22 00:30:00  0.278462  Winter1
    17 2019-12-29 00:30:00  0.670835  Winter1
    18 2020-01-05 00:30:00  0.712285  Winter1
    19 2020-01-12 00:30:00  0.298958  Winter1
    

    and

    print(df[90:120])
    

    results in

                   datetime    values    Label
    90  2021-05-23 00:30:00  0.910225  Spring2
    91  2021-05-30 00:30:00  0.896874  Spring2
    92  2021-06-06 00:30:00  0.726272  Summer2
    93  2021-06-13 00:30:00  0.013616  Summer2
    94  2021-06-20 00:30:00  0.947737  Summer2
    95  2021-06-27 00:30:00  0.526867  Summer2
    96  2021-07-04 00:30:00  0.275155  Summer2
    97  2021-07-11 00:30:00  0.318016  Summer2
    98  2021-07-18 00:30:00  0.183074  Summer2
    99  2021-07-25 00:30:00  0.229309  Summer2
    100 2021-08-01 00:30:00  0.560631  Summer2
    101 2021-08-08 00:30:00  0.414141  Summer2
    102 2021-08-15 00:30:00  0.737118  Summer2
    103 2021-08-22 00:30:00  0.809873  Summer2
    104 2021-08-29 00:30:00  0.931765  Summer2
    105 2021-09-05 00:30:00  0.846052  Autumn3
    106 2021-09-12 00:30:00  0.647715  Autumn3
    107 2021-09-19 00:30:00  0.360201  Autumn3
    108 2021-09-26 00:30:00  0.730339  Autumn3
    109 2021-10-03 00:30:00  0.934964  Autumn3
    110 2021-10-10 00:30:00  0.174492  Autumn3
    111 2021-10-17 00:30:00  0.786723  Autumn3
    112 2021-10-24 00:30:00  0.667760  Autumn3
    113 2021-10-31 00:30:00  0.203475  Autumn3
    114 2021-11-07 00:30:00  0.483362  Autumn3
    115 2021-11-14 00:30:00  0.421646  Autumn3
    116 2021-11-21 00:30:00  0.221572  Autumn3
    117 2021-11-28 00:30:00  0.762433  Autumn3
    118 2021-12-05 00:30:00  0.056316  Winter3
    119 2021-12-12 00:30:00  0.340162  Winter3