I have a pandas dataframe with datetime and values column as below and the third column I would like to label as indicted in the df below:
| datetime | |values| | Label |
| ----------------- | | ---- | | ----- |
|2019-09-01 00:30:00| |0.43 | |Autumn1|
|2019-09-01 01:00:00| |0.44 | |Autumn1|
|2019-09-01 01:30:00| |0.49 | |Autumn1|
| : | | : | | : |
|2019-12-01 00:30:00| |0.57 | |Winter1|
|2019-12-01 01:00:00| |0.59 | |Winter1|
| : | | : | | : |
|2020-01-01 00:30:00| |0.66 | |Winter1|
|2020-01-01 01:00:00| |0.61 | |Winter1|
| : | | : | | : |
|2020-03-01 00:30:00| |0.45 | |Spring1|
| : | | : | | : |
|2020-06-01 00:30:00| |0.32 | |Summer1|
| : | | : | | : |
|2020-09-01 00:30:00| |0.44 | |Autumn2|
| : | | : | | : |
|2020-12-01 00:30:00| |0.51 | |Winter2|
| : | | : | | : |
The seasons are winter (December2019, January2020, February2020), spring (March2020, April2020, May2020), summer (June2020, July2020, August2020) and autumn (September2020, October2020, November2020).
I tried different solutions but they are not working with winter case as it was taking January2020, February2020 and December2020 for the winter. I have a dataset from September 2019 to August 2021. I want to achieve the resulting dataframe as above.
Code Tried:
# mapping months to seasons: 1- winter, 2-spring, 3-summer, 4-autumn
if seasons:
seasons = {1: 1, 2: 1, 3: 2, 4: 2, 5:2, 6:3, 7:3, 8:3, 9:4, 10:4, 11:4, 12:1} # maps months to seasons:
df['season'] = df['month'].map(seasons, na_action=None)
df['winter'] = np.where(df['season'] == 1, True, False)
df['spring'] = np.where(df['season'] == 2, True, False)
df['summer'] = np.where(df['season'] == 3, True, False)
df['autumn'] = np.where(df['season'] == 4, True, False)
df['transitionperiod'] = np.where((df['season'] == 2) | (df['season'] == 4), True, False)
Your approach makes it very hard to have consistency. What I would suggest is to create a function that will map months to seasons, apply it and then use iterrows()
to add the year-labels (1,2,3,4.....) to the seasons. Here is how you would do it (Note that my sample dataframe uses Weeks as periods, but you can change it to H (hours), D (Days), M (Months):
import pandas as pd
import numpy as np
data = {
'datetime': pd.date_range(start='2019-09-01 00:30:00', periods=120, freq='W'),
'values': np.random.rand(120)
}
df = pd.DataFrame(data)
df['datetime'] = pd.to_datetime(df['datetime'])
def get_season(row):
month = row['datetime'].month
year = row['datetime'].year
if month in [12, 1, 2]:
season = 'Winter'
if month == 12:
season_year = year + 1
else:
season_year = year
elif month in [3, 4, 5]:
season = 'Spring'
season_year = year
elif month in [6, 7, 8]:
season = 'Summer'
season_year = year
elif month in [9, 10, 11]:
season = 'Autumn'
season_year = year
return f"{season}{season_year}"
df['Label'] = df.apply(get_season, axis=1)
season_dict = {}
for index, row in df.iterrows():
season_name = ''.join([i for i in row['Label'] if not i.isdigit()])
season_year = int(''.join([i for i in row['Label'] if i.isdigit()]))
if season_name not in season_dict:
season_dict[season_name] = {}
if season_year not in season_dict[season_name]:
season_dict[season_name][season_year] = len(season_dict[season_name]) + 1
df.at[index, 'Label'] = f"{season_name}{season_dict[season_name][season_year]}"
print(df.head(20))
This will give you
datetime values Label
0 2019-09-01 00:30:00 0.729403 Autumn1
1 2019-09-08 00:30:00 0.110374 Autumn1
2 2019-09-15 00:30:00 0.436931 Autumn1
3 2019-09-22 00:30:00 0.763790 Autumn1
4 2019-09-29 00:30:00 0.477826 Autumn1
5 2019-10-06 00:30:00 0.504394 Autumn1
6 2019-10-13 00:30:00 0.011849 Autumn1
7 2019-10-20 00:30:00 0.636985 Autumn1
8 2019-10-27 00:30:00 0.075680 Autumn1
9 2019-11-03 00:30:00 0.761398 Autumn1
10 2019-11-10 00:30:00 0.557303 Autumn1
11 2019-11-17 00:30:00 0.483444 Autumn1
12 2019-11-24 00:30:00 0.380750 Autumn1
13 2019-12-01 00:30:00 0.744743 Winter1
14 2019-12-08 00:30:00 0.973454 Winter1
15 2019-12-15 00:30:00 0.556385 Winter1
16 2019-12-22 00:30:00 0.278462 Winter1
17 2019-12-29 00:30:00 0.670835 Winter1
18 2020-01-05 00:30:00 0.712285 Winter1
19 2020-01-12 00:30:00 0.298958 Winter1
and
print(df[90:120])
results in
datetime values Label
90 2021-05-23 00:30:00 0.910225 Spring2
91 2021-05-30 00:30:00 0.896874 Spring2
92 2021-06-06 00:30:00 0.726272 Summer2
93 2021-06-13 00:30:00 0.013616 Summer2
94 2021-06-20 00:30:00 0.947737 Summer2
95 2021-06-27 00:30:00 0.526867 Summer2
96 2021-07-04 00:30:00 0.275155 Summer2
97 2021-07-11 00:30:00 0.318016 Summer2
98 2021-07-18 00:30:00 0.183074 Summer2
99 2021-07-25 00:30:00 0.229309 Summer2
100 2021-08-01 00:30:00 0.560631 Summer2
101 2021-08-08 00:30:00 0.414141 Summer2
102 2021-08-15 00:30:00 0.737118 Summer2
103 2021-08-22 00:30:00 0.809873 Summer2
104 2021-08-29 00:30:00 0.931765 Summer2
105 2021-09-05 00:30:00 0.846052 Autumn3
106 2021-09-12 00:30:00 0.647715 Autumn3
107 2021-09-19 00:30:00 0.360201 Autumn3
108 2021-09-26 00:30:00 0.730339 Autumn3
109 2021-10-03 00:30:00 0.934964 Autumn3
110 2021-10-10 00:30:00 0.174492 Autumn3
111 2021-10-17 00:30:00 0.786723 Autumn3
112 2021-10-24 00:30:00 0.667760 Autumn3
113 2021-10-31 00:30:00 0.203475 Autumn3
114 2021-11-07 00:30:00 0.483362 Autumn3
115 2021-11-14 00:30:00 0.421646 Autumn3
116 2021-11-21 00:30:00 0.221572 Autumn3
117 2021-11-28 00:30:00 0.762433 Autumn3
118 2021-12-05 00:30:00 0.056316 Winter3
119 2021-12-12 00:30:00 0.340162 Winter3