I'm fairly new to Python and I'm trying to calculate the minimum, average and maximum monthly temperature from daily data for February.
I'm having a bit of trouble applying my code from other months to February.
Here is my code for the 31-day months :
import xarray as xr
import numpy as np
import copernicusmarine
DS = copernicusmarine.open_dataset(dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m", minimum_longitude = -1.68, maximum_longitude = -1.56, minimum_latitude = 49.63, maximum_latitude = 49.67, minimum_depth = 0, maximum_depth = 0)
var_arr = np.zeros((341,len(DS['depth']),len(DS['latitude']),len(DS['longitude'])))
ind_time = -1
for y in range(2010,2021):
ind_time += 1
print(y)
start_rangedate = "%s"%y+"-01-01"
end_rangedate = "%s"%y+"-01-31"
subset_thetao = DS.thetao.sel(time = slice(start_rangedate, end_rangedate))
var_arr[31*ind_time:31*(ind_time+1),:,:,:] = subset_thetao.data
minimum = np.nanmin(var_arr)
print(minimum)
moyenne = np.mean(var_arr)
print(moyenne)
maximum = np.nanmax(var_arr)
print(maximum)
# 31 * 11 (years) = 341
It works well.
For February I first tried this :
DS = copernicusmarine.open_dataset(dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m", minimum_longitude = -1.68, maximum_longitude = -1.56, minimum_latitude = 49.63, maximum_latitude = 49.67, minimum_depth = 0, maximum_depth = 0)
years_feb_28 = [2010,2011,2013,2014,2015,2017,2018,2019]
years_feb_29 = [2012,2016,2020]
var_arr = np.zeros((311,len(DS['depth']),len(DS['latitude']),len(DS['longitude'])))
ind_time_28 = -1
ind_time_29 = -1
for y in range(2010,2021):
print(y)
start_rangedate = "%s"%y+"-02-01"
if y in years_feb_28:
ind_time_28 += 1
end_rangedate = "%s"%y+"-02-28"
subset_thetao1 = DS.thetao.sel(time = slice(start_rangedate, end_rangedate))
var_arr[28*ind_time_28:28*(ind_time_28+1),:,:,:] = subset_thetao1.data
if y in years_feb_29:
ind_time_29 += 1
end_rangedate = "%s"%y+"-02-29"
subset_thetao2 = DS.thetao.sel(time = slice(start_rangedate, end_rangedate))
var_arr[29*ind_time_29:29*(ind_time_29+1),:,:,:] = subset_thetao2.data
minimum = np.nanmin(var_arr)
print(minimum)
maximum = np.nanmax(var_arr)
print(maximum)
moyenne = np.mean(var_arr)
print(moyenne)
# (8 x 28) + (3 x 29) = 311
It works, but the values seem wrong to me.
The result is :
minimum : 0.0
mean : 10.118808567523956
maximum :6.510576634161725
I tried with a single ind_time.
DS = copernicusmarine.open_dataset(dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m", minimum_longitude = -1.68, maximum_longitude = -1.56, minimum_latitude = 49.63, maximum_latitude = 49.67, minimum_depth = 0, maximum_depth = 0)
years_feb_28 = [2010,2011,2013,2014,2015,2017,2018,2019]
years_feb_29 = [2012,2016,2020]
var_arr = np.zeros((311,len(DS['depth']),len(DS['latitude']),len(DS['longitude'])))
ind_time = -1
for y in range(2010,2021):
print(y)
start_rangedate = "%s"%y+"-02-01"
if y in years_feb_28:
ind_time += 1
end_rangedate = "%s"%y+"-02-28"
subset_thetao1 = DS.thetao.sel(time = slice(start_rangedate, end_rangedate))
var_arr[28*ind_time:28*(ind_time+1),:,:,:] = subset_thetao1.data
if y in years_feb_29:
ind_time += 1
end_rangedate = "%s"%y+"-02-29"
subset_thetao2 = DS.thetao.sel(time = slice(start_rangedate, end_rangedate))
var_arr[29*ind_time:29*(ind_time+1),:,:,:] = subset_thetao2.data
minimum = np.nanmin(var_arr)
print(minimum)
maximum = np.nanmax(var_arr)
print(maximum)
moyenne = np.mean(var_arr)
print(moyenne)
But I get this error message without understanding where the value 21 comes from :
Cell In[7], line 38
var_arr[29*ind_time:29*(ind_time+1),:,:,:] = subset_thetao2.data
ValueError: could not broadcast input array from shape (29,1,1,2) into shape (21,1,1,2)
Someone told me that the data taken into account may stop at year-02-28 T:00:00:00 (for 28-day years) and year-02-29 T:00:00:00 (for 29-day years) and that the code doesn't take the last day into account. So I tried to extend the end_rangedate to year-03-01 but I get this :
Cell In[8], line 33
var_arr[28*ind_time:28*(ind_time+1),:,:,:] = subset_thetao1.data
ValueError: could not broadcast input array from shape (29,1,1,2) into shape (28,1,1,2)
Could someone explain to me what I'm doing wrong ?
As I said in my comments, the problems in your different attempts come from the indexes you use for var_arr. In the 1st case, with 2 different ind_time_.. indexes, the data is superposed at the start of var_arr, like in the following figure; this both causes lost data and many zeroes left at the end of the array, which affects the minimum and average. In the 2nd case, the same index is used for 28-day and 29-days months, which creates an offset between months for leap and non leap years, causing both superpositions and gaps (see the rough figure below); but the main problem is that too many "slots" (for days) are consumed, which explains the 8 missing days for feb 2020. Here's a fix consisting of calculating for each year the start and end indexes:
DS = copernicusmarine.open_dataset(dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m", minimum_longitude = -1.68, maximum_longitude = -1.56, minimum_latitude = 49.63, maximum_latitude = 49.67, minimum_depth = 0, maximum_depth = 0)
years_feb_28 = [2010,2011,2013,2014,2015,2017,2018,2019]
years_feb_29 = [2012,2016,2020]
var_arr = np.zeros((311,len(DS['depth']),len(DS['latitude']),len(DS['longitude'])))
end_index = 0
for y in range(2010,2021):
print(y)
start_index = end_index
start_rangedate = "%s"%y+"-02-01"
feb_days = 28 + (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0)
end_index = start_index + 28
end_rangedate = "%s"%y+"-02-28"
if y in years_feb_29:
end_index = start_index + 29
end_rangedate = "%s"%y+"-02-29"
subset_thetao = DS.thetao.sel(time = slice(start_rangedate, end_rangedate))
var_arr[start_index:end_index,:,:,:] = subset_thetao.data
minimum = np.nanmin(var_arr)
print(minimum)
maximum = np.nanmax(var_arr)
print(maximum)
moyenne = np.mean(var_arr)
print(moyenne)
And a shorter version getting rid of the if ... else
:
DS = copernicusmarine.open_dataset(dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m", minimum_longitude = -1.68, maximum_longitude = -1.56, minimum_latitude = 49.63, maximum_latitude = 49.67, minimum_depth = 0, maximum_depth = 0)
var_arr = np.zeros((311,len(DS['depth']),len(DS['latitude']),len(DS['longitude'])))
end_index = 0
for y in range(2010,2021):
print(y)
start_index = end_index
feb_days = 28 + ((y % 4 == 0 and y % 100 != 0) or (y % 400 == 0))
start_rangedate = "%s"%y+"-02-01"
end_index = start_index + feb_days
end_rangedate = f"{y}-02-{feb_days}"
subset_thetao = DS.thetao.sel(time = slice(start_rangedate, end_rangedate))
var_arr[start_index:end_index,:,:,:] = subset_thetao.data
minimum = np.nanmin(var_arr)
print(minimum)
maximum = np.nanmax(var_arr)
print(maximum)
moyenne = np.mean(var_arr)
print(moyenne)