Good morning, I have a problem when reading a large netCDF file in python, which contains meteorological information, that information must go through it to assemble the information and then insert it into the database, but the time it takes to go through and assemble the information is too much, I know there must be other ways to perform the same process more efficiently, currently I access the information through a for loop, below the code
content = nc.Dataset(pathFile+file)
XLONG, XLAT = content.variables["XLONG"], content.variables["XLAT"]
Times = content.variables["Times"] #Horas formar b 'b
RAINC = content.variables["RAINC"] #Lluvia
Q2 = content.variables["Q2"] #Humedad especifica
T2 = content.variables["T2"] #Temperatura
U10 = content.variables["U10"] #Viento zonal
V10 = content.variables["V10"] #Viento meridional
SWDOWN = content.variables["SWDOWN"] #Radiacion incidente
PSFC = content.variables["PSFC"] #Presion de la superficie
SST = content.variables["SST"] #Temperatura de la superficie del mar
CLDFRA = content.variables["CLDFRA"] #Fraccion de nubes
for c2 in range(len(XLONG[0])):
for c3 in range(len(XLONG[0][c2])):
position += 1
for hour in range(len(Times)):
dateH = getDatetimeInit(dateFormatFile.hour) if hour == 0 else getDatetimeForHour(hour, dateFormatFile.hour)
hourUTC = getHourUTC(hour)
RAINH = str(RAINC[hour][0][c2][c3])
Q2H = str(Q2[hour][0][c2][c3])
T2H = str(convertKelvinToCelsius(T2[hour][0][c2][c3]))
U10H = str(U10[hour][0][c2][c3])
V10H = str(V10[hour][0][c2][c3])
SWDOWNH = str(SWDOWN[hour][0][c2][c3])
PSFCH = str(PSFC[hour][0][c2][c3])
SSTH = str(SST[hour][0][c2][c3])
CLDFRAH = str(CLDFRA[hour][0][c2][c3] )
rowData = [idRun, functions.IDMODEL, idTime, position, dateH.year, dateH.month, dateH.day, dateH.hour, RAINH, Q2H, T2H, U10H, V10H, SWDOWNH, PSFCH, SSTH, CLDFRAH]
dataProcess.append(rowData)
I would use NumPy. Let us assume you have netCDF with 2 variables, "t2" and "slp". Then you could use the following code to vectorize your data:
#!//usr/bin/env ipython
# ---------------------
import numpy as np
from netCDF4 import Dataset
# ---------------------
filein = 'test.nc'
ncin = Dataset(filein);
tair = ncin.variables['t2'][:];
slp = ncin.variables['slp'][:];
ncin.close();
# -------------------------
tairseries = np.reshape(tair,(np.size(tair),1));
slpseries = np.reshape(slp,(np.size(slp),1));
# --------------------------
## if you want characters:
#tairseries = np.array([str(val) for val in tairseries]);
#slpseries = np.array([str(val) for val in slpseries]);
# --------------------------
rowdata = np.concatenate((tairseries,slpseries),axis=1);
# if you want characters, do this in the end:
row_asstrings = [[str(vv) for vv in val] for val in rowdata]
# ---------------------------
Nevertheless, I have a feeling that using strings is not very good idea. In my example, the conversion from numerical arrays to strings, took quite long time and therefore I did not implement it before concatenation.
If you want also some time/location information, you can do like this:
#!//usr/bin/env ipython
# ---------------------
import numpy as np
from netCDF4 import Dataset
# ---------------------
filein = 'test.nc'
ncin = Dataset(filein);
xin = ncin.variables['lon'][:]
yin = ncin.variables['lat'][:]
timein = ncin.variables['time'][:]
tair = ncin.variables['t2'][:];
slp = ncin.variables['slp'][:];
ncin.close();
# -------------------------
tairseries = np.reshape(tair,(np.size(tair),1));
slpseries = np.reshape(slp,(np.size(slp),1));
# --------------------------
## if you want characters:
#tairseries = np.array([str(val) for val in tairseries]);
#slpseries = np.array([str(val) for val in slpseries]);
# --------------------------
rowdata = np.concatenate((tairseries,slpseries),axis=1);
# if you want characters, do this in the end:
#row_asstrings = [[str(vv) for vv in val] for val in rowdata]
# ---------------------------
# =========================================================
nx = np.size(xin);ny = np.size(yin);ntime = np.size(timein);
xm,ym = np.meshgrid(xin,yin);
xmt = np.tile(xm,(ntime,1,1));ymt = np.tile(ym,(ntime,1,1))
timem = np.tile(timein[:,np.newaxis,np.newaxis],(1,ny,nx));
xvec = np.reshape(xmt,(np.size(tair),1));yvec = np.reshape(ymt,(np.size(tair),1));timevec = np.reshape(timem,(np.size(tair),1)); # to make sure that array's size match, I am using the size of one of the variables
rowdata = np.concatenate((xvec,yvec,timevec,tairseries,slpseries),axis=1);
In any case, with variable sizes (744,150,150), it took less than 2 seconds to vectorize 2 variables.