I'm trying to fill NaN values in a NetCDF file (let's call it the 'Target' file) with values from another NetCDF file (the 'Source' file). [the two example files can be downloaded from here] I was thinking of doing this in Python using the following framework:
Step 1: identify the NaN values in the Target file, extract their locations (lat/lon), and store them in a dataframe
Step 2: extract the values at the stored lat/lon locations from the Source file
Step 3: write these values into the Target file
I came up with the following code:
import pandas as pd
import xarray as xr
import numpy as np
Source = xr.open_dataset("Source.nc")
Target = xr.open_dataset("Target.nc")
#Step 1
df = Target.to_dataframe()
df=df.reset_index()
# the string 'nan' never matches a float NaN, so test for NaN explicitly
df2 = df.loc[df['ET'].isna() | (df['ET'] == 32767)]
#Step2
lat = df2["lat"]
lon = df2["lon"]
point_list = zip(lat,lon)
frames = []
for i, j in point_list:
    dsloc = Source.sel(lat=i, lon=j, method='nearest')
    frames.append(dsloc.to_dataframe())
# DataFrame.append was removed in pandas 2.0; concatenate once instead
Newdf = pd.concat(frames, sort=True)
There are three issues with this:
1- I don't know how to do step three
2- The second step takes forever to complete, perhaps because there are many missing points
3- This is just for one time step, using the two files!
So I believe there might be better, easier, and faster ways to do this in Python or CDO/NCO. Any ideas and solutions are welcome. Thank you. Note that the two NC files have different spatial resolutions (dimensions).
You can use Xarray's where method for this. You really want to stay away from a Python for loop if you are concerned with efficiency at all. Here's an example of how this would work:
# these are the points you want to keep
# you can fine tune this further (exclude values over a threshold)
condition = target.notnull()
# fill the values where condition is false
target_filled = target.where(condition, source)
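
Since the question says the two files have different resolutions, the source probably needs to be put on the target grid before where can combine them; as far as I know, where expects the replacement values to share the target's coordinates. Below is a minimal sketch of one way to do that, using nearest-neighbour reindexing to mirror the method='nearest' lookup in the question. The arrays are small synthetic stand-ins for the real files, the grid values are made up, and treating 32767 as a missing-value marker is an assumption taken from the question's code.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the Target variable: a 4x4 grid with two bad cells.
target_values = np.arange(16, dtype=float).reshape(4, 4)
target_values[1, 2] = np.nan      # a genuine NaN
target_values[3, 0] = 32767.0     # assumed fill value, as in the question
target = xr.DataArray(
    target_values,
    coords={"lat": [0.0, 1.0, 2.0, 3.0], "lon": [10.0, 11.0, 12.0, 13.0]},
    dims=("lat", "lon"),
    name="ET",
)

# Synthetic stand-in for the Source variable on a coarser 2x2 grid.
source = xr.DataArray(
    np.array([[100.0, 200.0], [300.0, 400.0]]),
    coords={"lat": [0.5, 2.5], "lon": [10.5, 12.5]},
    dims=("lat", "lon"),
    name="ET",
)

# Put the source on the target grid: each target cell takes the value of
# the nearest source cell, and the result carries the target's coordinates.
source_on_target = source.reindex_like(target, method="nearest")

# Keep target cells that are valid; everywhere else use the regridded source.
condition = target.notnull() & (target != 32767)
target_filled = target.where(condition, source_on_target)
```

Because both operations are vectorised, the same two lines should also cover every time step at once if the arrays have a time dimension, rather than needing a loop over points. If the result looks right, target_filled.to_netcdf(...) would write it back out, which covers step three. Nearest-neighbour is only one choice here; an interpolating regrid may suit the data better.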