I have a rather large netCDF file (~10 GB) with a fill value of -1.0.
When I use xarray's fillna like this:
hndl_nc = hndl_nc.fillna(0.0)
it is slow (~2 min). Is there another operator that might be faster? Or, given the size of the file, is this to be expected?
At ~85 MB/s, this is in the ballpark of typical performance for vectorized NumPy/xarray operations, so I think it's unlikely you could improve on it significantly just by switching to another built-in operation.
You might still be able to improve performance with some experimentation. The first thing to do is to profile and look at CPU usage to determine where the time is being spent.
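One quick way to see where the time goes is to time fillna on data that is already in memory: if the in-memory fill is fast, the bottleneck is file I/O or decompression rather than the fill itself. A minimal sketch, using synthetic data and a made-up variable name:

```python
import time

import numpy as np
import xarray as xr

# Synthetic in-memory dataset standing in for the real file
# (the variable name "var" is made up for illustration).
data = np.random.rand(2000, 2000)
data[data < 0.1] = np.nan  # NaNs play the role of the decoded fill value
ds = xr.Dataset({"var": (("y", "x"), data)})

# Time only the fill itself, with no disk reads involved.
t0 = time.perf_counter()
filled = ds.fillna(0.0)
print(f"fillna on {data.nbytes / 1e6:.0f} MB took {time.perf_counter() - t0:.3f} s")
```

If this scales to far more than ~85 MB/s on your machine, the 2 minutes you see is dominated by reading and decoding the file, not by fillna.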
A few things to try:

- Make sure the data is loaded into memory first (.load()), then call fillna on the in-memory arrays.
- Rewrite your files without compression.
- Try xarray v0.9.0 or newer (currently in release candidate) with Dask distributed or multiprocessing.
- engine='scipy' can be faster if you have netCDF3 files.
- Use scale_factor/add_offset to compress the data as int16 rather than larger float types.
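The last few suggestions can be sketched together: write a file packed as int16 via scale_factor/add_offset through the SciPy (netCDF3) backend, then load it into memory before filling. The file path, variable name, and packing parameters below are all made up for illustration:

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Tiny synthetic dataset standing in for the 10 GB file
# (variable name "var" and all numbers are illustrative).
data = np.array([[1.5, np.nan], [2.25, 3.0]])
ds = xr.Dataset({"var": (("y", "x"), data)})

path = os.path.join(tempfile.mkdtemp(), "packed.nc")

# Pack the floats into int16 on disk; _FillValue marks missing
# values in the packed representation.
encoding = {"var": {"dtype": "int16", "scale_factor": 0.25,
                    "add_offset": 0.0, "_FillValue": -9999}}
ds.to_netcdf(path, engine="scipy", encoding=encoding)  # netCDF3 via SciPy

# On read, xarray decodes scale_factor/add_offset and turns the
# fill value into NaN; load into memory first, then fill.
ds2 = xr.open_dataset(path, engine="scipy").load()
filled = ds2.fillna(0.0)
print(filled["var"].values)
```

The int16 packing roughly quarters the bytes read from disk compared to float64, which is where the speedup would come from, at the cost of quantizing the values to the chosen scale_factor.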