python-3.x large-data python-xarray netcdf4

Visualize and filter a large NetCDF file using a logical condition

I have a very large dataset in a NetCDF file.

import numpy as np
import xarray as xr

RZSC = xr.open_dataset('/home/chandra/data/RZSC_250m_SA.nc')
RZSC = RZSC.Band1
RZSC
[Output]:
<xarray.DataArray 'Band1' (lat: 32093, lon: 20818)>
[668112074 values with dtype=float32]
Coordinates:
  * lat      (lat) float64 -58.36 -58.36 -58.35 -58.35 ... 13.71 13.71 13.71
  * lon      (lon) float64 -81.38 -81.37 -81.37 -81.37 ... -34.63 -34.63 -34.62
Attributes:
    long_name:     GDAL Band Number 1
    grid_mapping:  crs
########################
Treecover = xr.open_dataset('/home/chandra/data/Treecover_MOD44B_2000_250m_AMAZON.nc')
Treecover = Treecover.Band1
Treecover
[Output]:
<xarray.DataArray 'Band1' (lat: 32093, lon: 20818)>
[668112074 values with dtype=float64]
Coordinates:
  * lat      (lat) float64 -58.36 -58.36 -58.35 -58.35 ... 13.71 13.71 13.71
  * lon      (lon) float64 -81.38 -81.37 -81.37 -81.37 ... -34.63 -34.63 -34.62
Attributes:
    long_name:     GDAL Band Number 1
    grid_mapping:  crs
####
np.nanmax(Treecover[:,:])
[Output]: 85.0625
np.nanmin(Treecover[:,:])
[Output]: 0.0
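A quick sanity check before filtering: for floating-point data, xarray's own reductions skip NaN by default, so they mirror np.nanmax/np.nanmin. Summing a boolean condition also counts how many pixels it would keep. The sketch below uses a small made-up array; the name treecover and its values are hypothetical stand-ins, not the real file. Given the reported maximum of 85.0625, a condition such as Treecover > 1000 would select zero pixels.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the Treecover grid (hypothetical values, not the real data)
lat = np.linspace(-58.36, 13.71, 4)
lon = np.linspace(-81.38, -34.62, 5)
values = np.array(
    [
        [0.0, 12.5, np.nan, 40.0, 85.0625],
        [3.0, np.nan, 20.0, 60.0, 10.0],
        [0.0, 5.0, 7.5, np.nan, 30.0],
        [1.0, 2.0, 3.0, 4.0, 50.0],
    ]
)
treecover = xr.DataArray(values, coords={"lat": lat, "lon": lon}, dims=("lat", "lon"))

# xarray reductions skip NaN by default for float data, like np.nanmax / np.nanmin
tc_max = float(treecover.max())
tc_min = float(treecover.min())

# Count how many pixels a condition selects *before* filtering with it
# (NaN compares as False, so NaN pixels are never counted)
n_above_50 = int((treecover > 50).sum())
n_above_1000 = int((treecover > 1000).sum())

print(tc_max, tc_min, n_above_50, n_above_1000)
```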

I am able neither to visualize the dataset nor to filter it with a command like RZSC[:,:].where(Treecover[:,:] > 1000).shape, which is quite frustrating: the output is (32093, 20818), the same as the original array size.

Does anyone have any suggestions for this? I cannot share the data because the NetCDF file is larger than 6 GB.
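For the visualization part, an array of roughly 668 million pixels is usually too large to plot directly. One option is a block-averaged preview: DataArray.coarsen(...).mean() averages each factor x factor block of pixels into one. This minimal sketch assumes a downsampled view is acceptable. The grid size, the random values and the factor of 50 are hypothetical stand-ins for RZSC.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in grid; the real RZSC array is 32093 x 20818
rng = np.random.default_rng(0)
lat = np.linspace(-58.36, 13.71, 1000)
lon = np.linspace(-81.38, -34.62, 800)
rzsc = xr.DataArray(
    rng.random((lat.size, lon.size), dtype=np.float32),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="Band1",
)

# Average each 50 x 50 block of pixels into one; boundary="trim" drops leftover edge rows/columns
factor = 50
rzsc_small = rzsc.coarsen(lat=factor, lon=factor, boundary="trim").mean()

print(rzsc.shape, "->", rzsc_small.shape)
```

Calling rzsc_small.plot() should then draw the preview (this needs matplotlib). If dask is installed, opening the file with a chunks= argument in xr.open_dataset should let the coarsening run chunk by chunk. A strided selection such as rzsc.isel(lat=slice(None, None, factor), lon=slice(None, None, factor)) is cheaper, but it skips pixels instead of averaging them.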

Solution

By default, DataArray.where() (and the xr.where() function) returns an array of the same shape as the one you feed it, so .shape will not change. Did you try visualizing the result? Every pixel where the condition is False is set to NaN. You can also set those pixels to any value you want:

xr.where(Treecover > 1000, RZSC, np.nan)  # keep RZSC where True; np.nan can be replaced by any fill value
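Here is a small sketch on synthetic arrays that makes the shape behavior concrete. The names rzsc and treecover, the values and the threshold of 50 are all hypothetical. It shows three things. where() keeps the full grid and fills with NaN. xr.where() accepts an explicit fill value. DataArray.where(..., drop=True) trims only those lat/lon labels whose entire row or column fails the condition, because the result has to stay a rectangular grid.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-ins for RZSC and Treecover on a shared lat/lon grid (hypothetical values)
coords = {"lat": [-58.36, -30.0, 0.0, 13.71], "lon": [-81.38, -60.0, -34.62]}
rzsc = xr.DataArray(
    np.arange(12, dtype=np.float32).reshape(4, 3), coords=coords, dims=("lat", "lon")
)
treecover = xr.DataArray(
    [[0.0, 10.0, 80.0],
     [5.0, 70.0, 2.0],
     [1.0, 3.0, 4.0],
     [0.0, 0.0, 0.0]],
    coords=coords,
    dims=("lat", "lon"),
)
cond = treecover > 50  # True at only two pixels

# 1) Default masking: same (4, 3) shape, failing pixels become NaN
masked = rzsc.where(cond)

# 2) Explicit fill value via xr.where(cond, x, y); -9999 is an arbitrary example
filled = xr.where(cond, rzsc, -9999)

# 3) drop=True removes lat/lon labels where the condition is False for the whole row/column
trimmed = rzsc.where(cond, drop=True)

n_valid = int(masked.count())  # non-NaN pixels remaining after masking

print(masked.shape, filled.shape, trimmed.shape, n_valid)
```

Checking .count() on the masked result is a more direct way to see how many pixels passed the filter than inspecting .shape.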