python, numpy, netcdf4

Numpy 3D array (NetCDF data) slicing same element - the fastest way


I need to extract the same element from every 2D slice of a 3D numpy array (actually a masked array, but it works the same way). I usually do this by iterating, but the current data is huge and the process has to be repeated on thousands of datasets, so it would take weeks (a rough estimate). What is the quickest way to slice a 3D array without looping through all the 2D arrays?

In this simple example I need to take the [1, 0] element of each 2D array, which is 3 in every one of them, and store the values in a result array.

NetCDF example (slicing element [500, 400])

import netCDF4

url = "http://eip.ceh.ac.uk/thredds/dodsC/public-chess/PET/aggregation/PETAggregation.ncml"
dataset = netCDF4.Dataset(url)

result = dataset.variables['pet'][:, 500, 400]

myarray example (now superseded by the NetCDF example above)

import numpy as np

myarray = np.array([
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
])

result = []
for i in myarray:
    result.append(i[1][0])

Result: [3, 3, 3, 3]

EDIT: FirefoxMetzger suggested slicing it simply with result = myarray[:, 1, 0]. However, I get the following error message with this:

RuntimeError: NetCDF: DAP server error


Solution

  • The minimal numpy example you provided can be efficiently sliced using standard slicing mechanisms:

    import numpy as np
    
    myarray = np.array([
        [[1, 2], [3, 4], [5, 6]],
        [[1, 2], [3, 4], [5, 6]],
        [[1, 2], [3, 4], [5, 6]],
        [[1, 2], [3, 4], [5, 6]],
    ])
    
    result = myarray[:, 1, 0]
    

    The NetCDF error seems to come from the resulting slice being too large to be returned from the server, causing a crash. As per your comment, the solution here is to query the server in chunks and aggregate the results locally.
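
A minimal sketch of the chunked approach, assuming the variable exposes `.shape` and accepts numpy-style slicing (as netCDF4 variables and numpy arrays both do). The helper name `read_point_in_chunks` and the `chunk_size` default are illustrative assumptions, not part of netCDF4; a chunk size that the server accepts would have to be found by trial. The small `myarray` from the question stands in for the remote data so the example runs offline.

```python
import numpy as np


def read_point_in_chunks(variable, row, col, chunk_size=1000):
    """Collect variable[:, row, col] a bounded number of steps at a time.

    `variable` is anything with a .shape and numpy-style slicing.
    `chunk_size` is an assumed value; tune it to what the server tolerates.
    """
    n_steps = variable.shape[0]
    pieces = []
    for start in range(0, n_steps, chunk_size):
        stop = min(start + chunk_size, n_steps)
        # Each request covers at most chunk_size entries along the first axis.
        pieces.append(variable[start:stop, row, col])
    # np.ma.concatenate keeps the mask if the pieces are masked arrays.
    return np.ma.concatenate(pieces)


# Local stand-in for the remote variable (the array from the question).
myarray = np.array([
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
])

result = read_point_in_chunks(myarray, 1, 0, chunk_size=3)
print(result.tolist())
```

With the dataset from the question, the call would presumably look like `read_point_in_chunks(dataset.variables['pet'], 500, 400)`. Whether a given chunk size avoids the DAP error depends on the server's limits, which are not stated here.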