Tags: python, python-3.x, netcdf, gdal

How to retrieve all variable names within a netcdf using GDAL


I am struggling to find a way to retrieve metadata from a file using GDAL. Specifically, I would like to retrieve the band names and the order in which they are stored in a given file (be it a GeoTIFF or a netCDF).

For instance, following the GDAL documentation, the gdal.Dataset class has a "GetMetadata" method (see here and here). Although this method returns a lot of information about the dataset, it does not provide the band names or the order in which they are stored within the file. In fact, this appears to be an old problem (reported in 2015) that has not been solved yet (more info here). R already seems to offer a solution (see here), but Python apparently does not.
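One thing worth checking: called without arguments, `GetMetadata()` reads only the default metadata domain. For a netCDF file with several variables, GDAL lists each variable as a subdataset in a separate "SUBDATASETS" domain, so `ds.GetMetadata("SUBDATASETS")` on the container dataset should expose the names. The sketch below assumes the keys follow GDAL's `SUBDATASET_<n>_NAME` pattern; `subdataset_names_from_metadata` is a hypothetical helper, not a GDAL function.

```python
import re


def subdataset_names_from_metadata(metadata: dict) -> list:
    """Return variable names from a GDAL "SUBDATASETS" metadata dict, in file order.

    `metadata` is assumed to look like the dict returned by
    ds.GetMetadata("SUBDATASETS"), e.g.
    {"SUBDATASET_1_NAME": 'NETCDF:"file.nc":tas', "SUBDATASET_1_DESC": "..."}.
    """
    pattern = re.compile(r"^SUBDATASET_(\d+)_NAME$")
    indexed = []
    for key, value in metadata.items():
        match = pattern.match(key)
        if match:
            # Keep the numeric index so SUBDATASET_10 sorts after SUBDATASET_9
            indexed.append((int(match.group(1)), value))
    indexed.sort()
    # The variable name is the text after the last colon: NETCDF:"file.nc":tas -> tas
    return [name.rsplit(":", 1)[-1] for _, name in indexed]


# Usage with a real file (requires GDAL; path is a placeholder):
# from osgeo import gdal
# ds = gdal.Open("netcdfFile.nc")
# print(subdataset_names_from_metadata(ds.GetMetadata("SUBDATASETS")))
```

Sorting on the integer index, rather than on the key string, keeps the original order once a file has ten or more variables.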

To be thorough, I know that other Python packages can help with this (e.g., xarray, rasterio); nevertheless, I would like to keep the number of packages used in a single script to a minimum. Therefore, I would like to know a reliable way to find the band (a.k.a. variable) names, and the order in which they are stored within a single file, using GDAL alone.
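For a file that GDAL opens as a single raster, such as a GeoTIFF, any band names usually live on the bands themselves: bands are numbered from 1 to `RasterCount`, and `GetRasterBand(i).GetDescription()` returns the band's description, which is empty if the writer did not set one. For a netCDF variable opened on its own, the bands typically correspond to slices along extra dimensions (such as time) rather than to different variables. A minimal sketch, assuming the names were stored as band descriptions; `band_names_in_order` is a hypothetical helper that accepts a `gdal.Dataset` or any object with the same two members.

```python
def band_names_in_order(dataset) -> dict:
    """Map 1-based band index -> band description for a GDAL-like dataset.

    Works with an osgeo.gdal.Dataset, or any object exposing RasterCount
    and GetRasterBand(i) whose bands implement GetDescription().
    """
    names = {}
    # GDAL band numbering starts at 1, not 0
    for index in range(1, dataset.RasterCount + 1):
        band = dataset.GetRasterBand(index)
        # Empty string when the file stores no description for this band
        names[index] = band.GetDescription()
    return names


# Usage with a real file (requires GDAL; path is a placeholder):
# from osgeo import gdal
# ds = gdal.Open("image.tif")
# print(band_names_in_order(ds))
```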

Please let me know your thoughts on this.

Below is a starting point for solving this issue, in which a file is opened with GDAL (creating a Dataset object).

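from osgeo import gdal

input_dir = "data"      # placeholder: folder containing the file
file_name = "example"   # placeholder: file name without the .nc extension
var = "temperature"     # placeholder: variable to open

# "NETCDF:<path>:<variable>" opens one variable of the file as its own Dataset
OpeneddatasetFile = gdal.Open(f'NETCDF:{input_dir}/{file_name}.nc:' + var)

# gdal.Open returns None on failure unless gdal.UseExceptions() is active
if isinstance(OpeneddatasetFile, gdal.Dataset):
    print("File opened successfully")


# here is where one should be able to fetch the variable (a.k.a. band) names
# of the OpeneddatasetFile.
# Ideally, some kind of method that returns a dictionary
# with this information would be most welcome

# something like:

# VariablesWithinFile = OpeneddatasetFile.getVariablesWithinFileAsDictionary()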
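Opening the path with a `NETCDF:...:var` prefix selects a single variable, so the other variables are not visible from that Dataset. Opening the `.nc` file itself instead yields a container dataset whose `GetSubDatasets()` method returns a list of `(name, description)` tuples in the order GDAL reports them. A minimal sketch under that assumption; `list_netcdf_variables` is a hypothetical helper, and a file holding only one variable may report no subdatasets at all.

```python
def list_netcdf_variables(dataset) -> list:
    """Return variable names of a container dataset, in the order GDAL reports them.

    `dataset` is assumed to be an osgeo.gdal.Dataset opened on the .nc file
    itself (not on a NETCDF:"...":var subdataset), or any object with
    a GetSubDatasets() method returning (name, description) pairs.
    """
    variables = []
    for name, _description in dataset.GetSubDatasets():
        # name looks like NETCDF:"path/file.nc":var; keep the part after the last colon
        variables.append(name.rsplit(":", 1)[-1])
    return variables


# Usage with a real file (requires GDAL; path is a placeholder):
# from osgeo import gdal
# ds = gdal.Open("netcdfFile.nc")
# print(list_netcdf_variables(ds))
```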




Solution

  • I have finally found a way to retrieve the variable names from the netCDF file using GDAL, thanks to the comments given by Robert Davy above.

    I have organized the code into a set of functions for readability. Notice that there is also a function for reading metadata from the netCDF file, which returns this information as a dictionary (see the "readInfo" function).

    from osgeo import gdal
    
    
    def read_data(filename):
    
        dataset = gdal.Open(filename)
    
        # gdal.Open returns None on failure unless gdal.UseExceptions() is active
        if not isinstance(dataset, gdal.Dataset):
            raise FileNotFoundError("Impossible to open the netcdf file")
    
        return dataset
    
    
    def readInfo(ds, infoFormat="json"):
        "how to: https://gdal.org/python/"
    
        # with format="json", gdal.Info returns a Python dict instead of a string
        info = gdal.Info(ds, options=gdal.InfoOptions(format=infoFormat))
    
        return info
    
    
    def listAllSubDataSets(infoDict: dict):
    
        # files with a single variable have no SUBDATASETS metadata domain
        subdatasets = infoDict.get("metadata", {}).get("SUBDATASETS", {})
    
        subDatasetVariableKeys = [x for x in subdatasets.keys()
                                  if "_NAME" in x]
    
        subDatasetVariableNames = [subdatasets[x]
                                   for x in subDatasetVariableKeys]
    
        formatedsubDatasetVariableNames = []
    
        for x in subDatasetVariableNames:
    
            # NETCDF:"path/file.nc":var -> var
            s = x.replace('"', '').split(":")[-1]
            formatedsubDatasetVariableNames.append(s)
    
        return formatedsubDatasetVariableNames
    
    
    if __name__ == "__main__":
    
        filename = "netcdfFile.nc"
        ds = read_data(filename)
    
        infoDict = readInfo(ds)
    
        infoDict["VariableNames"] = listAllSubDataSets(infoDict)
        print(infoDict["VariableNames"])
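Since the question asked for a dictionary, the same `gdal.Info` JSON output could also be reshaped into an index-to-variable mapping that keeps each subdataset's description next to its name. A minimal sketch, assuming the `info["metadata"]["SUBDATASETS"]` layout used by `listAllSubDataSets` above; `variables_as_dict` is a hypothetical helper, not part of GDAL.

```python
import re


def variables_as_dict(info: dict) -> dict:
    """Map subdataset index -> {"name", "description"} from a gdal.Info JSON dict.

    `info` is assumed to have the shape returned by
    gdal.Info(ds, options=gdal.InfoOptions(format="json")), with subdatasets
    under info["metadata"]["SUBDATASETS"]. Returns {} when that domain is absent.
    """
    subdatasets = info.get("metadata", {}).get("SUBDATASETS", {})
    pattern = re.compile(r"^SUBDATASET_(\d+)_(NAME|DESC)$")
    result = {}
    for key, value in subdatasets.items():
        match = pattern.match(key)
        if not match:
            continue
        index = int(match.group(1))
        entry = result.setdefault(index, {"name": None, "description": None})
        if match.group(2) == "NAME":
            # NETCDF:"path/file.nc":var -> var
            entry["name"] = value.rsplit(":", 1)[-1]
        else:
            entry["description"] = value
    # Rebuild the dict in ascending numeric index order (file order)
    return {index: result[index] for index in sorted(result)}
```

Using the numeric index as the key means the dictionary preserves the storage order even when key strings like `SUBDATASET_10_NAME` would sort before `SUBDATASET_2_NAME` alphabetically.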