
How to read/print the header (first 100 lines) of a netCDF file in Python?


I have been trying to read the header (first 100 lines) of a netCDF file in Python, but have been running into issues. I am familiar with the read_nc function from the synoptReg package for R, the ncread function that comes with MATLAB, and the read_csv function in the pandas library. To my knowledge, however, there isn't anything similar for netCDF (.nc) files in Python.

Noting this, and using answers from this question, I've tried the following (with no success):

with open(filepath,'r') as f:
    for i in range(100):
        line = next(f).strip()
        print(line)

However, I receive the following error, even though I've made sure that tabs aren't mixed with spaces and that the for statement is inside the with block (the explanations given in the top answers to this question):

'utf-8' codec can't decode byte 0xbb in position 411: invalid start byte

I've also tried the following:

with open(filepath,'r') as f:
    for i in range(100):
        line = [next(f) for i in range(100)]
print(line)

and

from itertools import islice
with open('/Users/toshiro/Desktop/Projects/CCAR/Data/EDGAR/v6.0_CO2_excl_short-cycle_org_C_2010_TOTALS.0.1x0.1.nc','r') as f:
    for i in range(100):
        line = list(islice(f, 100))
print(line)

But I receive the same error as above. Are there any workarounds for this?


Solution

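  • You can't. netCDF files are binary and can't be interpreted as text. Opening one with open(filepath, 'r') tells Python to decode the raw bytes as text (UTF-8 in your case), which fails as soon as it reaches a byte that isn't valid UTF-8, hence the UnicodeDecodeError.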

    If the files are netCDF3 encoded, you can read them in with scipy.io.netcdf_file. But it's much more likely they are netCDF4, in which case you'll need the netCDF4 package.

    On top of this, I'd highly recommend the xarray package for reading and working with netCDF data. It supports a labeled N-dimensional array interface - think pandas indexes on each dimension of a numpy array.

    Whether you use netCDF4 or xarray, netCDF files are self-describing and support reading arbitrary subsets, so you don't need to load the whole file to view the metadata. Much like viewing the head of a text file, you can simply do:

    import xarray as xr
    ds = xr.open_dataset("path/to/myfile.nc")
    print(ds)  # this will give you a preview of your data
    

    Additionally, xarray has an xr.Dataset.head method, which returns the first 5 (or n, if you pass an int) elements along each dimension:

    print(ds.head())  # preview with at most 5 elements along each dimension
    

    See the getting started guide and the User guide section on reading and writing netCDF files for more info.
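If you do want to look at the very first bytes of the file, open it in binary mode ('rb') instead of text mode: nothing is decoded, so no UnicodeDecodeError can occur. The minimal sketch below does that and also guesses the format from the file signature, which hints at whether scipy.io.netcdf_file (netCDF3 classic or 64-bit offset) should work or whether you need the netCDF4 package. The helper names read_header_bytes and netcdf_flavor are hypothetical, made up for this example, and the check assumes a netCDF-4 file's HDF5 signature sits at byte 0, which is typical for files written by the netCDF library but not guaranteed by HDF5 in general.

```python
# Minimal sketch (hypothetical helpers): peek at a netCDF file's first bytes
# in binary mode, so nothing is decoded and no UnicodeDecodeError can occur.

# Signatures ("magic numbers") at the start of the netCDF binary formats.
NETCDF3_SIGNATURES = {
    b"CDF\x01": "netCDF3 classic",        # readable with scipy.io.netcdf_file
    b"CDF\x02": "netCDF3 64-bit offset",  # readable with scipy.io.netcdf_file
    b"CDF\x05": "CDF5 (64-bit data)",     # not supported by scipy; use netCDF4
}

# netCDF-4 files are HDF5 files. Assumption: the HDF5 signature is at offset 0,
# which is typical but not guaranteed (HDF5 also allows offsets 512, 1024, ...).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_header_bytes(path, n=100):
    """Return the first n raw bytes of the file, without any text decoding."""
    with open(path, "rb") as f:
        return f.read(n)


def netcdf_flavor(path):
    """Return a short label for the file's on-disk format, or 'unknown'."""
    head = read_header_bytes(path, len(HDF5_SIGNATURE))
    if head.startswith(HDF5_SIGNATURE):
        return "netCDF4/HDF5"
    # The netCDF3/CDF5 signatures are 4 bytes long.
    return NETCDF3_SIGNATURES.get(head[:4], "unknown")
```

Usage: print(netcdf_flavor(filepath)) and print(read_header_bytes(filepath)). The raw bytes are only useful for identifying the format; for the actual header (dimensions, variables, attributes), open the file with xarray or netCDF4 as described above. Note that the netCDF4 package can also read netCDF3 files.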