python pandas memory netcdf4

NetCDF4 file with Python - Filter before dataframing


My NetCDF4 file is large, so I get a MemoryError when I try to convert it to a pandas dataframe. I don't need everything in the file, though, so I would like to know whether I can subset the data first and only then convert it to a dataframe.

My file looks like this: [image]

Here xr refers to the xarray library. The time variable contains every hour from 2019-01-01 to 2019-01-31. Unfortunately, I can't filter on the Copernicus website, and I only need the values at 09:00:00.

Do you know how I could do this, using xarray or some other way?

Thanks


Solution

  • You can use sel to filter your dataset by time of day before converting it to a dataframe:

    import pandas as pd
    import xarray as xr
    import datetime
    
    # Load a demo dataset
    ds = xr.tutorial.load_dataset('air_temperature')
    
    # Keep only the 12:00 time steps (for 09:00, pass datetime.time(9) instead)
    df = ds.sel(time=datetime.time(12)).to_dataframe()
    

    Output:

    >>> df
                                           air
    lat  time                lon              
    75.0 2013-01-01 12:00:00 200.0  242.299988
                             202.5  242.199997
                             205.0  242.299988
                             207.5  242.500000
                             210.0  242.889999
    ...                                    ...
    15.0 2014-12-31 12:00:00 320.0  296.889984
                             322.5  296.589996
                             325.0  295.690002
                             327.5  295.489990
                             330.0  295.190002
    
    [967250 rows x 1 columns]
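
The demo dataset above is 6-hourly, so it cannot show the 09:00 case from the question. As a rough sketch of the same idea on hourly data, the example below builds a small synthetic dataset covering January 2019 and keeps only the 09:00 steps before converting. It uses a boolean mask on the hour (ds["time"].dt.hour == 9) rather than datetime.time, which is an alternative to the selection shown above, not the answer's own method. The variable name t2m, the coordinate names, and the grid values are made up for illustration and will differ in a real Copernicus file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the hourly file: every hour of January 2019
# on a small, made-up 3 x 4 grid (names and sizes are assumptions).
time = pd.date_range("2019-01-01", periods=31 * 24, freq=pd.Timedelta(hours=1))
lat = np.array([40.0, 45.0, 50.0])
lon = np.array([0.0, 5.0, 10.0, 15.0])

rng = np.random.default_rng(0)
values = rng.normal(280.0, 5.0, size=(time.size, lat.size, lon.size))

ds = xr.Dataset(
    {"t2m": (("time", "latitude", "longitude"), values)},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Boolean mask that is True only for the 09:00 time steps.
at_nine = ds["time"].dt.hour == 9

# Subset first, convert second, so the dataframe only holds the 09:00 data.
subset = ds.sel(time=at_nine)
df = subset.to_dataframe()

print(subset.sizes["time"])  # number of time steps kept
print(len(df))               # rows in the resulting dataframe
```

With a real file, the dataset would come from xr.open_dataset("your_file.nc") instead (the file name here is hypothetical). xarray reads netCDF data lazily by default, so applying the selection before to_dataframe() means only the 09:00 slices need to be loaded into memory.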