Tags: python, pandas, numpy, python-xarray, netcdf

How do I extract all the values from one variable in a NetCDF file with Python?


I am trying to use the CHESS-SCAPE dataset (https://data.ceda.ac.uk/badc/deposited2021/chess-scape/data/rcp60/01/monthly) to get the surface wind speeds across the UK for the entire date range (1980-2080) with Python. Ideally, I would like the monthly wind speed at every grid point.

The file is a NetCDF file, and I am really struggling to use it. I have tried lots of things, but to keep this question simple, I will show an example of what I'm doing with Python to access just one slice of the data:

import pandas as pd
import numpy as np
import xarray as xr

data = xr.open_dataset('chess-scape.nc', chunks={})

wind_speed = data['sfcWind']

df = pd.DataFrame(wind_speed.isel(time=1199).values)

df.to_csv('windspeed.csv')

data.close()

The result seems to be random, sporadic values in the CSV file, so I presume I'm not accessing the data correctly. I have also tried accessing the values directly, like so:

df = pd.DataFrame(wind_speed.values[1199])

But I think the entirety of the dataset is loaded into memory during that process, so RAM fills up very quickly. An example of the CSV file produced:

[Image: CSV output]

I have an existing dataset I'm currently using which is a .dat file, and I'm at a bit of a loss as to how to get data from this NetCDF to make it look remotely like this file:

[Image: .dat file dataset]

I realise this question is perhaps worded poorly and may indeed be the wrong question, so any direction at this point would be appreciated.


Solution

  • There is nothing wrong with your approach. I suspect the confusion comes from the fact that the actual data (wind speed over the UK) is surrounded by many empty grid points (there is no data over the ocean). If you load the resulting CSV file into LibreOffice Calc (or Excel) and zoom out as far as possible, you will recognize the shape of the UK upside-down (the mouth of the River Thames is around cell VL 182). It appears upside-down presumably because the rows are stored in order of increasing y coordinate, so the southernmost row ends up at the top of the sheet.
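To confirm that the "sporadic" values are just land cells surrounded by missing data, you can count the empty cells in one slice. The following is a minimal sketch on a synthetic array, not the real file: the grid size, the values, and the dimension names `y` and `x` are assumptions made for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one monthly slice: a small "land" patch of valid
# wind speeds surrounded by NaN "ocean" cells (shape and values are made up).
values = np.full((6, 8), np.nan)
values[2:5, 3:6] = 4.2
month = xr.DataArray(values, dims=("y", "x"), name="sfcWind")

# isnull() gives a boolean array; its mean is the share of empty cells.
missing_fraction = float(month.isnull().mean())
# count() returns the number of non-NaN cells.
valid_count = int(month.count())

print(f"{missing_fraction:.1%} of cells are NaN, {valid_count} hold data")
```

On the real data, `month` would be `wind_speed.isel(time=1199)`. A large missing fraction there would be consistent with the ocean cells described above.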

    With xarray, you can check visually if the extracted data is as expected, using wind_speed.isel(time=1199).plot() in an interactive environment like Jupyter Lab. (If your Python environment is not interactive, you also need import matplotlib.pyplot as plt and plt.show().)

    So, maybe the question is rather: Do you really need to save the data in CSV format? In my experience, working with NetCDF data is most convenient in xarray, so I would not attempt to convert it, except for visualization purposes.
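To illustrate what staying in xarray can look like, here is a small sketch of three common operations on a synthetic array. The dimension names `time`, `y`, and `x`, the grid, and the values are assumptions; check `print(ds)` on the real file for the actual names.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: 24 monthly steps on a 3 x 4 grid (all values made up).
time = pd.date_range("1980-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
values = rng.uniform(2.0, 8.0, size=(24, 3, 4))
values[:, 0, :] = np.nan  # pretend the first row is ocean
wind_speed = xr.DataArray(
    values,
    dims=("time", "y", "x"),
    coords={
        "time": time,
        "y": [0.0, 1000.0, 2000.0],
        "x": [0.0, 1000.0, 2000.0, 3000.0],
    },
    name="sfcWind",
)

# Spatial mean for each month; NaN cells are skipped by default.
monthly_mean = wind_speed.mean(dim=("y", "x"))

# Mean annual cycle per grid cell: one value per calendar month.
climatology = wind_speed.groupby("time.month").mean("time")

# Time series at the grid cell nearest to a coordinate, rather than
# addressing the cell by row and column index.
point_series = wind_speed.sel(x=1200.0, y=900.0, method="nearest")
```

Each result is still a labelled `DataArray`, so it can be plotted or reduced further without an intermediate CSV.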

    As a side note, I would suggest writing your code as follows (just a few characters fewer):

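    import xarray as xr
    import pandas as pd
    
    
    # Variables are read lazily, so only the selected slice is loaded
    ds = xr.open_dataset('chess-scape.nc')
    wind_speed = ds.sfcWind
    
    # Select one time step by position and write the 2-D grid to CSV
    df = pd.DataFrame(wind_speed.isel(time=1199))
    df.to_csv('windspeed.csv')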
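If you do need a table, one possible approach (a sketch, not necessarily what your .dat file requires) is to keep the coordinate labels when exporting. The example runs on a synthetic array; the dimension names `time`, `y`, `x`, the values, and the output filenames are assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: 3 monthly steps on a 2 x 4 grid (all values made up).
time = pd.date_range("1980-01-01", periods=3, freq="MS")
values = np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4)
values[:, :, 0] = np.nan  # pretend the first column is ocean
wind_speed = xr.DataArray(
    values,
    dims=("time", "y", "x"),
    coords={"time": time, "y": [0.0, 1000.0], "x": [0.0, 1000.0, 2000.0, 3000.0]},
    name="sfcWind",
)

# Long format: one row per (time, y, x) combination that has a value;
# the empty ocean cells are dropped.
long_table = wind_speed.to_dataframe().dropna().reset_index()
long_table.to_csv("windspeed_long.csv", index=False)

# One month as a labelled grid, sorted so the largest y (north) comes first.
grid = wind_speed.isel(time=-1).sortby("y", ascending=False).to_pandas()
grid.to_csv("windspeed_grid.csv")
```

The long table for the full dataset could be very large, so it may be safer to convert one time step or one region at a time.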