Convert time series data from CSV to netCDF in Python

The main problem in this process is the line below:

precip[:] = orig

It produces this error:

ValueError: cannot reshape array of size 5732784 into shape (39811,144,144)

I have two CSV files. One contains the actual data for a variable (precipitation), with each column as a station; the corresponding station coordinates are in a second, separate CSV file. My sample data is on Google Drive here.

If you want to look at the data itself: my first CSV file has the shape (39811, 144) and the second has the shape (171, 10), but note that I'm only using a slice of the second dataframe with shape (144, 2).

This is the code:

stations = pd.read_csv(stn_precip)
stncoords = stations.iloc[:,[0,1]][:144]
orig = pd.read_csv(orig_precip, skiprows = 1, names = stations['Code'][:144])

lons = stncoords['X']
lats = stncoords['Y']

ncout = netCDF4.Dataset('Precip_1910-2018_homomod.nc', 'w')

ncout.createDimension('longitude',lons.shape[0])
ncout.createDimension('latitude',lats.shape[0])
ncout.createDimension('precip',orig.shape[1])
ncout.createDimension('time',orig.shape[0])

lons_out = lons.tolist()
lats_out = lats.tolist()
time_out = orig.index.tolist()

lats = ncout.createVariable('latitude',np.dtype('float32').char,('latitude',))
lons = ncout.createVariable('longitude',np.dtype('float32').char,('longitude',))
time = ncout.createVariable('time',np.dtype('float32').char,('time',))
precip = ncout.createVariable('precip',np.dtype('float32').char,('time', 'longitude','latitude'))

lats[:] = lats_out
lons[:] = lons_out
time[:] = time_out
precip[:] = orig
ncout.close()

I'm mostly basing my code on this post: convert-csv-to-netcdf, but it does not include the variable 'TIME' as a third dimension, so that's where I'm failing. I thought the precipitation variable should have the shape (39811, 144, 144), but the error suggests otherwise.

I'm not exactly sure how to deal with this; any input is appreciated.
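The numbers in the error message can be checked with plain arithmetic. The sketch below uses only the shapes quoted above (the names n_time, n_stations, csv_values and grid_values are illustrative, not from the original code) and compares how many values the CSV holds with how many a (time, longitude, latitude) variable would need:

```python
import math

# Shapes quoted in the question
n_time, n_stations = 39811, 144

# Values actually present in the data CSV: one per time step and station
csv_values = n_time * n_stations

# Values required by a variable declared as (time, longitude, latitude)
grid_shape = (n_time, n_stations, n_stations)
grid_values = math.prod(grid_shape)

print(csv_values)                 # 5732784, the size named in the ValueError
print(grid_values)                # 825520896
print(grid_values // csv_values)  # 144 times more values than the CSV holds
```

The array size in the error is therefore exactly time × station, which is why it cannot be reshaped into a three-dimensional grid.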

Solution

Since your data come from individual stations, I would suggest using a single station dimension in your netCDF file instead of separate lon and lat dimensions. You can still save the longitude and latitude of each station as separate variables along that dimension.

Here is one possible solution, using your code as an example:

#!/usr/bin/env ipython
import pandas as pd
import numpy as np
import netCDF4

# Input files: station metadata and precipitation values (one column per station)
stn_precip = 'Precip_1910-2018_stations.csv'
orig_precip = 'Precip_1910-2018_origvals.csv'
stations = pd.read_csv(stn_precip)
stncoords = stations.iloc[:,[0,1]][:144]
orig = pd.read_csv(orig_precip, skiprows = 1, names = stations['Code'][:144])

lons = stncoords['X']
lats = stncoords['Y']
nstations = np.size(lons)

ncout = netCDF4.Dataset('Precip_1910-2018_homomod.nc', 'w')

# A single 'station' dimension replaces the separate longitude and latitude dimensions
ncout.createDimension('station',nstations)
ncout.createDimension('time',orig.shape[0])

lons_out = lons.tolist()
lats_out = lats.tolist()
time_out = orig.index.tolist()

# Coordinates are stored once per station; precip is 2-D: (time, station)
lats = ncout.createVariable('latitude',np.dtype('float32').char,('station',))
lons = ncout.createVariable('longitude',np.dtype('float32').char,('station',))
time = ncout.createVariable('time',np.dtype('float32').char,('time',))
precip = ncout.createVariable('precip',np.dtype('float32').char,('time', 'station'))

lats[:] = lats_out
lons[:] = lons_out
time[:] = time_out
# Pass the underlying NumPy array, whose shape (time, station) matches the variable
precip[:] = orig.values
ncout.close()

The header of the output file (ncdump -h Precip_1910-2018_homomod.nc) then looks like this: