python matplotlib plot dimensions timeserieschart

How to plot daily data as monthly averages (for separate years)

I am trying to plot a graph to represent a monthly river discharge dataset from 1980-01-01 to 2013-12-31.

Please check out this graph

The plan is to plot "Jan Feb Mar Apr May...Dec" as the x-axis and the discharge (m3/s) as the y-axis. The actual lines on the graphs would represent the years. Alternatively, the lines on the graph would showcase monthly average (from jan to dec) of every year from 1980 to 2013.

  DAT = pd.read_excel('Modelled Discharge_UIB_1980-2013_Daily.xlsx',
                   sheet_name='Karhmong', header=None, skiprows=1,
                    names=['year', 'month', 'day', 'flow'],
                    parse_dates={ 'date': ['year', 'month', 'day'] },
                   index_col='date')

the above is to show what type of data it is


date        flow
1980-01-01  104.06
1980-01-02  103.81
1980-01-03  103.57
1980-01-04  103.34
1980-01-05  103.13
... ...
2013-12-27  105.65
2013-12-28  105.32
2013-12-29  105.00
2013-12-30  104.71
2013-12-31  104.42

because I want to compare all the years to each other so I tried the below command

DAT1980 = DAT[DAT.index.year==1980]
DAT1980

DAT1981 = DAT[DAT.index.year==1981
DAT1981

...etc

in terms of grouping the months for the x-axis I tried grouping months using the command

datmonth = np.unique(DAT.index.month)

so far all of these commands caused no error

however as I plot the graph I got this error

Graph plot command

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))

ax.plot(datmonth, DAT1980, color='purple', linestyle='--', label='1980')
ax.grid()

plt.legend()

ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
axs.set_xlim(3, 5)


axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))

which I got "ValueError: x and y must have same first dimension, but have shapes (12,) and (366, 1)" as the error

I then tried

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))

ax.plot(DAT.index.month, DAT.index.year==1980, color='purple', linestyle='--', label='1980')
ax.grid()

ax.plot(DAT.index.month, DAT.index.year==1981, color='black', marker='o', linestyle='-', label='C1981')
ax.grid()


plt.legend()

ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
#axs.set_xlim(1, 12)


axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))

and it worked better than the previous graph but still not what I wanted (please check out the graph here)

as my intention is to create a graph similar to this

I wholeheartedly appreciate any suggestion you may have! Thank you so so much and if you need any further information please do not hesitate to ask, I will reply as soon as possible.

Solution

Welcome to SO! Nice job creating a clear description of your issue and showing lots of code : )

There are a few syntax issues here and there, but the main issue I see is that you need to add a groupby/aggregation operation at some point. That is, you have daily data, but your desired plot has monthly resolution (for each year). It sounds like you want an average of the daily values for each month for each year (correct me if that is wrong).

Here is some fake data:

dr = pd.date_range('01-01-1980', '12-31-2013', freq='1D')
flow = np.random.rand(len(dr))
df = pd.DataFrame(flow, columns=['flow'], index=dr)

Looks like your example:

                flow
1980-01-01  0.751287
1980-01-02  0.411040
1980-01-03  0.134878
1980-01-04  0.692086
1980-01-05  0.671108
             ...
2013-12-27  0.683654
2013-12-28  0.772894
2013-12-29  0.380631
2013-12-30  0.957220
2013-12-31  0.864612

[12419 rows x 1 columns]

You can use groupby to get a mean for each month, using the same datetime attributes you use above (with some additional methods to help make the data easier to work with)

monthly = (df.groupby([df.index.year, df.index.month])
           .mean()
           .rename_axis(index=['year', 'month'],)
           .reset_index())

monthly has flow data for each month for each year, i.e. what you want to plot:

     year  month      flow
0    1980      1  0.514496
1    1980      2  0.633738
2    1980      3  0.566166
3    1980      4  0.553763
4    1980      5  0.537686
..    ...    ...       ...
403  2013      8  0.402805
404  2013      9  0.479226
405  2013     10  0.446874
406  2013     11  0.526942
407  2013     12  0.599161

[408 rows x 3 columns]

Now to plot an individual year, you index it from monthly and plot the flow data. I use most of your axes formatting:

# make figure
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))

# plotting for one year
sub = monthly[monthly['year'] == 1980]
ax.plot(sub['month'], sub['flow'], color='purple', linestyle='--', label='1980')

# some formatting
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
ax.set_xticks(range(1, 13))
ax.set_xticklabels(['J','F','M','A','M','J','J','A','S','O','N','D'])
ax.legend()
ax.grid()

Producing the following:

You could instead plot several years using a loop of some sort:

years = [1980, 1981, 1982, ...]
for year in years:
    sub = monthly[monthly['year'] == year]
    ax.plot(sub['month'], sub['flow'], ...)

You many run into some other challenges here (like finding a way to set nice styling for 30+ lines, and doing so in a loop). You can open a new post (building off of this one) if you can't find out how to accomplish something through other posts here. Best of luck!