python pandas matplotlib legend twinx

How to line plot multiple columns with nan and on twinx / secondary_y


The graph is fixed now, but I am having trouble plotting the legend: it only shows an entry for one of the plots, as seen in the picture below.

I am trying to plot a dual-axis graph with twinx, but I am facing some difficulties, as seen in the picture below.

Any input is welcome! If you require any additional information, I am happy to provide it.

enter image description here

Compared to the original, before plotting z on the secondary y-axis:

enter image description here

I am unsure why my graph looks like that. Before I plotted the secondary y-axis (the pink line), the closing value graph was fully visible, but now it seems cut off.

It may be due to my data as provided below.

Code I have currently:

# read csv into variable
sg_df_merged = pd.read_csv("testing1.csv", parse_dates=[0], index_col=0)

# define figure
fig = plt.figure()

fig, ax5 = plt.subplots()
ax6 = ax5.twinx()

x = sg_df_merged.index
y = sg_df_merged["Adj Close"]
z = sg_df_merged["Singapore"]

curve1 = ax5.plot(x, y, label="Singapore", color = "c")
curve2 = ax6.plot(x, z, label = "Face Mask Compliance", color = "m")
curves = [curve1, curve2]

# labels for my axis
ax5.set_xlabel("Year")
ax5.set_ylabel("Adjusted Closing Value ($)")
ax6.set_ylabel("% compliance to wearing face mask")
ax5.grid #not sure what this line does actually

# set x-axis values to 45 degree angle
for label in ax5.xaxis.get_ticklabels():
    label.set_rotation(45)
ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)

plt.gca().legend(loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
plt.show(); 

Initially, I thought it was due to my Excel file having entirely blank rows, but I have since removed those rows. The sample data is in this question.

Also, I have tried to interpolate but somehow it doesn't work.


Solution

    • Only the rows that were entirely NaN were dropped. There are still many rows with NaN in one of the columns.
    • In order for matplotlib to draw connecting lines between two data points, the points must be consecutive, with no NaN between them.
    • The plot API doesn't connect data points across NaN values.
    • This can be dealt with by converting the pandas.Series to a DataFrame, and using .dropna.
    • Note that x has been removed, because it would not match the index length of y or z, which are shorter after .dropna.
    • y is now a separate dataframe, where .dropna is used.
    • z is also a separate dataframe, where .dropna is used.
    • The x-axis values for each plot are the respective indices.
    • Tested in python v3.12.0, pandas v2.1.2, matplotlib v3.8.1.
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    
    # read csv into variable
    sg_df_merged = pd.read_csv("test.csv", parse_dates=[0], index_col=0)
    
    # define figure
    fig, ax5 = plt.subplots(figsize=(8, 6))
    ax6 = ax5.twinx()
    
    # select specific columns to plot and drop additional NaN
    y = pd.DataFrame(sg_df_merged["Adj Close"]).dropna()
    z = pd.DataFrame(sg_df_merged["Singapore"]).dropna()
    
    # add plots with markers
    curve1 = ax5.plot(y.index, 'Adj Close', data=y, label="Singapore", color = "c", marker='o')
    curve2 = ax6.plot(z.index, 'Singapore', data=z, label = "Face Mask Compliance", color = "m", marker='o')
    
    # labels for my axis
    ax5.set_xlabel("Year")
    ax5.set_ylabel("Adjusted Closing Value ($)")
    ax6.set_ylabel("% compliance to wearing face mask")
        
    # rotate xticks
    ax5.xaxis.set_tick_params(rotation=45)
    
    # add a grid to ax5
    ax5.grid(True, color = "k", linestyle = "-", linewidth = 0.3)
    
    # create a legend for both axes
    curves = curve1 + curve2
    labels = [l.get_label() for l in curves]
    ax5.legend(curves, labels, loc='center left', bbox_to_anchor=(1.1, 0.5), title = "Country Index")
    
    plt.show()
    

    enter image description here
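
The point about NaN rows can be checked without plotting. The following is a minimal sketch using five rows copied from the Data section; the names `demo` and `partly_clean` are illustrative only, and `.dropna()` is called on each Series directly as an alternative to the DataFrame conversion used above.

```python
import numpy as np
import pandas as pd

# Five rows copied from the Data section: monthly prices and irregular
# survey values share one date index, so most rows have one NaN.
idx = pd.to_datetime(["2020-02-01", "2020-02-21", "2020-02-25", "2020-03-01", "2020-03-06"])
demo = pd.DataFrame(
    {
        "Adj Close": [3011.080078, np.nan, np.nan, 2481.22998, np.nan],
        "Singapore": [np.nan, 24.0, np.nan, np.nan, 23.0],
    },
    index=idx,
)

# how="all" removes only the rows where every column is NaN
partly_clean = demo.dropna(how="all")

# dropping NaN per column leaves consecutive points, each with its own index
y = demo["Adj Close"].dropna()
z = demo["Singapore"].dropna()

print(len(demo), len(partly_clean), len(y), len(z))
print(partly_clean["Adj Close"].isna().sum())
```

After `dropna(how="all")` the price column still contains NaN between real values, which is why the line appears cut. `y` and `z` have different lengths and different dates, so each must be plotted against its own index.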


    # given a datetime64[ns] dtype index whose time components are all 0, extracting only the date will cause the xticklabels to be centered under the tick
    df.index = df.index.date
    
    ax = df['Adj Close'].dropna().plot(marker='.', color='c', grid=True, figsize=(12, 6),
                                       title='My Plot', ylabel='Adj Close', xlabel='Date', legend=True)
    ax_right = df['Singapore'].dropna().plot(marker='.', color='m', secondary_y=True, legend=True, rot=0, ax=ax)
    
    ax_right.set_ylabel('Singapore')
    
    ax.legend(title='Country Index', bbox_to_anchor=(1.06, 0.5), loc='center left', frameon=False)
    ax_right.legend(bbox_to_anchor=(1.06, 0.43), loc='center left', frameon=False)
    
    ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
    ax.xaxis.set_minor_locator(mdates.MonthLocator())
    

    enter image description here
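
The pandas version above draws one legend per axes and positions them separately. If a single legend box is preferred, one option is to collect the handles and labels from both axes with `get_legend_handles_labels` and pass them to one `legend` call. This is a minimal sketch with plain matplotlib `twinx` and a few rows from the Data section; the names `price` and `mask` are illustrative, and the Agg backend is an assumption made only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# a few non-NaN rows from the Data section, one Series per column
price = pd.Series(
    [3011.080078, 2481.22998, 2624.22998],
    index=pd.to_datetime(["2020-02-01", "2020-03-01", "2020-04-01"]),
    name="Adj Close",
)
mask = pd.Series(
    [24.0, 22.0, 23.0, 21.0],
    index=pd.to_datetime(["2020-02-21", "2020-02-28", "2020-03-06", "2020-03-13"]),
    name="Singapore",
)

fig, ax = plt.subplots(figsize=(8, 6))
ax_right = ax.twinx()
ax.plot(price.index, price, color="c", marker=".", label="Adj Close")
ax_right.plot(mask.index, mask, color="m", marker=".", label="Singapore")

# gather the legend entries of each axes and merge them into one legend
h_left, l_left = ax.get_legend_handles_labels()
h_right, l_right = ax_right.get_legend_handles_labels()
legend = ax.legend(h_left + h_right, l_left + l_right, title="Country Index",
                   bbox_to_anchor=(1.06, 0.5), loc="center left", frameon=False)
```

This has the same effect as the `curves = curve1 + curve2` approach in the first example, but it does not require keeping the return values of the `plot` calls.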


    Data

    • Copy the data to the clipboard and read it with the following line.
    df = pd.read_clipboard(sep=',', index_col=[0], parse_dates=[0]) 
    
    ,Adj Close,Singapore
    2015-10-01,2998.350098,
    2015-11-01,2855.939941,
    2015-12-01,2882.72998,
    2016-01-01,2629.110107,
    2016-02-01,2666.51001,
    2016-03-01,2840.899902,
    2016-04-01,2838.52002,
    2016-05-01,2791.060059,
    2016-06-01,2840.929932,
    2016-07-01,2868.689941,
    2016-08-01,2820.590088,
    2016-09-01,2869.469971,
    2016-10-01,2813.8701170000004,
    2016-11-01,2905.169922,
    2016-12-01,2880.76001,
    2017-01-01,3046.800049,
    2017-02-01,3096.610107,
    2017-03-01,3175.110107,
    2017-04-01,3175.439941,
    2017-05-01,3210.820068,
    2017-06-01,3226.47998,
    2017-07-01,3329.52002,
    2017-08-01,3277.26001,
    2017-09-01,3219.909912,
    2017-10-01,3374.080078,
    2017-11-01,3433.540039,
    2017-12-01,3402.919922,
    2018-01-01,3533.98999,
    2018-02-01,3517.939941,
    2018-03-01,3427.969971,
    2018-04-01,3613.929932,
    2018-05-01,3428.179932,
    2018-06-01,3268.699951,
    2018-07-01,3319.850098,
    2018-08-01,3213.47998,
    2018-09-01,3257.050049,
    2018-10-01,3018.800049,
    2018-11-01,3117.610107,
    2018-12-01,3068.76001,
    2019-01-01,3190.169922,
    2019-02-01,3212.689941,
    2019-03-01,3212.879883,
    2019-04-01,3400.199951,
    2019-05-01,3117.76001,
    2019-06-01,3321.610107,
    2019-07-01,3300.75,
    2019-08-01,3106.52002,
    2019-09-01,3119.98999,
    2019-10-01,3229.879883,
    2019-11-01,3193.919922,
    2019-12-01,3222.830078,
    2020-01-01,3153.72998,
    2020-02-01,3011.080078,
    2020-02-21,,24.0
    2020-02-25,,
    2020-02-28,,22.0
    2020-03-01,2481.22998,
    2020-03-02,,
    2020-03-03,,
    2020-03-06,,23.0
    2020-03-10,,
    2020-03-13,,21.0
    2020-03-17,,
    2020-03-20,,24.0
    2020-03-23,,
    2020-03-24,,
    2020-03-27,,27.0
    2020-03-30,,
    2020-03-31,,
    2020-04-01,2624.22998,
    2020-04-03,,37.0
    2020-04-06,,
    2020-04-07,,
    2020-04-10,,73.0
    2020-04-13,,
    2020-04-14,,
    2020-04-17,,85.0
    2020-04-20,,
    2020-04-21,,
    2020-04-24,,90.0
    2020-04-27,,
    2020-04-28,,
    2020-05-01,2510.75,90.0
    2020-05-05,,
    2020-05-15,,
    2020-05-21,,
    2020-05-22,,92.0
    2020-05-25,,
    2020-05-26,,
    2020-05-30,,
    2020-06-01,2589.909912,
    2020-06-05,,89.0
    2020-06-08,,
    2020-06-15,,
    2020-06-16,,
    2020-06-19,,92.0
    2020-06-22,,
    2020-06-25,,
    2020-07-01,2529.820068,
    2020-07-03,,
    2020-07-06,,
    2020-07-07,,90.0
    2020-07-12,,
    2020-07-14,,
    2020-07-20,,92.0
    2020-07-26,,
    2020-07-27,,
    2020-07-31,,
    2020-08-01,2532.51001,
    2020-08-03,,88.0
    2020-08-07,,
    2020-08-10,,
    2020-08-12,,
    2020-08-14,,90.0
    2020-08-17,,
    2020-08-25,,
    2020-08-28,,90.0
    2020-08-31,,
    2020-09-01,2490.090088,
    2020-09-11,2490.090088,
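
Where no clipboard is available (for example in a headless session), the same text can be parsed from a string instead. This is a sketch assuming `pd.read_csv` with `io.StringIO` as a substitute for `pd.read_clipboard`; `csv_text` is an illustrative name and holds only four of the rows above.

```python
import io

import pandas as pd

# four rows copied from the data above, including the header line
csv_text = """,Adj Close,Singapore
2020-02-01,3011.080078,
2020-02-21,,24.0
2020-02-25,,
2020-05-01,2510.75,90.0
"""

# same index and date options as the read_clipboard call
df = pd.read_csv(io.StringIO(csv_text), index_col=[0], parse_dates=[0])

print(df.dtypes)
print(df.isna().sum())
```

Empty fields are read as NaN, so the per-column NaN counts give a quick check of how many points each line will actually contain.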