Tags: python, pandas, dataframe, matplotlib, scatter

How to efficiently plot dates in matplotlib (Python)?


Here is code that plots all the Covid-19 test cases in India from January to July 2021. The records are daily, but my scatter plot shows only 17 points and the date labels overlap. Is there an efficient way to fix this?

df.to_numpy()
df1 = df[(df.date.str.endswith("21")) & (df.location == "India")]
plt.scatter(x=df1.date, y=df1.total_cases)

Here is the graph:

[Image: scatter plot showing only 17 points with overlapping date labels]

Thanks a lot


Solution

  • Let's first create a toy dataframe:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    data = [[f"{i%30+1:02}/{6+i//30:02}/21", np.random.randint(i, i+10)**2]
            for i in range(5, 50)]
    df = pd.DataFrame(data, columns=["date", "total_cases"])
    

    I recommend converting the date strings to datetime objects. This makes processing easier in general and, in particular, lets matplotlib adapt its axes.

    df.date = pd.to_datetime(df.date, format="%d/%m/%y")
    

    Now matplotlib can interpret the values as dates and choose an appropriate number of x-ticks. Since the labels are rather long, we can also rotate them.

    plt.plot(df.date, df.total_cases, '.')
    plt.xticks(rotation=25, ha="right")
    plt.tight_layout()
    plt.show()
    

    Matplotlib 3.4 introduced a handy way of displaying dates on the x-axis:

    plt.rcParams['date.converter'] = 'concise'
    plt.plot(df.date, df.total_cases, '.')
    plt.tight_layout()
    plt.show()
    

    [Image: left, rotated tick labels; right, rcParams date.converter set to 'concise']
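
To connect this back to the question, here is a minimal sketch of the same approach on a frame shaped like the asker's. The column names `date`, `location` and `total_cases` come from the question; the rows themselves are made-up placeholder values, and the `"%d/%m/%y"` date format is an assumption, since the question does not show the raw strings. It filters on the parsed year rather than on a string suffix, and it sets the concise formatter on a single axes (via `matplotlib.dates`) instead of changing the global rcParams.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Placeholder rows, NOT real Covid-19 figures; the date format is assumed.
raw = pd.DataFrame({
    "date": ["30/12/20", "31/12/20", "01/01/21", "02/01/21", "01/01/21"],
    "location": ["India", "India", "India", "India", "Italy"],
    "total_cases": [100, 110, 120, 130, 50],
})

# Parse the strings once so that pandas and matplotlib see real dates.
raw["date"] = pd.to_datetime(raw["date"], format="%d/%m/%y")

# Filter on datetime components instead of str.endswith("21").
mask = (raw["date"].dt.year == 2021) & (raw["location"] == "India")
india_2021 = raw.loc[mask].sort_values("date")

fig, ax = plt.subplots()
ax.scatter(india_2021["date"], india_2021["total_cases"])

# Concise date labels for this axes only, without touching rcParams.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure. Setting the formatter per axes is useful when only one plot in a script should use the concise style.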