python pandas dataframe matplotlib scatter

How to efficiently plot dates in matplotlib (Python)?

Here is code that plots all the Covid-19 test cases in India from January 2021 to July. The records are daily, but my scatter plot shows only 17 points and the date labels overlap. Is there an efficient way to do this?

df.to_numpy()
df1 = df[(df.date.str.endswith("21")) & (df.location == "India")]
plt.scatter(x=df1.date, y=df1.total_cases)

Here is the graph:

Thanks a lot

Solution

Let's first create a toy dataframe:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = [[f"{i%30+1:02}/{6+i//30:02}/21", np.random.randint(i, i+10)**2]
        for i in range(5, 50)]
df = pd.DataFrame(data, columns=["date", "total_cases"])

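I recommend converting the date strings to a datetime type. It makes processing easier in general, and it lets matplotlib adapt its axes: as plain strings, the dates are treated as categories, so every distinct value gets its own tick label.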

df.date = pd.to_datetime(df.date, format="%d/%m/%y")
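To confirm that the conversion did what was intended, it can help to inspect the dtype and a few parsed components. The sketch below uses a small, made-up series (`raw` is a hypothetical name) in the same day/month/two-digit-year layout as the toy data:

```python
import pandas as pd

# Hypothetical sample strings in the dd/mm/yy layout used by the toy data.
raw = pd.Series(["06/06/21", "30/06/21", "01/07/21"])

# An explicit format avoids day/month ambiguity (06/07 vs. 07/06).
parsed = pd.to_datetime(raw, format="%d/%m/%y")

print(parsed.dtype)              # a datetime64 dtype instead of plain strings
print(parsed.dt.day.tolist())    # [6, 30, 1]
print(parsed.dt.month.tolist())  # [6, 6, 7]
```

If the real data uses a different layout, the `format` string has to be adjusted accordingly.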

Now matplotlib can interpret the values as dates and choose an appropriate number of x-ticks. Since the labels are rather long, we can also rotate them.

plt.plot(df.date, df.total_cases, '.')
plt.xticks(rotation=25, ha="right")
plt.tight_layout()
plt.show()
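Applied to the data frame from the question, the same conversion also allows the string-suffix filter to be replaced with a filter on the actual year. This is only a sketch: the rows below are made up, only the column names come from the question, and the dd/mm/yy layout is an assumption about the real data.

```python
import pandas as pd

# Made-up rows standing in for the real data set; only the column names
# (date, location, total_cases) are taken from the question.
df = pd.DataFrame({
    "date": ["30/12/20", "01/01/21", "02/01/21", "01/01/21"],
    "location": ["India", "India", "India", "Brazil"],
    "total_cases": [100, 110, 125, 90],
})

# Assumed layout: day/month/two-digit year. Adjust `format` to the real data.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")

# Filter on datetime attributes rather than on how the string ends.
df1 = df[(df["date"].dt.year == 2021) & (df["location"] == "India")]
print(df1)
```

The resulting `df1.date` column can then be passed to `plt.plot` or `plt.scatter` exactly as in the snippet above.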

Matplotlib 3.4 added a handy way of displaying dates on the x-axis more compactly:

plt.rcParams['date.converter'] = 'concise'
plt.plot(df.date, df.total_cases, '.')
plt.tight_layout()
plt.show()
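The rcParams setting applies globally. If only one plot should use the compact labels, or if an older Matplotlib is installed, the formatter can also be attached to a single axes. A minimal sketch, assuming `matplotlib.dates.ConciseDateFormatter` is available (to my knowledge it has shipped since Matplotlib 3.1) and using made-up values:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily values, used only to have something to plot.
dates = pd.date_range("2021-06-06", periods=45, freq="D")
values = [n ** 2 for n in range(45)]

fig, ax = plt.subplots()
ax.plot(dates, values, ".")

# Attach the concise formatter to this axes only; other figures keep
# the default date labels.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

fig.tight_layout()
```

Finish with `plt.show()` as in the snippets above to display the figure.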