Hi,
I am trying to recreate some of the COVID-19 charts that we have all seen, using data from the Johns Hopkins database.
The data is in wide format: each row is a location (city/county, with its state in the Province_State column) and each date is a separate column. A screenshot of the CSV file is attached. I want to plot line graphs in seaborn with days on the x-axis and the count for each city (confirmed cases or, as in the code below, deaths) on the y-axis. For some reason, I am unable to reproduce the exponential-looking curves for deaths.
My code is:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# loading the file (covid_us_file is the path to the CSV)
df_covid_us = pd.read_csv(covid_us_file)
# slicing the columns needed: Province_State and the date columns
df = df_covid_us.iloc[:, np.r_[6, 12:123]]
df = df[df['Province_State'] == 'New York']
# using df.melt to go from wide (one column per date) to long format
df2 = df.melt(id_vars='Province_State', var_name='Date', value_name='Deaths')
# plotting using seaborn
sns.lineplot(x='Date', y='Deaths', data=df2, ci=None)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=20))
plt.show()
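A quick way to see what goes wrong in the code above, as a minimal sketch on a made-up two-row frame (the county names and numbers are hypothetical, and the m/d/yy headers are an assumption about the JHU file's format). After melt, the Date column still holds the original header strings. Seaborn and matplotlib then do not treat them as dates, so the DateFormatter labels will likely not correspond to the real dates, and string order is not chronological.

```python
import pandas as pd

# Hypothetical wide table in the JHU layout: one row per county, one column per date.
wide = pd.DataFrame({
    "Province_State": ["New York", "New York"],
    "Admin2": ["County A", "County B"],
    "2/2/20": [0, 1],
    "2/10/20": [3, 5],
})
long = wide.melt(id_vars=["Province_State", "Admin2"],
                 var_name="Date", value_name="Deaths")

# After melt, the dates are still the original header strings.
print(type(long["Date"].iloc[0]).__name__)  # str
# As strings they do not sort chronologically: "2/10/20" comes before "2/2/20".
print(sorted(long["Date"].unique()))        # ['2/10/20', '2/2/20']

# Parsing with the header format (month/day/two-digit year) gives real datetimes,
# which sort correctly and work with matplotlib's date locators and formatters.
long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")
print(list(long["Date"].drop_duplicates().sort_values().dt.strftime("%Y-%m-%d")))
# ['2020-02-02', '2020-02-10']
```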
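The key changes are converting the melted Date strings to real datetimes with pd.to_datetime and passing hue='Province_State' so that each state gets its own line. With a small sample of made-up data: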
import pandas as pd, seaborn as sns
import matplotlib.pyplot as plt, matplotlib.dates as mdates
df = pd.DataFrame({'Province_State': ['American Samoa', 'Guam', 'Puerto Rico'],
                   '2020-01-22': [0, 1, 2],
                   '2020-01-23': [2, 1, 0]})
# to get dates in rows: select the date columns by a pattern matching their headers
# (adjust the pattern to your file; the JHU CSV uses headers like 1/22/20)
date_columns = df.filter(regex=r'^\d{4}-\d{2}-\d{2}$').columns.tolist()
df2 = df.melt(id_vars='Province_State', value_vars=date_columns,
              var_name='Date', value_name='Deaths')
# dates from string to datetime
df2['Date'] = pd.to_datetime(df2['Date'])
# hue draws one line per state
sns.lineplot(x='Date', y='Deaths', hue='Province_State', data=df2)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=1))
plt.show()