I'm trying to create a bar chart in Python, using pandas to read a CSV file and plotly to draw the chart.
The CSV data looks something like this (this isn't the real data, it's just an example):
day,month,year,cases,deaths,countriesAndTerritories
11,11,2020,190,230, United_States_of_America
10,11,2020,224,132, United_States_of_America
9,11,2020,80,433, United_States_of_America
8,11,2020,126,623, United_States_of_America
I've successfully created a bar chart that visualizes deaths per month using this code:
import pandas as pd
import plotly.express as px
covid_data = pd.read_csv('data/data.csv')
united_states_data = covid_data[covid_data.countriesAndTerritories == 'United_States_of_America']
month_data = united_states_data[['month']]
death_data = united_states_data[['deaths']]
fig = px.bar(united_states_data, x='month', y='deaths', title='COVID-19 deaths by month')
fig.show()
The problem is that, for each month, the chart stacks the values for every day on top of each other and shows white lines separating the days. I only want one total per month; I don't care about the individual days. How would I go about this? I assume I need to build a new dataset holding the total deaths for each month by summing the daily values that fall in the same month?
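Yes, that is the right idea: sum the daily rows for each month with groupby, then plot the aggregated DataFrame instead of the daily one. The following code should achieve your purpose:
df_plot = united_states_data.groupby('month', as_index=False)['deaths'].sum()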
fig = px.bar(df_plot, x='month', y='deaths', title='COVID-19 deaths by month')
fig.show()
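To see what the aggregation step produces before it reaches plotly, here is a minimal, self-contained sketch. The inline CSV rows are made up for illustration (they are not real figures), and the name `monthly_deaths` is my own; only the column names come from the question.

```python
import io

import pandas as pd

# Made-up sample rows in the same shape as the question's CSV (not real data).
csv_text = """day,month,year,cases,deaths,countriesAndTerritories
11,11,2020,190,230,United_States_of_America
10,11,2020,224,132,United_States_of_America
31,10,2020,80,433,United_States_of_America
30,10,2020,126,623,United_States_of_America
11,11,2020,50,7,Canada
"""

covid_data = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows for one country.
united_states_data = covid_data[
    covid_data["countriesAndTerritories"] == "United_States_of_America"
]

# Collapse the daily rows into one row per month by summing deaths.
# as_index=False keeps 'month' as a regular column, which is what px.bar expects.
monthly_deaths = united_states_data.groupby("month", as_index=False)["deaths"].sum()

print(monthly_deaths)
```

With these sample rows the result has one row per month (10 and 11), so passing it to px.bar draws a single bar per month with no white separators. If the real file spans more than one year, grouping by 'month' alone would merge, say, January of both years into one bar; grouping by ['year', 'month'] instead is one way to keep them apart.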