
How do you sum up similar data in a CSV using the Plotly and pandas Python libraries?


I'm trying to create a bar chart in Python, using pandas to read a CSV file and Plotly to draw the chart.

The CSV data looks something like this (these aren't the real numbers, just an example):

day,month,year,cases,deaths,countriesAndTerritories
11,11,2020,190,230, United_States_of_America
10,11,2020,224,132, United_States_of_America
9,11,2020,80,433, United_States_of_America
8,11,2020,126,623, United_States_of_America
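As posted, each sample row has a space after the last comma (", United_States_of_America"). If the real file is formatted the same way, read_csv keeps that space by default, and the equality filter on countriesAndTerritories would match no rows. A minimal sketch, assuming the file really contains that space: it loads the sample from an in-memory string (io.StringIO stands in for 'data/data.csv') and passes skipinitialspace=True so the space is dropped.

```python
import io

import pandas as pd

# The sample rows from the question, including the space after the last comma.
SAMPLE_CSV = """day,month,year,cases,deaths,countriesAndTerritories
11,11,2020,190,230, United_States_of_America
10,11,2020,224,132, United_States_of_America
9,11,2020,80,433, United_States_of_America
8,11,2020,126,623, United_States_of_America
"""

# skipinitialspace=True skips whitespace right after each delimiter, so
# ' United_States_of_America' is read as 'United_States_of_America'.
covid_data = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)

# Same filter as in the question; it now matches all four rows.
united_states_data = covid_data[
    covid_data.countriesAndTerritories == 'United_States_of_America'
]
print(united_states_data)
```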

I've successfully created a bar chart that visualizes deaths per month using this code:

import pandas as pd
import plotly.express as px

# Load the full dataset
covid_data = pd.read_csv('data/data.csv')

# Keep only the rows for the United States
united_states_data = covid_data[covid_data.countriesAndTerritories == 'United_States_of_America']

month_data = united_states_data[['month']]

death_data = united_states_data[['deaths']]

# Bar chart of deaths against month, one row of the data per bar segment
fig = px.bar(united_states_data, x='month', y='deaths', title='COVID-19 deaths by month')
fig.show()


The problem is that, for each month, it stacks the data from each day on top of one another, with white lines separating the days. I only want the totals for each month; I don't care about the individual days. How would I go about this? I figured I would have to somehow create a new dataset with the total deaths for each month by adding together the data from all the days in the same month?


Solution

  • Sum the deaths for each month with groupby before plotting, so that px.bar draws a single bar per month:

    # Collapse the daily rows into one total per month
    df_plot = united_states_data.groupby('month', as_index=False)['deaths'].sum()
    fig = px.bar(df_plot, x='month', y='deaths', title='COVID-19 deaths by month')
    fig.show()
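The groupby-and-sum step can be checked against the question's sample rows. The sketch below is self-contained (it inlines the sample instead of reading data/data.csv) and goes one step beyond the answer, as an assumption: it groups by year and month together, in case the real file covers more than one year, because grouping on month alone would add the same month from different years into one bar. The period column is a hypothetical helper for a readable x-axis label.

```python
import io

import pandas as pd

# The four sample rows from the question (the real file has many more).
SAMPLE_CSV = """day,month,year,cases,deaths,countriesAndTerritories
11,11,2020,190,230, United_States_of_America
10,11,2020,224,132, United_States_of_America
9,11,2020,80,433, United_States_of_America
8,11,2020,126,623, United_States_of_America
"""

# skipinitialspace=True removes the space before the country name.
covid_data = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
united_states_data = covid_data[
    covid_data.countriesAndTerritories == 'United_States_of_America'
]

# Sum the daily deaths within each (year, month) group; as_index=False keeps
# year and month as ordinary columns so they can be passed to px.bar.
monthly = (
    united_states_data
    .groupby(['year', 'month'], as_index=False)['deaths']
    .sum()
)

# Hypothetical label column such as '2020-11' for the x-axis.
monthly['period'] = (
    monthly['year'].astype(str) + '-' + monthly['month'].astype(str).str.zfill(2)
)
print(monthly)

# Plotting step (requires plotly); one bar per month, no stacked segments:
# import plotly.express as px
# px.bar(monthly, x='period', y='deaths', title='COVID-19 deaths by month').show()
```

With the sample rows, the four daily records collapse into a single November 2020 row with 1418 deaths.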