I am having difficulty creating a stacked bar chart time series from my pandas dataframe (image below). I would like to have 'Date' on the x-axis, 'Hours' on the y-axis, and each bar showing the time spent with each group in 'Category'.
Do I need to use the pandas groupby function? The dataframe is a sample; I have hundreds of rows of data from 2018 to 2020.
Use pandas.DataFrame.groupby on 'date' and 'group', while aggregating .sum on 'hours'.
The .dt accessor is used to extract only the .date component of the 'date' column.
Make certain the 'Date' column of your dataframe is properly formatted as a datetime dtype, with df.Date = pd.to_datetime(df.Date).
The grouped dataframe, dfg, must be shaped into the correct form, which can be accomplished with pandas.DataFrame.pivot.
Plot with pandas.DataFrame.plot.bar and use the stacked parameter.
See pandas.DataFrame.plot for all the parameters.

import pandas as pd
import matplotlib.pyplot as plt
import random # for test data
import numpy as np # for test data
# setup dataframe with test data
np.random.seed(365)
random.seed(365)
rows = 1100
data = {'hours': np.random.randint(10, size=(rows)),
'group': [random.choice(['A', 'B', 'C']) for _ in range(rows)],
'date': pd.bdate_range('2020-11-24', freq='h', periods=rows).tolist()}
df = pd.DataFrame(data)
# display(df.head())
   hours group                date
0      2     C 2020-11-24 00:00:00
1      4     B 2020-11-24 01:00:00
2      1     C 2020-11-24 02:00:00
3      5     A 2020-11-24 03:00:00
4      2     B 2020-11-24 04:00:00
# use groupby on df
dfg = df.groupby([df.date.dt.date, 'group'])['hours'].sum().reset_index()
# pivot the dataframe into the correct format
dfp = dfg.pivot(index='date', columns='group', values='hours')
# display(dfp.head())
group        A   B   C
date
2020-11-24  49  25  29
2020-11-25  62  18  57
2020-11-26  42  77   4
2020-11-27  34  43  17
2020-11-28  28  53  23
Alternatively, use .pivot_table, which both reshapes and aggregates, in place of the separate groupby and pivot steps.
index=df.date.dt.date is used so the index doesn't include the time component, since the data for the entire day is being aggregated.

dfp = df.pivot_table(index=df.date.dt.date, columns='group', values='hours', aggfunc='sum')
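As a quick check that the two routes produce the same table, here is a minimal sketch on a small hand-made frame. The five rows are illustrative values, not the question's data, and the names via_pivot and via_pivot_table are introduced only for this example.

```python
import pandas as pd

# Small hand-made frame; the values are illustrative only.
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-11-24 09:00', '2020-11-24 13:00',
                            '2020-11-24 15:00', '2020-11-25 10:00',
                            '2020-11-25 11:00']),
    'group': ['A', 'A', 'B', 'A', 'B'],
    'hours': [2, 3, 1, 4, 5],
})

# Route 1: aggregate with groupby, then reshape with pivot
dfg = df.groupby([df.date.dt.date, 'group'])['hours'].sum().reset_index()
via_pivot = dfg.pivot(index='date', columns='group', values='hours')

# Route 2: pivot_table aggregates and reshapes in one call
via_pivot_table = df.pivot_table(index=df.date.dt.date, columns='group',
                                 values='hours', aggfunc='sum')

print(via_pivot)
print(via_pivot_table)
```

Both printed tables should have one row per calendar day and one column per group, so either can be passed to the plotting call.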
# plot the pivoted dataframe
dfp.plot.bar(stacked=True, figsize=(10, 6), ylabel='Hours', xlabel='Date', title='Sum of Daily Category Hours')
plt.legend(title='Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
Horizontal stacked bars with pandas.DataFrame.plot.barh

dfp.plot.barh(stacked=True, figsize=(6, 10), title='Sum of Daily Category Hours')
plt.legend(title='Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xlabel('Hours')
plt.ylabel('Date')
plt.show()
# the default plot kind is a line plot, with one line per group
dfp.plot(figsize=(10, 6))
plt.show()
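For a dataframe that uses the question's column names ('Date', 'Hours', 'Category'), the same steps might look like the sketch below. The four rows and the 'Group 1' / 'Group 2' labels are made up for illustration, and it assumes 'Date' arrives as strings, as it often does after reading a CSV. The fill_value=0 argument is an optional choice, not part of the code above; it replaces the NaN that otherwise appears when a category has no rows on a given date.

```python
import pandas as pd

# Hypothetical rows shaped like the question's dataframe; values are made up.
df = pd.DataFrame({
    'Date': ['2018-01-02', '2018-01-02', '2018-01-03', '2018-01-03'],
    'Category': ['Group 1', 'Group 2', 'Group 1', 'Group 1'],
    'Hours': [1.5, 2.0, 3.0, 0.5],
})

# Strings must be converted to datetimes before the .dt accessor can be used
df['Date'] = pd.to_datetime(df['Date'])

# One row per day, one column per category; fill_value=0 (optional) fills
# date/category combinations that have no rows
dfp = df.pivot_table(index=df['Date'].dt.date, columns='Category',
                     values='Hours', aggfunc='sum', fill_value=0)

print(dfp)
```

The resulting dfp has the same shape as the pivoted frame above, so dfp.plot.bar(stacked=True) applies to it unchanged.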