The left side of the figure shows the dataset, and the right side shows the sketched bar chart I want to produce. How can I plot these data to produce such a chart, where X represents the age groups and Y represents the length groups?
Any help would be great. Thanks in advance.
Use pd.cut() to bin the values and pandas.DataFrame.groupby() to count them:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# data
np.random.seed(365)
data = {'age': [np.random.randint(90) for _ in range(100)],
'length': [np.random.randint(30) for _ in range(100)]}
# dataframe
df = pd.DataFrame(data)
# age and days groups
# right=False makes each bin left-closed, e.g. [50, 60), so the labels match the intervals
df['age_group'] = pd.cut(df['age'], bins=[0, 50, 60, 70, 80, 1000], labels=['<50', '50-59', '60-69', '70-79', '≥80'], right=False)
df['days'] = pd.cut(df['length'], bins=[0, 6, 20, 1000], labels=['≤5 days', '6-20 days', '≥20 days'], right=False)
age  length  age_group  days
 72      22  70-79      ≥20 days
  2      14  <50        6-20 days
 14      12  <50        6-20 days
 47       4  <50        ≤5 days
 18      12  <50        6-20 days
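pd.cut() closes each bin on the right by default, so it is worth checking which bin a boundary value such as 50 lands in. The following is a minimal sketch with made-up ages chosen to sit on the bin edges (the `ages` values are illustrative, not from the dataset):

```python
import pandas as pd

# Illustrative ages that sit on or next to the bin edges
ages = pd.Series([0, 49, 50, 59, 60, 80])
bins = [0, 50, 60, 70, 80, 1000]
labels = ['<50', '50-59', '60-69', '70-79', '≥80']

# Default right=True gives intervals (0, 50], (50, 60], ...
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False gives intervals [0, 50), [50, 60), ...
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

comparison = pd.DataFrame({'age': ages,
                           'right=True': right_closed,
                           'right=False': left_closed})
print(comparison)
```

With the default, an age of 0 falls outside every bin (NaN) and an age of 50 is labelled '<50'; with right=False both are assigned as the labels suggest.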
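# groupby plot: count rows per (age_group, days), then convert each row to percentages
counts = df.groupby(['age_group', 'days'], observed=False)['days'].count().unstack()
percentages = counts.apply(lambda x: x*100/sum(x), axis=1)
# pass figsize here; a separate plt.figure() call would only create an extra empty figure
percentages.plot.bar(stacked=True, figsize=(16, 10))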
plt.legend(loc='center right', bbox_to_anchor=(-0.05, 0.5))
plt.xlabel('Age Groups')
plt.xticks(rotation=0)
plt.gca().set_yticklabels(['{:.0f}%'.format(x) for x in plt.gca().get_yticks()])
plt.show()
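An alternative that may be easier to read is pd.crosstab() with normalize='index', which computes the row-wise shares in one call. This is a sketch on a small hand-made dataset (the `age` and `length` values below are assumptions for illustration), not a replacement for the groupby approach:

```python
import pandas as pd

# Small illustrative dataset (made-up values)
df = pd.DataFrame({'age':    [30, 45, 55, 58, 65, 72, 85, 88],
                   'length': [3, 10, 25, 4, 12, 2, 22, 7]})

df['age_group'] = pd.cut(df['age'], bins=[0, 50, 60, 70, 80, 1000],
                         labels=['<50', '50-59', '60-69', '70-79', '≥80'],
                         right=False)
df['days'] = pd.cut(df['length'], bins=[0, 6, 20, 1000],
                    labels=['≤5 days', '6-20 days', '≥20 days'],
                    right=False)

# normalize='index' divides each row by its total; multiply by 100 for percent
pct = pd.crosstab(df['age_group'], df['days'], normalize='index') * 100
print(pct)
```

The resulting table can be plotted the same way, for example with pct.plot.bar(stacked=True).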
Note on the y-axis: the values plotted here run from 0% to 100%. If they are instead normalized from 0 to 1, multiply by 100 when formatting the tick labels:
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])
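Instead of formatting the tick labels by hand, matplotlib.ticker.PercentFormatter can handle both scales through its xmax parameter. A minimal sketch, using made-up bar heights on two axes:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

fig, (ax_pct, ax_frac) = plt.subplots(1, 2, figsize=(10, 4))

# Values already on a 0-100 scale: xmax=100 maps 100 to "100%"
ax_pct.bar(['a', 'b'], [40, 60])
ax_pct.yaxis.set_major_formatter(PercentFormatter(xmax=100, decimals=0))

# Values normalized to 0-1: xmax=1 maps 1.0 to "100%"
ax_frac.bar(['a', 'b'], [0.4, 0.6])
ax_frac.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))

# Format one value on each scale to compare the resulting labels
label_pct = ax_pct.yaxis.get_major_formatter().format_pct(50, 100)
label_frac = ax_frac.yaxis.get_major_formatter().format_pct(0.5, 1)
print(label_pct, label_frac)
```

Because the formatter is attached to the axis, the labels stay correct if the tick positions change, for example after resizing or zooming.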