Creating a matplotlib or seaborn histogram which uses percent rather than count?

Specifically, I'm dealing with the Kaggle Titanic dataset. I've plotted a stacked histogram which shows the ages of passengers who survived and died aboard the Titanic. Code below.

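import matplotlib.pyplot as plt
# data is the Titanic DataFrame with 'Survived' and 'Age' columns
figure = plt.figure(figsize=(15,8))
plt.hist([data[data['Survived']==1]['Age'], data[data['Survived']==0]['Age']], stacked=True, bins=30, label=['Survived','Dead'])
plt.ylabel('Number of passengers')
plt.legend()  # show the 'Survived' / 'Dead' labels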

I would like to alter the chart to show a single bar per bin whose height is the percentage of that age group that survived. E.g. if a bin contained the ages between 10 and 20 and 60% of the people aboard the Titanic in that age group survived, then the bar would line up with 60% on the y-axis.

Edit: I may have given a poor explanation of what I'm looking for. Rather than alter the y-axis values, I'm looking to change the actual height of the bars based on the percentage that survived.

The first bin on the graph shows roughly 65% survived in that age group. I would like this bin to line up against the y-axis at 65%. The following bins look to be 90%, 50%, 10% respectively, and so on.

The graph would end up actually looking something like this:

[image: the desired chart]


  • Perhaps the following will help ...

    1. Split the dataframe based on 'Survived'

    2. Create Bins

    3. Use np.histogram to generate histogram data

    4. Calculate survival rate in each bin

    5. Plot

      plt.bar(age_bins[:-1], surv_rates, width=age_bins[1]-age_bins[0])
      plt.ylabel('Survival Rate')
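
The steps above are listed without their code, so here is a minimal sketch of how they might fit together. It is not the answerer's original code: the function names `survival_rate_by_bin` and `plot_survival_rate` are hypothetical, the sample data is made up rather than taken from the Titanic dataset, and `align='edge'` is an added choice so that each bar starts at its bin's left edge. Only `age_bins` and `surv_rates` are names taken from the answer.

```python
import numpy as np


def survival_rate_by_bin(ages, survived, bin_edges):
    """Return the fraction of passengers who survived in each age bin."""
    ages = np.asarray(ages, dtype=float)
    survived = np.asarray(survived).astype(bool)

    # Some rows may have no age recorded (NaN); drop them before counting.
    known = ~np.isnan(ages)
    ages, survived = ages[known], survived[known]

    # Steps 1-3: histogram the survivors and all passengers over the same bins.
    surv_counts, _ = np.histogram(ages[survived], bins=bin_edges)
    all_counts, _ = np.histogram(ages, bins=bin_edges)

    # Step 4: survivors / total per bin; empty bins stay NaN instead of dividing by zero.
    rates = np.full(len(all_counts), np.nan)
    np.divide(surv_counts, all_counts, out=rates, where=all_counts > 0)
    return rates


def plot_survival_rate(rates, bin_edges):
    """Step 5: draw one bar per bin whose height is the survival rate."""
    # Imported here so the computation above can run without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(15, 8))
    # align='edge' places each bar's left side on the bin's left edge.
    ax.bar(bin_edges[:-1], rates, width=np.diff(bin_edges), align='edge')
    ax.set_xlabel('Age')
    ax.set_ylabel('Survival Rate')
    ax.set_ylim(0, 1)
    return ax


# Tiny made-up sample (not the real Titanic data) with two bins: [0, 10) and [10, 20].
ages = [2, 5, 8, 12, 15, 18, 19, np.nan]
survived = [1, 1, 0, 1, 0, 0, 0, 1]
age_bins = np.array([0, 10, 20])
surv_rates = survival_rate_by_bin(ages, survived, age_bins)
```

With the DataFrame from the question, the call would presumably look like `survival_rate_by_bin(data['Age'], data['Survived'], np.arange(0, 90, 10))`, where the bin width of 10 years is an arbitrary choice. Multiply the rates by 100 if the y-axis should read in percent rather than as a fraction.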
