
How do I plot a percentile graph with interval data?



The code below computes the cumulative frequency and cumulative percentage for data grouped into fixed intervals, then tries to plot the percentages with df.hist().

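import matplotlib.pyplot as plt
import pandas as pd

# Six intervals defined by seven break points, with one frequency per interval
idx = pd.IntervalIndex.from_breaks([39.9, 42.9, 45.9, 48.9, 51.9, 54.9, 57.9])
df = pd.DataFrame({"Bin": idx, "Frequency": [2, 2, 5, 5, 12, 3]})
n = df["Frequency"].sum()
df["cumulativeSumFreq"] = df["Frequency"].cumsum()
# Cumulative percentage: running total divided by the overall count
df["cumulativePercent"] = df["cumulativeSumFreq"] / n * 100
df

bins = [39.9, 42.9, 45.9, 48.9, 51.9, 54.9, 57.9]
df.hist(column="cumulativePercent", bins=bins)
plt.show()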

For some reason, df.hist() does not accept bins=idx.

I get the plot below, which does not follow the intended binning. How can I plot the percentages over these intervals instead?

[Images: output of the code above]
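The reason the plot ignores the intended binning is that a histogram treats the column as raw observations: it counts how many of the six percentage values fall into each bin, rather than drawing each value as the height of its interval. The sketch below reproduces that counting with numpy.histogram, assuming the cumulative percentages; the names freq and cumulative_percent are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Same frequencies and breaks as the question (illustrative variable names).
freq = pd.Series([2, 2, 5, 5, 12, 3])
cumulative_percent = freq.cumsum() / freq.sum() * 100
bins = [39.9, 42.9, 45.9, 48.9, 51.9, 54.9, 57.9]

# A histogram counts how many of the six *values* land in each bin.
# Values outside [39.9, 57.9] (here all but ~48.3) are simply ignored.
counts, _ = np.histogram(cumulative_percent, bins=bins)
print(counts.tolist())  # → [0, 0, 1, 0, 0, 0]
```

So the "histogram" of these percentages is a single bar of height 1, which has nothing to do with the frequencies per interval.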


Solution

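  • df.hist() treats the column as raw observations and re-bins them, so it cannot draw frequencies that are already aggregated per interval. Plot the precomputed values directly as steps over the bin edges with pyplot.stairs or Axes.stairs (available since Matplotlib 3.4); the edges list needs one more element than the values: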

    edges = [39.9, 42.9, 45.9, 48.9, 51.9, 54.9, 57.9]
    plt.stairs(df['cumulativePercent'], edges, fill=True)
    plt.show()
    

    or

    edges = [39.9, 42.9, 45.9, 48.9, 51.9, 54.9, 57.9]
    fig, ax = plt.subplots()
    ax.stairs(df['cumulativePercent'], edges, fill=True)
    plt.show()
    

    Output:

    [Image: resulting filled step plot]
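A self-contained version of the Axes.stairs approach follows, as a minimal sketch assuming Matplotlib 3.4 or later and that the cumulative percentage is meant to be cumulativeSumFreq / n * 100. Instead of retyping the edges, it rebuilds them from the IntervalIndex (every left endpoint plus the last right endpoint), since the plotting functions want plain numeric edges rather than Interval objects. The tick and axis labels are illustrative additions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same binned data as in the question.
idx = pd.IntervalIndex.from_breaks([39.9, 42.9, 45.9, 48.9, 51.9, 54.9, 57.9])
df = pd.DataFrame({"Bin": idx, "Frequency": [2, 2, 5, 5, 12, 3]})
n = df["Frequency"].sum()
df["cumulativeSumFreq"] = df["Frequency"].cumsum()
df["cumulativePercent"] = df["cumulativeSumFreq"] / n * 100

# stairs needs len(values) + 1 numeric edges; recover them from the IntervalIndex.
edges = idx.left.tolist() + [float(idx.right[-1])]

fig, ax = plt.subplots()
# Each precomputed value becomes a flat step over its interval; nothing is re-binned.
ax.stairs(df["cumulativePercent"].to_numpy(), edges, fill=True)
ax.set_xticks(edges)
ax.set_xlabel("Bin")
ax.set_ylabel("Cumulative percent")
plt.show()
```

If separate bars are preferred over one filled outline, ax.bar(idx.left, values, width=idx.length, align="edge") draws the same intervals as individual rectangles.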