python pandas plot histogram stacked

Python Stacked Histogram


Suppose I have this pandas DataFrame:

         pC  Truth
0  0.601972      0
1  0.583300      0
2  0.595181      1
3  0.418910      1
4  0.691974      1

'pC' is the probability of 'Truth' being 1, and 'Truth' is a binary value. I want to create a histogram of the probabilities in which each bin is split to show the proportion of rows with 'Truth' equal to 0 versus 'Truth' equal to 1.

I tried the following:

df[['pC','Truth']].plot(kind='hist',stacked=True)

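This treats 'Truth' as a second numeric column and just plots its values at 0 and 1 on the same axis as 'pC', instead of splitting each 'pC' bin by 'Truth'.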

Reproducible example:

import numpy as np
import pandas as pd

shape = 1000
df_t = pd.DataFrame({'pC': np.random.rand(shape),
                     'Truth': np.random.choice([0, 1], size=shape)})
df_t['factor'] = pd.cut(df_t.pC, 5)

How do I do this? Thanks


Solution

  • I solved this by counting the rows for each bin and 'Truth' value, then drawing the counts as a stacked bar chart:

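    import numpy as np
    import pandas as pd

    shape = 1000
    df_t = pd.DataFrame({'pC': np.random.rand(shape),
                         'Truth': np.random.choice([0, 1], size=shape)})
    # Cut pC into 5 equal-width bins
    df_t['factor'] = pd.cut(df_t.pC, 5)
    # Count rows per (bin, Truth) pair; values='pC' gives pivot_table a column to count
    df_p = (df_t
            .pivot_table(columns='Truth', index='factor', values='pC',
                         aggfunc='count', fill_value=0)
            .reset_index())
    # One stacked bar per bin, split into Truth == 0 and Truth == 1
    df_p[['factor', 0, 1]].plot(kind='bar', stacked=True, x='factor');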
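
The chart above stacks raw counts, while the question asks for the proportion of 0 versus 1 inside each bin. One possible way to get that, as a minimal sketch rather than the only approach, is to normalize each bin's counts so they sum to 1 using `pd.crosstab` with `normalize="index"`. The names `bin_proportions`, `plot_bin_proportions`, and `n_bins` are hypothetical helpers introduced here for illustration, and the seeded random generator is only there to make the example repeatable.

```python
import numpy as np
import pandas as pd


def bin_proportions(df, n_bins=5):
    """Return, for each pC bin, the share of rows with Truth == 0 and Truth == 1."""
    # Same equal-width binning as pd.cut(df_t.pC, 5) in the answer
    factor = pd.cut(df["pC"], n_bins)
    # normalize="index" divides each row by its row total, so every bin sums to 1
    return pd.crosstab(factor, df["Truth"], normalize="index")


def plot_bin_proportions(props):
    """Draw one stacked bar per bin (hypothetical helper; requires matplotlib)."""
    ax = props.plot(kind="bar", stacked=True)
    ax.set_xlabel("pC bin")
    ax.set_ylabel("proportion")
    return ax


# Same shape of data as the reproducible example, with a fixed seed (an assumption)
rng = np.random.default_rng(0)
shape = 1000
df_t = pd.DataFrame({"pC": rng.random(shape),
                     "Truth": rng.integers(0, 2, size=shape)})

props = bin_proportions(df_t)
print(props)
```

Calling `plot_bin_proportions(props)` should then draw bars of equal height 1, each split by the share of 'Truth' values in that bin. This loses the information about how many rows fall in each bin, so it may be worth showing it alongside the count-based chart.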