
pandas: how to get the percentage for each row


When I use the pandas value_counts() method, I get the data below:

new_df['mark'].value_counts()

1   1349110
2   1606640
3   175629
4   790062
5   330978

How can I get the percentage for each row like this?

1   1349110 31.7%
2   1606640 37.8%
3   175629  4.1%
4   790062  18.6%
5   330978  7.8%

I need to divide each count by the sum of all the counts.
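For the counts in the question, a minimal sketch of that division is shown below. It assumes the value_counts() result can be rebuilt as a plain Series from the numbers printed above. The names counts, share and result, and the column labels 'count' and 'percent', are illustrative choices, not anything from the original post.

```python
import pandas as pd

# Counts copied from the value_counts() output above, indexed by mark
counts = pd.Series([1349110, 1606640, 175629, 790062, 330978],
                   index=[1, 2, 3, 4, 5])

# Divide every count by the grand total to get each row's share (0..1)
share = counts / counts.sum()

# Show the raw count next to the share formatted as a percentage with one decimal
result = pd.DataFrame({'count': counts, 'percent': share.map('{:.1%}'.format)})
print(result)
```

The '{:.1%}' format spec multiplies by 100 and appends the percent sign, so no manual scaling is needed.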


Solution

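    import numpy as np
    import pandas as pd

    # 1000 draws of the letters A-J, with J ten times as likely as A
    np.random.seed([3, 1415])
    s = pd.Series(np.random.choice(list('ABCDEFGHIJ'), 1000, p=np.arange(1, 11) / 55.))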
    
    s.value_counts()
    
    I    176
    J    167
    H    136
    F    128
    G    111
    E     85
    D     83
    C     52
    B     38
    A     24
    dtype: int64
    

    As a proportion of the total (normalize=True divides each count by the sum)

    s.value_counts(normalize=True)
    
    I    0.176
    J    0.167
    H    0.136
    F    0.128
    G    0.111
    E    0.085
    D    0.083
    C    0.052
    B    0.038
    A    0.024
    dtype: float64
    

    counts = s.value_counts()
    percent = counts / counts.sum()   # fraction of the total for each value
    fmt = '{:.1%}'.format             # e.g. 0.176 -> '17.6%'
    pd.DataFrame({'counts': counts, 'percent': percent.map(fmt)})
    
       counts  percent
    I     176    17.6%
    J     167    16.7%
    H     136    13.6%
    F     128    12.8%
    G     111    11.1%
    E      85     8.5%
    D      83     8.3%
    C      52     5.2%
    B      38     3.8%
    A      24     2.4%
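Mapping the percentages to strings means that column can no longer be sorted, summed or plotted as numbers. A minimal sketch of an alternative, assuming a numeric 0-100 percent column is acceptable: scale value_counts(normalize=True) by 100, join it to the raw counts with pd.concat, and add the '%' sign only when printing via DataFrame.to_string(formatters=...). The variable name table and the column labels are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above
np.random.seed([3, 1415])
s = pd.Series(np.random.choice(list('ABCDEFGHIJ'), 1000, p=np.arange(1, 11) / 55.))

counts = s.value_counts()
# normalize=True returns fractions of the total; multiply by 100 for percent
percent = s.value_counts(normalize=True) * 100

# Put both side by side; pd.concat aligns the two Series on their index
table = pd.concat([counts, percent], axis=1, keys=['counts', 'percent'])

# table['percent'] stays float; the '%' text is added only for display
print(table.to_string(formatters={'percent': '{:.1f}%'.format}))
```

Keeping the numbers numeric and formatting at display time lets you still do things like table['percent'].sum() or sort by percent afterwards.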