python pandas graph

Add percentage of repetitive strings in bar graph using Pandas


Let's say I have a DataFrame like this.

Reproducible example:

import pandas as pd
import io   

TESTDATA="""All_services
All_services
Rehosting applications to AWS
Replacing flexible functionalities
Unaltered replatforming of underlying code structure, functionalities, features
Rebuilding broken applications/software segments
Optimize existing use of cloud(Cost saving)
Expand use of containers
Move on prem servers to Sass
Expanding public clouds
Implemenation CI/CD to clouds
Migration Evaluator
AWS Migration Hub 
AWS Application Discovery Services
AWS Landing Zone 
AWS Control Tower
AWS Management and Governance
AWS Database Migration Services
AWS Server Migration Service
AWS Database Migration Service
AWS Application Discovery Service
AWS Direct Connect
DB Migrations
open-source databases to AWS. 
Oracle to Oracle
Oracle or Microsoft SQL Server to Amazon Aurora. 
Migrating fileservers to Amazon S3
migrating commercial RDBMS or MySQL.
Optimize existing use of cloud(Cost saving)
Expand use of containers
Move on prem servers to Sass
Expanding public clouds
Implemenation CI/CD to clouds
Migration Evaluator
AWS Migration Hub 
AWS Application Discovery Services
AWS Landing Zone 
DB Migrations
Cloud Migration Planning                                              
Replatforming Applications for Cloud                                      
Cloud Application Development Services                                
From Monolith to Microservices                                                 
Cloud Infrastructure Automation                                                
Implemenation CI/CD to clouds
DB Migrations
Optimize existing use of cloud(Cost saving)
Implemenation CI/CD to clouds
Migration Evaluator
AWS Migration Hub 
AWS Application Discovery Services
AWS Direct Connect
DB Migrations
open-source databases to AWS. 
Oracle to Oracle
Oracle or Microsoft SQL Server to Amazon Aurora. 
Migrating fileservers to Amazon S3
Optimize existing use of cloud(Cost saving)
Amazon S3 Transfer Acceleration 
AWS Snowball
AWS Direct Connect
EC2
AWS Server Migration Service
AWS Database Migration Service 
VMWare Cloud on AWS 
Optimize existing use of cloud(Cost saving)
Cloud Application Development Services                                
From Monolith to Microservices                                                 
Cloud Infrastructure Automation                                                
Implemenation CI/CD to clouds
DB Migrations
Optimize existing use of cloud(Cost saving)
Implemenation CI/CD to clouds
Migration Evaluator
Optimize existing use of cloud(Cost saving)
AWS Application Discovery Services
AWS Direct Connect
DB Migrations
Rebuilding broken applications/software segments
Optimize existing use of cloud(Cost saving)
Expand use of containers
Move on prem servers to Sass
Expanding public clouds
Implemenation CI/CD to clouds
AWS Management and Governance
AWS Database Migration Services
AWS Server Migration Service
AWS Database Migration Service
AWS Application Discovery Service
AWS Direct Connect
DB Migrations
"""

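# sep=";" never matches, so each line is read as one column, commas included.
df = pd.read_csv(io.StringIO(TESTDATA), sep=";")
# Strip leading and trailing spaces so identical services are counted together.
df = df.replace(r"^ +| +$", r"", regex=True)
df['All_services'].value_counts().sort_values().plot(kind='barh', figsize=(25, 15), linewidth=4)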

I get a graph like this:

[image: horizontal bar chart of the count of each service]

How can I add the percentage of each repeated string to the bar plot using pandas, like this?

[image: the same bar chart with a percentage label on each bar]

There are similar answers, but they use matplotlib together with pandas. I'm looking for a pandas-only approach, even if it needs some prior hard-coding. If that's not achievable, I will go with matplotlib.

Similar threads with matplotlib

pandas matplotlib labels bars as percentage

How to display percentage above grouped bar chart

Adding Percentage Labels to Grouper Bar Chart


Solution

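  • It's not possible to do what you want with pandas alone, because pandas has no option for labeling bars. Since plot returns a matplotlib Axes, you can label the bars with matplotlib's bar_label (available from matplotlib 3.4):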

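    import matplotlib.pyplot as plt

    # Count each service (ascending, so the largest bar is drawn at the top)
    # and add each count's share of the total as a percentage.
    stats = (df['All_services'].value_counts(ascending=True).to_frame('count')
               .assign(pct=lambda x: x['count'].div(x['count'].sum()).mul(100)))
    
    # pandas returns the matplotlib Axes; bar_label annotates its bars.
    ax = stats['count'].plot(kind='barh', figsize=(25, 15), linewidth=4)
    ax.bar_label(ax.containers[0], labels=stats['pct'].round(2).astype(str) + '%')
    
    plt.tight_layout()
    plt.show()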
    

    Output: a horizontal bar chart with a percentage label at the end of each bar.
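
If percentages in the y-axis tick labels are acceptable instead of labels on the bars, a rough workaround is to write each share into the index before plotting, so no direct matplotlib call is needed. This is only a sketch: the helper name counts_with_pct_index and the tiny sample Series are made up for illustration and are not part of the question's data.

```python
import pandas as pd


def counts_with_pct_index(series: pd.Series, decimals: int = 1) -> pd.Series:
    """Return ascending value counts whose index labels include each value's share."""
    counts = series.value_counts(ascending=True)
    pct = counts / counts.sum() * 100
    labeled = counts.copy()
    # Rewrite every index label as "name (xx.x%)" so the plot shows it as a tick label.
    labeled.index = [
        f"{name} ({share:.{decimals}f}%)" for name, share in zip(counts.index, pct)
    ]
    return labeled


# Hypothetical stand-in for df['All_services'] (illustrative values only).
services = pd.Series(["DB Migrations"] * 3 + ["EC2"])
labeled = counts_with_pct_index(services)
print(labeled)
```

With the question's data, counts_with_pct_index(df['All_services']).plot(kind='barh', figsize=(25, 15)) should then draw the same chart with the percentage shown next to each service name. pandas still uses matplotlib internally to render the plot.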