Tags: python, pandas, matplotlib, data-visualization, data-analysis

How to plot bar chart to compare multiple systems with multiple variables using Pandas in Python


I'm doing some basic data analysis with Pandas and am having trouble with plotting data. I have data for multiple systems where each system has rank positions (1-10). Within each rank position there are grades A, C, and F, with a percentage. I'd like to have a graph for each system, where the x-axis contains the ranks and the y-axis contains the grade percentages. Here is an example of my data:

{
  "System1": {
      "1": {
             "A": 0.5,
             "C": 0.3,
             "F": 0.1
           },
      "2": {
             "A": 0.3,
             "C": 0.3,
             "F": 0.4
           },
      ...,
      "10": {
              "A": 0.1,
              "C": 0.3,
              "F": 0.6
            }
   },
   "System2": {
       "1": {
              ...
            },
       ...,
       "10": {
              ...
        }
   }
}

I would like to produce a graph that looks like this: (image not available)

I have loaded my data into a dataframe using pd.DataFrame.from_dict(ranked_grades), but I'm having trouble getting Pandas to work with my data's nested structure: each cell ends up holding the whole grade dict for that rank. My dataframe looks like this once loaded:

                                              System1                                           System2                                
1   {'C': 0.35377358490566035, 'F': 0.132075471698...  {'C': 0.3696682464454976, 'F': 0.1611374407582...  
2   {'C': 0.33490566037735847, 'F': 0.372641509433...  {'C': 0.3459715639810427, 'F': 0.2890995260663...  
3   {'C': 0.330188679245283, 'F': 0.41037735849056...  {'C': 0.3080568720379147, 'F': 0.4502369668246...  
4   {'C': 0.2783018867924528, 'F': 0.5235849056603...  {'C': 0.3175355450236967, 'F': 0.4739336492890... 
...
10  {'C': 0.2830188679245283, 'F': 0.5943396226415...  {'C': 0.24170616113744076, 'F': 0.630331753554... 

Solution

  • I'm learning a ton of stuff here. I may update this answer if I find more.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    d = {
        k0: {
            k1: {
                k2: np.random.randint(0, 10) / 10 for k2 in list('ACF')
            } for k1 in range(1, 11)
        } for k0 in ['System1', 'System2']
    }
    
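    # pd.Panel was removed in pandas 0.25, so build the same frame with concat:
    # index = (system, rank), columns = grades
    df = pd.concat({k: pd.DataFrame(v).T for k, v in d.items()})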
    fig, axes = plt.subplots(2, 1, figsize=(6, 4), sharex=True)
    for i, (name, group) in enumerate(df.groupby(level=0)):
        group.xs(name).sort_index().plot.bar(ax=axes[i], ylim=[0, 1])
        axes[i].set_title(name, rotation=270, position=(1.05, .55),
                          backgroundcolor='gray')
    
    axes[0].legend(bbox_to_anchor=(1.1, .2), loc=2, borderaxespad=0.)
    axes[1].legend().remove()
    
    plt.subplots_adjust(hspace=0.1)
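
If you have already loaded the data with pd.DataFrame.from_dict(ranked_grades) as in the question, so that every cell holds a dict, something like the following might get you to the same (system, rank) by grade layout. This is only a rough sketch: the values in ranked_grades are made up, expand_cells is a helper name I invented, and it assumes every system has every rank (no missing cells) and that the rank labels can be cast to int.

```python
import pandas as pd

# Hypothetical stand-in for the asker's ranked_grades (values are made up).
ranked_grades = {
    "System1": {"1": {"A": 0.5, "C": 0.3, "F": 0.2},
                "2": {"A": 0.3, "C": 0.3, "F": 0.4}},
    "System2": {"1": {"A": 0.6, "C": 0.3, "F": 0.1},
                "2": {"A": 0.2, "C": 0.3, "F": 0.5}},
}

# Same shape as the frame in the question: one column per system,
# one row per rank, each cell holding a dict of grades.
raw = pd.DataFrame.from_dict(ranked_grades)


def expand_cells(raw):
    """Return a frame indexed by (system, rank) with one column per grade.

    Assumes no cell is missing, i.e. every system has every rank.
    """
    ranks = raw.index.astype(int)  # "10" should sort after "2"
    per_system = {
        # A list of dicts becomes one row per dict, one column per key.
        system: pd.DataFrame(raw[system].tolist(), index=ranks)
        for system in raw.columns
    }
    # Dict keys become the outer index level.
    return pd.concat(per_system, names=["system", "rank"])


tidy = expand_cells(raw)
print(tidy)
```

The result is indexed by (system, rank) with grade columns, like df above, so the same groupby/plot loop should work on tidy unchanged.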
    
