Tags: python, loops, pandas, graphing

Looping for creating graphs


I am seeking some guidance on how best to automate a loop that creates graphical visualizations of calculations made from a dict.

I've pieced together the following code to create a single graph, but need to produce many similar graphs (using different variables) and would rather not type out each variable multiple times (there will be 100s of variables).

For a single graph, I have the following code (see below), where Calclist is a dict of DataFrames and variable1 is a specific column in each of them:

 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt

 Sets = {}
 labels = ['0 - 5','5 - 25','25 - 50','50 - 75','75 - 100']
 blocks = [-1,5,25,50,75,100]

 for i in Calclist:

     out = pd.cut(Calclist[i]['variable1'], bins = blocks)
     Sets[i] = (pd.value_counts(out)/Calclist[i]['variable1'].count())*100

 df = pd.DataFrame(Sets)
 df.reset_index(level=0, inplace=True)
 df.rename(index = str, columns = {'index':'blocks'}, inplace=True)

 ax = df.plot.bar(title='One iteration - works well')
 ax.set_xlabel("x-axis label")
 ax.set_ylabel("y-axis label")
 ax.set_xticklabels(labels, rotation=45)

So far, so good - this is what the code will produce:

[Image: Single iteration]

What I would really like to do is repeat this for each variable in turn (variable1, then variable2, variable3, and so on).

I have tried a couple of things, and think I'm close but likely missing something fundamental.

Specifically, I tried nesting another loop that iterates over a series using "Parameter", which holds the variable names that I'm interested in visualizing:

 Sets = {}
 labels = ['0 - 5','5 - 25','25 - 50','50 - 75','75 - 100']    
 blocks = [-1,5,25,50,75,100]                                

 Parameter = pd.Series("variable1","variable2")

 for j in Parameter:

     for i in Calclist:

         out = pd.cut(Calclist[i][Parameter[j]], bins = blocks)
         Sets[i] = (pd.value_counts(out)/Calclist[i][Parameter[j]].count())*100

but I get the following error:

 TypeError: Index(...) must be called with a collection of some kind, 
 'powertotal_total' was passed

Any and all suggestions are greatly appreciated.


Solution

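  • The error comes from the way Parameter is initialized. The second positional argument of pd.Series is the index, so 'variable2' is interpreted as an index rather than as a second value:

    Parameter = pd.Series("variable1", "variable2")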
    ...
    TypeError: Index(...) must be called with a collection of some kind,
    'variable2' was passed
    

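    pd.Series expects its data as a single array-like, dict, or scalar value, so pass the names as one list. Also, iterating over a Series yields its values, not its index labels, so use the loop variable directly instead of indexing with Parameter[j]. Finally, initialize Sets inside the outer loop so that each parameter starts from an empty dict: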

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    labels = ['0 - 5','5 - 25','25 - 50','50 - 75','75 - 100']
    blocks = [-1,5,25,50,75,100]
    
    Parameters = pd.Series(['variable1', 'variable2'])
    
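    for parameter in Parameters:
        # Start from an empty dict so results from the previous parameter are not reused
        Sets = {}
        for i in Calclist:
            out = pd.cut(Calclist[i][parameter], bins = blocks)
            # Percentage of non-null values per bin; sort=False keeps the bins
            # in their defined order so they line up with labels
            Sets[i] = (out.value_counts(sort=False)/Calclist[i][parameter].count())*100
    
        df = pd.DataFrame(Sets)
        df.reset_index(level=0, inplace=True)
        df.rename(index=str, columns={'index': 'blocks'}, inplace=True)
    
        # Each call creates a new figure, titled with the current parameter
        ax = df.plot.bar(title=parameter)
        ax.set_xlabel("x-axis label")
        ax.set_ylabel("y-axis label")
        ax.set_xticklabels(labels, rotation=45)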
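
Since Calclist is not shown in the question, here is a self-contained sketch of the same loop that can be run as is. It is only one possible arrangement, not the asker's actual setup: the calclist data is random and purely hypothetical, and the helper names bin_percentages and plot_parameter, the keys site_a and site_b, and the temporary output directory are all made up for illustration. It also passes labels directly to pd.cut, saves each chart to a PNG, and closes the figure afterwards, which may help when there are hundreds of variables.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the loop runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

LABELS = ['0 - 5', '5 - 25', '25 - 50', '50 - 75', '75 - 100']
BLOCKS = [-1, 5, 25, 50, 75, 100]


def bin_percentages(calclist, parameter):
    """Return the percentage of values of `parameter` per bin, one column per dict key."""
    sets = {}
    for key, frame in calclist.items():
        out = pd.cut(frame[parameter], bins=BLOCKS, labels=LABELS)
        # sort=False keeps the bins in their defined order instead of sorting by count
        sets[key] = out.value_counts(sort=False) / frame[parameter].count() * 100
    return pd.DataFrame(sets)


def plot_parameter(calclist, parameter, out_dir):
    """Draw one bar chart for `parameter`, save it as a PNG, and return the file path."""
    table = bin_percentages(calclist, parameter)
    ax = table.plot.bar(title=parameter, rot=45)
    ax.set_xlabel("x-axis label")
    ax.set_ylabel("y-axis label")
    path = Path(out_dir) / f"{parameter}.png"
    ax.figure.savefig(path, bbox_inches="tight")
    plt.close(ax.figure)  # release the figure before drawing the next one
    return path


# Hypothetical stand-in for Calclist: two DataFrames of random values in [0, 100)
rng = np.random.default_rng(0)
calclist = {
    name: pd.DataFrame({"variable1": rng.uniform(0, 100, 200),
                        "variable2": rng.uniform(0, 100, 200)})
    for name in ("site_a", "site_b")
}

parameters = ["variable1", "variable2"]  # a plain list is enough for iteration
out_dir = tempfile.mkdtemp()
saved = [plot_parameter(calclist, p, out_dir) for p in parameters]
```

A plain list is used for the parameter names because nothing here needs a Series. To cover every column, something like list(next(iter(calclist.values())).columns) could replace the hard-coded list, assuming all DataFrames share the same columns.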