I am looking for guidance on how best to automate a loop that creates graphical visualizations of calculations made from a dict.
I've pieced together the following code to create a single graph, but I need to produce many similar graphs (using different variables) and would rather not type out each variable by hand (there will be hundreds of variables).
For a single graph, I have the code below, where Calclist is a dict and variable1 is a specific column within each of its entries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Sets = {}
labels = ['0 - 5','5 - 25','25 - 50','50 - 75','75 - 100']
blocks = [-1,5,25,50,75,100]
for i in Calclist:
    out = pd.cut(Calclist[i]['variable1'], bins = blocks)
    Sets[i] = (pd.value_counts(out)/Calclist[i]['variable1'].count())*100
df = pd.DataFrame(Sets)
df.reset_index(level=0, inplace=True)
df.rename(index = str, columns = {'index':'blocks'}, inplace=True)
ax = df.plot.bar(title='One iteration - works well')
ax.set_xlabel("x-axis label")
ax.set_ylabel("y-axis label")
ax.set_xticklabels(labels, rotation=45)
So far, so good: this produces the bar chart I want for a single variable.
What I would really like to do is iterate over all of the variables (variable1, variable2, variable3, ...).
I have tried a couple of things, and think I'm close but likely missing something fundamental.
Specifically, I tried nesting another loop that iterates over a series using "Parameter", which holds the variable names that I'm interested in visualizing:
Sets = {}
labels = ['0 - 5','5 - 25','25 - 50','50 - 75','75 - 100']
blocks = [-1,5,25,50,75,100]
Parameter = pd.Series("variable1","variable2")
for j in Parameter:
    for i in Calclist:
        out = pd.cut(Calclist[i][Parameter[j]], bins = blocks)
        Sets[i] = (pd.value_counts(out)/Calclist[i][Parameter[j]].count())*100
but I get the following error:
TypeError: Index(...) must be called with a collection of some kind,
'powertotal_total' was passed
Any and all suggestions are greatly appreciated.
Your error message is caused by the way you are initializing the Parameter series:
parameter = pd.Series('variable1', 'variable2')
...
TypeError: Index(...) must be called with a collection of some kind,
'variable2' was passed
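A small sketch of how pd.Series treats its positional arguments may help here. The names params and raised are only for illustration and are not part of the original code.

```python
import pandas as pd

# The first positional argument is the data, so the names go in one list.
params = pd.Series(['variable1', 'variable2'])

# Iterating over a Series yields its values, not its index labels.
print(list(params))        # ['variable1', 'variable2']
print(list(params.index))  # [0, 1]

# The second positional argument is `index`, which must be a collection;
# a bare string in that position is rejected with a TypeError.
raised = False
try:
    pd.Series('variable1', 'variable2')
except TypeError:
    raised = True
print(raised)
```

In other words, the original call asked for a Series holding the scalar 'variable1' with 'variable2' as its index, which is what the error message complains about.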
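You must pass an array-like, dict, or scalar value as the first argument to pd.Series; the second positional argument is treated as the index, which is why 'variable2' was rejected. Also, iterating over Parameters returns its values, so use the loop variable directly instead of indexing with Parameter[j]. Finally, you should initialize Sets for each parameter: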
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
labels = ['0 - 5','5 - 25','25 - 50','50 - 75','75 - 100']
blocks = [-1,5,25,50,75,100]
Parameters = pd.Series(['variable1', 'variable2'])
for parameter in Parameters:
    # Start from an empty dict so results from the previous parameter are not carried over
    Sets = {}
    for i in Calclist:
        out = pd.cut(Calclist[i][parameter], bins = blocks)
        Sets[i] = (pd.value_counts(out)/Calclist[i][parameter].count())*100
    # Build and plot one bar chart per parameter
    df = pd.DataFrame(Sets)
    df.reset_index(level=0, inplace=True)
    df.rename(index=str, columns={'index': 'blocks'}, inplace=True)
    ax = df.plot.bar(title=parameter)
    ax.set_xlabel("x-axis label")
    ax.set_ylabel("y-axis label")
    ax.set_xticklabels(labels, rotation=45)
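With hundreds of variables, it may be easier to move the per-parameter work into a helper function. The following is only a sketch under some assumptions: percent_per_bin, demo_calclist, site_a and site_b are made-up names, the random data stands in for the real Calclist, and the Agg backend is chosen so the example runs without a display. It also departs from the code above in two ways: it passes labels to pd.cut instead of calling set_xticklabels, and it uses value_counts(sort=False) so the bins stay in their natural order rather than being ordered by count.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

labels = ['0 - 5', '5 - 25', '25 - 50', '50 - 75', '75 - 100']
blocks = [-1, 5, 25, 50, 75, 100]


def percent_per_bin(frames, column):
    """Percentage of non-null values of `column` per bin, one result column per dict key."""
    shares = {}
    for key, frame in frames.items():
        # Attaching the labels here means the result index already carries the tick text
        binned = pd.cut(frame[column], bins=blocks, labels=labels)
        # sort=False keeps the bins in category order instead of ordering by count
        shares[key] = binned.value_counts(sort=False) / frame[column].count() * 100
    return pd.DataFrame(shares)


# Made-up stand-in for Calclist: two small frames with two columns each
rng = np.random.default_rng(0)
demo_calclist = {
    name: pd.DataFrame({
        'variable1': rng.uniform(0, 100, 50),
        'variable2': rng.uniform(0, 100, 50),
    })
    for name in ('site_a', 'site_b')
}

tables = {}
for parameter in ['variable1', 'variable2']:
    tables[parameter] = percent_per_bin(demo_calclist, parameter)
    ax = tables[parameter].plot.bar(title=parameter, rot=45)
    ax.set_xlabel("x-axis label")
    ax.set_ylabel("y-axis label")
    plt.close(ax.figure)  # replace with ax.figure.savefig(...) to keep each chart
```

A plain Python list of column names is enough for the outer loop; a pd.Series is not required. Closing or saving each figure inside the loop avoids keeping hundreds of figures open at once.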