python pandas for-loop bar-chart typeerror

Python: Iterating through dataframe columns as values in a function that prints charts


I'm trying to iterate through the numeric fields in a data frame and create two separate bar charts, one for Test1 and another for Test2 scores, grouped by Name. My for loop raises a TypeError. The sample data below is small, but the loop would need to run on a data frame with more than 25 fields. Below are my code and the error:

import pandas as pd
import matplotlib.pyplot as plt

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'Tom', 'Joseph', 'Krish', 'John'],
                   'Test1': [20, 21, 19, 18, 30, 33, 12, 10],
                   'Test2': [78, 89, 77, 91, 95, 90, 87, 70]}

df = pd.DataFrame(data)

for columns in df.columns[1:]:
    data = df[(df.columns > 80 )].groupby(
        df.Name, as_index = True).agg(
        {columns: "sum"})
    fig, (ax) = plt.subplots( figsize = (24,7))

    data.plot(kind = 'bar', stacked = False,
                  ax = ax)

TypeError: '>' not supported between instances of 'str' and 'int'


Solution

  • The TypeError comes from the expression df.columns > 80 in the line below. df.columns is the index of column labels ("Name", "Test1", "Test2"), not the values stored in those columns, so the expression compares strings with the integer 80 and fails before any score is looked at.

    data = df[(df.columns > 80 )].groupby(df.Name, as_index = True).agg({columns: "sum"})
    

    Through some trial and error, I revised your program to compare the values in columns two and three ("Test1" and "Test2") instead of the labels. The revised code follows.

    import pandas as pd
    import matplotlib.pyplot as plt
    
    data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'Tom', 'Joseph', 'Krish', 'John'],
                       'Test1': [20, 21, 19, 18, 30, 33, 12, 10],
                       'Test2': [78, 89, 77, 91, 95, 90, 87, 70]}
    
    df = pd.DataFrame(data)
    for column in df.columns[1:]:
        # Keep rows where either score passes its cutoff, then total this column per name.
        mask = (df['Test1'] > 20) | (df['Test2'] > 80)
        totals = df[mask].groupby('Name').agg({column: "sum"})
        fig, ax = plt.subplots(figsize=(24, 7))

        totals.plot(kind='bar', stacked=False, ax=ax)
    
    plt.show()
    

    Running that program produced the two bar charts.

    [Figure: sample bar charts]

    You might want to experiment with the comparison values, but I think this should provide you with the information to move forward on your program.
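Since the real data frame has more than 25 fields, naming Test1 and Test2 in the filter may not scale. Here is a minimal sketch of one alternative, assuming the goal is to keep only the rows where the column being charted exceeds a single cutoff (80, taken from the question). The names THRESHOLD, numeric_columns and totals are illustrative and not part of the original program.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Joseph", "Krish", "John", "Tom", "Joseph", "Krish", "John"],
    "Test1": [20, 21, 19, 18, 30, 33, 12, 10],
    "Test2": [78, 89, 77, 91, 95, 90, 87, 70],
})

THRESHOLD = 80  # assumed cutoff, borrowed from the comparison in the question

# Select every numeric column automatically instead of listing them by hand.
numeric_columns = df.select_dtypes(include="number").columns

totals = {}
for column in numeric_columns:
    # Compare this column's values (not the column labels) with the cutoff.
    filtered = df[df[column] > THRESHOLD]
    # Sum the remaining scores per name.
    grouped = filtered.groupby("Name")[column].sum()
    totals[column] = grouped
    if grouped.empty:
        continue  # no rows above the cutoff, so there is nothing to draw
    fig, ax = plt.subplots(figsize=(24, 7))
    grouped.plot(kind="bar", ax=ax, title=f"{column} (scores above {THRESHOLD})")

plt.show()
```

With the sample data, no Test1 score is above 80, so only the Test2 chart is drawn. Lowering THRESHOLD, or looking up a cutoff per column, would change which charts appear.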

    Hope that helped.

    Regards.