Tags: python, bar-chart, categories, training-data, test-data

Plot a Data Set According to Counts of Categories of a Variable


I have a dataset with 14 columns, of which I am only allowed to use 4 (travelling class, gender, age, and fare price), and I have split it into train and test data sets. I need to create a vertical bar chart from the train data set showing the distribution of passengers by travelling class (the classes are 1, 2, and 3). I am not allowed to use NumPy, Pandas, SciPy, or scikit-learn.

I am very new to Python, and I know how to plot very simple graphs, but when it comes to more complicated graphs, I get a bit lost.

This is my code (I know there is a lot wrong):

travelling_class = defaultdict(list)
for row in data:
    travelling_class[row[0]]

travelling_class = {key: len(val) for key, val in travelling_class.items()}

keys = travelling_class()
vals = [travelling_class[key] for key in keys]
ind  = range(min(travelling_class.keys()), max(travelling_class.keys()) + 1)
width = 0.6

plt.xticks([i + width/2 for i in ind], ind, ha='center')
plt.xlabel('Travelling Class')
plt.ylabel('Counts of Passengers')
plt.title('Number of Passengers per Travelling Class')
plt.ylim(0, 1000)
plt.bar(keys, vals, width)
plt.show()

import matplotlib.pyplot as plt

classes = travelling_class[1, 2, 3]

plt.hist(classes)
plt.show()

@TrakJohnson I am the original asker of this question; sorry, I accidentally deleted my profile and had to make a new one. Thank you so much for your help. The problem is that my data set has 1045 rows, so it would be difficult to list them all by hand. Does the above seem reasonable?


Solution

  • Use plt.hist, which plots a histogram of the values you pass it (see the matplotlib documentation for plt.hist)

    Example:

    import matplotlib.pyplot as plt
    
    classes = [1, 2, 1, 1, 3, 3]
    
    plt.hist(classes)
    plt.show()
    

    And this is the result:

    [Image: histogram of the example classes list]
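For a data set of about a thousand rows, the class values do not need to be typed out by hand; they can be pulled from the rows and tallied. The following is a minimal sketch using only the standard library and matplotlib. It assumes each row is a sequence with the travelling class in column 0 (as `row[0]` in the question suggests). The names `count_by_class`, `plot_class_counts`, and `sample_rows` are hypothetical, and the sample rows are made-up stand-ins for the real training data. It uses `plt.bar` rather than `plt.hist` so that each class gets exactly one bar centred on its tick.

```python
from collections import Counter


def count_by_class(rows, class_index=0):
    """Count how many rows fall into each travelling class."""
    # int() normalises values such as "1" and 1 to the same key.
    return Counter(int(row[class_index]) for row in rows)


def plot_class_counts(counts):
    """Draw a vertical bar chart with one bar per travelling class."""
    # Imported here so the counting step can run without matplotlib.
    import matplotlib.pyplot as plt

    classes = sorted(counts)               # e.g. [1, 2, 3]
    totals = [counts[c] for c in classes]  # bar heights in the same order

    plt.bar(classes, totals, width=0.6)
    plt.xticks(classes)                    # one tick per class
    plt.xlabel('Travelling Class')
    plt.ylabel('Count of Passengers')
    plt.title('Number of Passengers per Travelling Class')
    plt.show()


# Hypothetical stand-in rows: (class, gender, age, fare)
sample_rows = [
    (1, 'female', 29, 211.34),
    (3, 'male', 22, 7.25),
    (3, 'female', 26, 7.93),
    (2, 'male', 35, 13.0),
    (3, 'male', 40, 8.05),
]

counts = count_by_class(sample_rows)
print(dict(sorted(counts.items())))  # {1: 1, 2: 1, 3: 3}
```

To draw the chart, call `plot_class_counts(count_by_class(train_rows))`, where `train_rows` stands for whatever list holds the training rows.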