I want to find the total count for each category. I built an example that uses colors as the categories. My solution produces the result I want, but I suspect some package already has a built-in function that does the same thing. My approach is slow and won't scale. I don't mind optimizing it, but I would rather use an existing function.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Sum all values for each unique category.
def combine_like_categories(df):
    col = df['color']
    new_colors = col.unique()
    new_values = np.zeros_like(new_colors)
    new_df = pd.DataFrame(np.array([new_colors, new_values]).T)
    headers = ['color', 'value']
    new_df.columns = headers
    for _, row in df.iterrows():
        new_df_index = new_df.loc[new_df['color'] == row['color']].index[0]
        new_df.iloc[new_df_index, 1] += row['value']
    return new_df
data = [
    ['red', 3],
    ['blue', 2],
    ['green', 5],
    ['orange', 3],
    ['blue', 1],
    ['red', 7]
]
headers=['color', 'value']
df = pd.DataFrame(data)
df.columns = headers
hist_df = combine_like_categories(df)
plt.bar(hist_df['color'], hist_df['value'])
plt.title('Counts of Each Color')
You can do this with groupby, which sums the values for each color. The result can be plotted directly:
ax = df.groupby('color')['value'].sum().plot.bar(rot=0)
ax.set_title('Counts of Each Color')
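If you want the summed values as a DataFrame shaped like the one `combine_like_categories` returns, rather than going straight to a plot, something along these lines should work. This is a minimal sketch that rebuilds the sample data from the question. `sort=False` and `as_index=False` are my additions, on the assumption that you want the colors in order of first appearance and `color` kept as a regular column.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    [['red', 3], ['blue', 2], ['green', 5], ['orange', 3], ['blue', 1], ['red', 7]],
    columns=['color', 'value'],
)

# sort=False keeps the groups in order of first appearance, like col.unique().
# as_index=False returns 'color' as a regular column instead of the index.
hist_df = df.groupby('color', sort=False, as_index=False)['value'].sum()
print(hist_df)
```

For the sample data the totals are red 10, blue 3, green 5, and orange 3. Because the result has the same `color` and `value` columns as before, the original `plt.bar(hist_df['color'], hist_df['value'])` call works on it unchanged.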