I have a dataframe of 100,000+ rows
in which there is a column named 'type'
which has unique values like:
['healer' 'terminator' 'kill-la-kill' 'demonic' 'healer-fpp' 'terminator-fpp' 'kill-la-kill-fpp' 'demonic-fpp']
What I want is to count the number of each type in the dataframe. What I am doing now to count the rows is:
len(df.loc[df['type'] == "healer"])
But this way I have to write it manually as many times as there are unique values in that column.
Is there a simpler way to do that?
Also, I want to use these conditions to filter other columns as well,
e.g. the 'terminator' type killed 78 in the 'kills' column and had 0 in 'heals'.
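For context, a small reproducible version of this setup might look like the following (the data values are made up; the column names 'type', 'kills', and 'heals' come from the question). Note that pandas itself also offers `value_counts()` for exactly this kind of per-value count:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({
    "type": ["healer", "terminator", "healer", "demonic-fpp"],
    "kills": [0, 78, 2, 5],
    "heals": [10, 0, 7, 0],
})

# Count how many rows each 'type' has, without writing one filter per value
counts = df["type"].value_counts()
print(counts["healer"])  # 2
```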
Numpy is great, and usually already has a one-liner that covers requirements like this - I think what you want is...
np.unique(yourArray, return_counts=True)
Which will return a list of unique values, and the number of times each one appears in your array.
try:
import numpy as np
np.unique(df['type'].values, return_counts=True)
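As a quick illustration of what that call returns (on a small made-up array rather than the real dataframe), note that `np.unique` returns the unique values in sorted order, with the counts aligned to them:

```python
import numpy as np

arr = np.array(["healer", "terminator", "healer", "demonic"])
values, counts = np.unique(arr, return_counts=True)
print(values)  # ['demonic' 'healer' 'terminator'] (sorted)
print(counts)  # [1 2 1]
```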
Or, roll it up in a dict, so you can extract the counts keyed by value:
count_dict = dict(zip(*np.unique(df['type'].values, return_counts=True)))
count_dict["healer"]
>> 132
Then you can plug that into a format string and (assuming you build similar dictionaries called kills_dict and heals_dict from the other columns) do something like:
for k in count_dict.keys():
    print("the {k} killed {kills} in the 'kills' and had {heals} heals".format(k=k, kills=kills_dict[k], heals=heals_dict[k]))
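One hedged sketch of how those per-type dictionaries could be built: if the dataframe really has numeric 'kills' and 'heals' columns (names taken from the question), a pandas groupby-sum gives both dicts in one go, rather than filtering once per type:

```python
import pandas as pd

# Hypothetical data; 78 kills for 'terminator' matches the question's example
df = pd.DataFrame({
    "type": ["healer", "terminator", "terminator"],
    "kills": [1, 40, 38],
    "heals": [10, 0, 0],
})

# Aggregate the other columns per 'type' in a single pass
stats = df.groupby("type")[["kills", "heals"]].sum()
kills_dict = stats["kills"].to_dict()
heals_dict = stats["heals"].to_dict()
print(kills_dict["terminator"])  # 78
print(heals_dict["terminator"])  # 0
```

This avoids the one-filter-per-value pattern from the question entirely, since groupby visits every type at once.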