python pandas data-analysis

How to iterate over a numpy.ndarray of objects and apply various functions to them?


I have a dataframe of 100000+ rows with a column named 'type', which has unique values like: ['healer' 'terminator' 'kill-la-kill' 'demonic' 'healer-fpp' 'terminator-fpp' 'kill-la-kill-fpp' 'demonic-fpp']

What I want is to count the number of rows of each type in the dataframe. What I am doing now to count the rows is: len(df.loc[df['type'] == "healer"])

But in this case I have to write it out manually once for every unique value in that column. Is there a simpler way to do that? I also want to use these conditions to summarize other columns, for example: the 'terminator' killed 78 in the 'kills' column and had 0 heals.


Solution

  • NumPy is great and usually already has a one-liner that covers requirements like this. I think what you want is:

    np.unique(yourArray,  return_counts=True)
    

    This returns a tuple of two arrays: the sorted unique values, and the number of times each one appears in your array.

    Try:

    import numpy as np
    np.unique(df['type'].values, return_counts=True)
    

    Or, roll it up in a dict, so you can extract the counts keyed by value:

    count_dict = dict(zip(*np.unique(df['type'].values, return_counts=True)))
    count_dict["healer"]
    
    >> 132
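As a small illustration of what np.unique hands back, here is a sketch on a made-up array (the sample values are invented for the example, not taken from the real dataframe). The counts come back as NumPy integers, so the sketch converts them to plain Python ints when building the dict.

```python
import numpy as np

# Invented sample standing in for df['type'].values (an object array of strings).
types = np.array(["healer", "terminator", "healer", "demonic"], dtype=object)

# With return_counts=True, np.unique returns a tuple: (sorted unique values, counts).
values, counts = np.unique(types, return_counts=True)

# Pair each value with its count; convert to plain Python types for clean dict keys/values.
count_dict = {str(v): int(c) for v, c in zip(values, counts)}

print(count_dict)
```

Looking up count_dict["healer"] then gives the number of rows of that type in the sample.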
    

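    Then you can plug that into a format string. The row count is not the same thing as the number of kills, so this assumes you build similar per-type dictionaries for the other columns, here called kills_dict and heals_dict:

    for k in count_dict:
        # kills_dict and heals_dict are assumed to be keyed by type, like count_dict
        print("the {k} killed {kills} in the 'kills' and had {heals} heals".format(k=k, kills=kills_dict[k], heals=heals_dict[k]))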
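If you would rather stay in pandas, the same counts and the per-type totals can be sketched with value_counts and groupby. This is a minimal sketch on toy data: the 'kills' column comes from the question, while the 'heals' column name and all of the numbers are assumptions made up for the example.

```python
import pandas as pd

# Toy data standing in for the real dataframe; 'heals' is an assumed column name.
df = pd.DataFrame({
    "type": ["healer", "terminator", "healer", "demonic", "terminator"],
    "kills": [1, 40, 0, 7, 38],
    "heals": [12, 0, 9, 2, 0],
})

# Number of rows for each unique value of 'type', without writing one filter per value.
type_counts = df["type"].value_counts()

# Per-type totals of the other columns.
totals = df.groupby("type")[["kills", "heals"]].sum()

for name, row in totals.iterrows():
    print(f"the {name} killed {row['kills']} in the 'kills' and had {row['heals']} heals")
```

Here type_counts["healer"] plays the role of count_dict["healer"], and totals replaces the separate per-column dictionaries.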