Search code examples
pythonlistdictionaryfor-loop

Python function to find the highest frequency number


I'm working on a Python function that is supposed to return the number with the highest frequency in an array. In cases where two numbers or more have the same frequency, it should return the larger number.

However, I'm facing an issue with my current implementation.

def highest_rank(arr):
    count_num = {}
    for i in arr:
        if i not in count_num:
            count_num[i] = 0
        else:
            count_num[i] = arr.count(i)
    return max(count_num, key=count_num.get)

In the this example [9, 48, 1, 8, 44, 45, 32] , I expect the function to return 48 but it return 9


Solution

  • This is a very fast solution that finds the correct result (as tested with a few different inputs)

    from collections import defaultdict
    
    def highest_rank(arr):
        count = defaultdict(int)
        highest_rank = 0
        for num in arr:
            cnt = count[num]+1
            count[num]=cnt
            highest_rank_cnt = count[highest_rank]
            if cnt > highest_rank_cnt or (cnt == highest_rank_cnt and num > highest_rank):
                highest_rank = num
        return highest_rank
                
    print(highest_rank([9, 48, 1, 8, 44, 45, 32])) # Prints 48 as expected
    

    A quick comparison between my method and your solution (applied the fix from accepted solution):

    import numpy as np
    from collections import defaultdict
    
    def highest_rank(arr):
        count = defaultdict(int)
        highest_rank = 0
        highest_rank_cnt = 0
        for num in arr:
            cnt = count[num]+1
            count[num]=cnt
            if cnt > highest_rank_cnt or (cnt == highest_rank_cnt and num > highest_rank):
                highest_rank = num
                highest_rank_cnt = cnt
        return highest_rank
                
    def highest_rank_slow(arr):
        count_num = {}
        for i in arr:
            if i not in count_num:
                count_num[i] = 0
            else:
                count_num[i] = arr.count(i)
        return max(count_num,key=lambda x:(count_num.get(x),x))
    
    nums = list(np.random.randint(0,1000,10_000))
    %timeit highest_rank(nums)          # 2.97 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit highest_rank_slow(nums)     # 1.56 s ± 52.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    In fact, my approach can do this for a million numbers in roughy 230ms which is still faster than your solution for 10K numbers since .count() is REPEATEDLY looping through the list instead of counting in one go.

    Edit:

    If you don't want to use the built-in defaultdict, you can simply use your if num not in count trick:

    def highest_rank_no_defaultdict(arr):
        count = {}
        highest_rank = 0
        highest_rank_cnt = 0
        for num in arr:
            if num not in count:
                cnt=1
            else:
                cnt = count[num]+1
            count[num]=cnt
            if cnt > highest_rank_cnt or (cnt == highest_rank_cnt and num > highest_rank):
                highest_rank = num
                highest_rank_cnt = cnt
        return highest_rank
    

    this is both more verbose and a bit slower than using defaultdict so unless you absolutely need to, stick with defaultdict.