
Top K Frequent Elements - time complexity: Bucket Sort vs Heap


I was working on a LeetCode problem (https://leetcode.com/problems/top-k-frequent-elements/) which is:

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

I solved this using a min-heap (my time complexity calculations are in the comments; please correct me if I made a mistake):

    from collections import Counter
    from heapq import heapify, heappushpop
    from itertools import islice

    def topKFrequent(nums, k):
        # k is guaranteed to be at most the number of distinct elements
        if k == len(nums):
            return nums

        # O(N)
        c = Counter(nums)

        # (frequency, number) pairs, so the heap orders by frequency
        it = iter([(freq, num) for num, freq in c.items()])

        # O(K): build a min-heap from the first k pairs
        result = list(islice(it, k))
        heapify(result)

        # O(N-K)
        for elem in it:
            # O(log K): evict the current smallest frequency
            heappushpop(result, elem)

        # O(K)
        return [pair[1] for pair in result]
    
    # O(N) + O(K) + O((N - K) log K) + O(K)
    # if k < N :
    #   O(N log K)
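
For what it's worth, a quick sanity check of the heap version (the example input is my own; it assumes the wrapper function is named topKFrequent, as above):

        print(topKFrequent([1, 1, 1, 2, 2, 3], 2))  # [2, 1] (any order is accepted)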

Then I saw a solution using bucket sort that is supposed to beat the heap solution with O(N):

    import collections
    import itertools

    def topKFrequent(nums, k):
        # one bucket per possible frequency (1..N)
        bucket = [[] for _ in nums]

        # O(N)
        c = collections.Counter(nums)

        # O(d) where d is the number of distinct numbers, d <= N
        for num, freq in c.items():
            # freq is between 1 and N, so bucket[-freq] orders
            # the buckets by descending frequency
            bucket[-freq].append(num)

        # O(?)
        return list(itertools.chain(*bucket))[:k]
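
To make the negative indexing concrete, here is what bucket ends up holding for a small example input (the values are mine, just for illustration):

        # nums = [1, 1, 1, 2, 2, 3], k = 2  ->  frequencies {1: 3, 2: 2, 3: 1}
        # bucket == [[], [], [], [1], [2], [3]]
        # chain flattens this to [1, 2, 3]; the first k = 2 entries are [1, 2]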

How do we compute the time complexity of the itertools.chain call here? Does it come from the fact that we chain at most N elements? Is that enough to conclude it is O(N)?

In any case, at least on the LeetCode test cases, the first solution performs better - what could be the reason for that?


Solution

  • The time complexity of list(itertools.chain(*bucket)) is O(N) where N is the total number of elements in the nested list bucket. The chain function is roughly equivalent to this:

    def chain(*iterables):
        for iterable in iterables:
            for item in iterable:
                yield item
    

    The yield statement dominates the running time; it is O(1) and executes N times in total, hence the O(N) result.
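
    For instance, chaining a small nested list visits every element exactly once (the input values here are hypothetical):

        from itertools import chain

        # 4 elements spread over 4 sublists, one of them empty
        bucket = [[3], [], [2, 4], [1]]
        # each element is yielded exactly once, and each sublist
        # (even the empty one) costs only O(1) to enter, so the
        # flattening is linear overall
        print(list(chain(*bucket)))  # [3, 2, 4, 1]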


    The reason your O(N log k) algorithm might end up being faster in practice is that log k is probably not very large; LeetCode says k is at most the number of distinct elements in the array, but I suspect that for most of the test cases k is much smaller, and of course log k is smaller still (even for k = 1,000, log2 k is only about 10).

    The O(N) algorithm probably has a relatively high constant factor because it allocates N lists and then randomly accesses them by index, resulting in a lot of cache misses; the append operations may also cause a lot of those lists to be reallocated as they become larger.
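
    One way to test that theory is to time both versions on the same input. A minimal sketch (heap_top_k and bucket_top_k are my names for the two solutions above, and the input sizes are arbitrary):

        import random
        import timeit

        nums = [random.randrange(1_000) for _ in range(100_000)]
        k = 10

        # heap_top_k / bucket_top_k: the two functions from the
        # question, renamed so they can coexist in one module
        print("heap:  ", timeit.timeit(lambda: heap_top_k(nums, k), number=10))
        print("bucket:", timeit.timeit(lambda: bucket_top_k(nums, k), number=10))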