Search code examples
python-3.xalgorithmsortingquicksortquickselect

Why this quick select algorithm doesn't work always


I have a simple quick select algorithm and want to understand why it doesn't work sometimes. The question is to find Top K Frequent Elements. I know there are other ways to do it such as using heap or bucket sorting. But I am curios why my algorithm doesn't work as I know there are other ways to do quick select which work. It sometimes doesn't work due to the random.choice function which means any of the following can happen:

  1. get the correct response
  2. get the incorrect response
  3. face maximum recursion depth exceeded error

Why? Isn't quick select based on random function, how come other quick select algorithms that use Python functions for randomness work but this one doesn't. what do I miss?

inputs are a list of numbers as nums and an integer k

 def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    histogram = Counter(nums) """ nums' frequency histogram/map"""
    def quickSelect(keys, k, ri = []):  
        r = random.choice(keys)  """ find the random key""" 
        pivot = histogram[r]  """ get the random frequency based on the random key"""
        left, right = [], [] """ put the keys in either right or left list based on their values"""
        for key in keys:
            if histogram[key] >= pivot:
                right.append(key)
            else:
                left.append(key) 
        if k == len(right) + len(ri): """ if the size of right plus previous right (ri) is equal to k we found it"""
            return right + ri
        elif k < len(right) + len(ri):
            quickSelect(right, k, ri)         
        else:
            quickSelect(left, k - len(right) - len(ri), right+ri)

    return quickSelect(list(histogram.keys()), k, [])

Update

it works per the accepted response suggestion. Added other way of solving it as comments as well for those who are interested.

    def topKFrequent(self, nums: List[int], k: int) -> List[int]:

    ####heap in O(nlogk)
    # histogram = Counter(nums)
    # return heapq.nlargest(k, histogram.keys(), key = histogram.get)

    ####bucket sort in O(n)
    # bucket = [[] for _ in range(len(nums)+1)]
    # histogram = Counter(nums)
    # for n, f in histogram.items():
    #     bucket[f].append(n)
    # res = []
    # for i in range(len(bucket)-1, -1, -1):
    #     if bucket[i]:
    #         for n in bucket[i]:
    #             res.append(n)
    #             if len(res) == k:
    #                 return res          
    # return None

       #### quick select 
       histogram = Counter(nums) 
       def quickSelect(keys, k):  
          pivot = histogram[random.choice(keys)]
          left, right = [], []
          for key in keys:
              if histogram[key] >= pivot:
                  right.append(key)
              else:
                  left.append(key) 
          if k == len(right):
              return right
          elif k < len(right):
              return quickSelect(right, k)         
          else:
              return quickSelect(left, k - len(right)) + right
       return quickSelect(list(histogram.keys()), k)

Solution

  • quickSelect(keys, k, ri = []) does not explicitly return a value when it recurses - the return value will be None.

    You don't need ri: manipulate k.

    The same approach preventing a decent quicksort implementation from ever needing O(n) levels of recursion would work for quickSelect:
    Do not recurse on the larger partition.

    But check if then you need to recurse at all.