Search code examples
pythonlistcontainersglobal-variablesheap

Find Median from Data Stream __init__ in Class issue


I have a question on defining the init in class. The original problem is:

Design a class to calculate the median of a number stream. The class should have the following two methods:

insertNum(int num): stores the number in the class findMedian(): returns the median of all numbers inserted in the class If the count of numbers inserted in the class is even, the median will be the average of the middle two numbers.

My original code is:

from heapq import *


class Solution:
    # define the min and max heaps
    def __init__(self):
        self.minHeap = []
        self.maxHeap = []

    def insertNum(self, num):
        # check if the first number into the heap
        # use -num to create the max heap
        if not self.maxHeap or num <= -self.maxHeap[0]:
            heappush(self.maxHeap, -num)
        else:
            heappush(self.minHeap, num)
        print(self.minHeap, self.maxHeap)
        # ensure l(maxHeap) = l(minHeap) or +1
        # 4 vs 2
        if len(self.maxHeap) > len(self.minHeap) + 1:
            heappush(self.minHeap, -heappop(self.maxHeap))
        # 1 vs 2
        elif len(self.maxHeap) < len(self.minHeap):
            heappush(self.maxHeap, -heappop(self.minHeap))
        print(self.minHeap, self.maxHeap)

    def findMedian(self):
        print(self.minHeap, self.maxHeap)
        if len(self.minHeap) == len(self.maxHeap):
            return 0.5 * (self.minHeap[0] - self.maxHeap[0])
        if len(self.maxHeap) == len(self.minHeap) + 1:
            return -self.maxHeap[0]


def main():
    Solution().insertNum(3)
    Solution().insertNum(1)
    print("The median is: " + str(Solution().findMedian()))
    Solution().insertNum(5)
    print("The median is: " + str(Solution().findMedian()))
    Solution().insertNum(4)
    print("The median is: " + str(Solution().findMedian()))


main()

And it will return the following output:

[] [-3]
[] [-3]
[] [-1]
[] [-1]
[] []
Traceback (most recent call last):
  File "/Users/tairanye/PycharmProjects/tester/main.py", line 45, in <module>
    main()
  File "/Users/tairanye/PycharmProjects/tester/main.py", line 38, in main
    print("The median is: " + str(Solution().findMedian()))
  File "/Users/tairanye/PycharmProjects/tester/main.py", line 30, in findMedian
    return 0.5 * (self.minHeap[0] - self.maxHeap[0])
IndexError: list index out of range

Process finished with exit code 1

I think the main issue here is that minHeap and maxHeap reset each time I call insertNumber. Once I updated the initial line of codes as following, the issue will go.

class Solution:
    # define the min and max heaps

    minHeap = []
    maxHeap = []

    def insertNum(self, num):

I am not sure the reason behind this. Thanks for your help.


Solution

  • Your class definition seems to be alright. But it is wrong with the usage of Solution() in your main function. Solution is a class type, every time when you call Solution(), it creates a new object/instance. Even though you have already run

    Solution().insertNum(3)
    Solution().insertNum(1)
    

    before

    print("The median is: " + str(Solution().findMedian()))
    

    However Solution() in the last line of code has no data stored. They are very different objects (or the latter has rewritten the previous objects). Because calling Solution() creates another object every time, which is obvious that the error message is complaing that there is no data in self.minHeap, thus invoking self.minHeap[0] would cause IndexError: list index out of range error. Hopefully, by far why this error message is clear to you.

    Now let's talk about why the following code would work fine.

    class Solution:
        # define the min and max heaps
    
        minHeap = []
        maxHeap = []
    
        def insertNum(self, num):
    

    Now you have defined

    minHeap = []
    maxHeap = []
    

    as class variable rather than instance variable (defined with self). Thus when you call Solution() every time,even a new instance has been created, but the data would be stored as class variable and they will live with class untill the class is destroyed.

    The following two lines of code stored the data into the class, rather than the instance created by Solution()

    Solution().insertNum(3)
    Solution().insertNum(1)
    

    That 's why later when you run

    print("The median is: " + str(Solution().findMedian()))
    

    there is no error, because a new instance created by Solution() has the data already when it was created.