Python find min and max in realtime stream without full array

I have a continuous stream of values coming in, many millions of records. I need to find the minimum and maximum in that has arrived so far in real-time as numbers keep coming in. The whole data array is not available. Data arrived is not stored. Min max range is also unknown.

I tried something like this, but it isn't working perfectly. Is there a better way to solve these using libraries, numpy, scipy?

import numpy as np
rng = np.random.default_rng()

test = rng.choice(np.arange(-100,100, dtype=int), 10, replace=False)
testmax = 0
testmin = 0
for i in test: #simulates a stream
    if i < testmax:
        testmin = i
    if i > testmax:
        testmax = i
    if i < testmin:
        testmin = i


print (test, 'min: ',testmin, 'max: ', testmax)


>>> print (test, 'min: ',testmin, 'max: ', testmax)
[ 39 -32  61 -18 -53 -57 -69  98 -88 -47] min:  -47 max:  98 #should be -88 and 98
>>>
>>> print (test, 'min: ',testmin, 'max: ', testmax)
[-65 -53   1   2  26 -62  82  70  39 -44] min:  -44 max:  82 #should be -65 and 82
>>>

Solution

The mistake (typo) was pointed out in the comments, but you only need two comparisons--this can be done using the ternary operator. You should also initialize the max to be negative infinity and the min to be positive infinity. This helps avoid cases where, for example, you set min to 0 but the smallest number actually seen in the stream is greater than 0.

import numpy as np

rng = np.random.default_rng(42)

stream_min = -100
stream_max = 100
test = rng.choice(np.arange(stream_min, stream_max+1, dtype=int),
                  10,
                  replace=False)

testmax = -float("inf")
testmin = float("inf")

# simulates a stream
for i in test:
    testmax = i if i > testmax else testmax
    testmin = i if i < testmin else testmin


print (test, "min: ", testmin, "max: ", testmax)
# [ 97  49 -83  26 -15 -16  38 -82 -60  69] min:  -83 max:  97

Why the ternary operator over using min/max? Well, it's faster.

stream_min = -1000
stream_max = 1000
test = rng.choice(np.arange(stream_min, stream_max+1, dtype=int),
                  500,
                  replace=False)

def ternary():
    testmax = -float("inf")
    testmin = float("inf")

    for i in test:
        testmax = i if i > testmax else testmax
        testmin = i if i < testmin else testmin

    return testmin, testmax

def plainif():
    testmax = -float("inf")
    testmin = float("inf")

    for i in test:
        if i > testmax:
            testmax = i
        if i < testmin:
            testmin = i

    return testmin, testmax

def minmax():
    testmax = -float("inf")
    testmin = float("inf")

    for i in test:
        testmax = max(i, testmax)
        testmin = min(i, testmax)

    return testmin, testmax

%timeit ternary() 
55.4 µs ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit plainif()
50.6 µs ± 2.23 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit minmax()
170 µs ± 6.01 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Using an if statement vs the ternary operator is nearly equivalent (the if is a hair faster).