
How can I find the largest number in a list that is smaller than a given bound?


Assume we are using Python. An int a and a list of ints are given. We are supposed to find the number that:

  • is smaller than a;
  • is the biggest of all the numbers in the list that are smaller than a.

For example, with a = 3 and list = [1, 2, 3, 4], the answer should be 2.

How should I achieve this without any outside packages?


Solution

  • As a comparison of the different solutions: all of the answers are asymptotically O(n). Here I compare three of them in a Jupyter notebook using the magic functions %timeit and %memit:

    import timeit
    import random
    %load_ext memory_profiler
    l = list(range(100000)) # Create a long list of ints 0..99999
    random.shuffle(l)       # Shuffle it so the list is unordered
    a = 50000               # Set the target bound to the middle value
    

    Here is a function that does what the question wants:

    def max_less_than(a, l):
        c = None  # best candidate found so far
        for i in l:
            if i < a:
                # compare against None explicitly so a candidate of 0 is not discarded
                c = max(c, i) if c is not None else i
        return c  # None if no element is smaller than a
    
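    For instance, running it on the example from the question, plus a case where no element qualifies:

    max_less_than(3, [1, 2, 3, 4])   # 2
    max_less_than(0, [1, 2, 3, 4])   # None, since no element is smaller than 0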

    Here are two solutions based on list comprehension and filter:

    result = max([x for x in l if x < a]) 
    result = max(filter(lambda x: x < a, l))
    
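    Note that, unlike the loop above, both one-liners raise a ValueError when no element is smaller than a. If that case matters, max's default keyword (available since Python 3.4) covers it:

    result = max([x for x in l if x < a], default=None)
    result = max(filter(lambda x: x < a, l), default=None)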

    Let's profile them with %timeit. I ran each of them for four rounds of 1000 iterations each; you can increase the rounds/iterations for more accurate results:

    %timeit -n 1000 -r 4 max_less_than(a,l)
    # 13.5 ms ± 588 µs per loop (mean ± std. dev. of 4 runs, 1000 loops each)
    %timeit -n 1000 -r 4 max(filter(lambda x: x < a, l))
    # 15.7 ms ± 248 µs per loop (mean ± std. dev. of 4 runs, 1000 loops each)
    %timeit -n 1000 -r 4 max([x for x in l if x < a])
    # 8.07 ms ± 301 µs per loop (mean ± std. dev. of 4 runs, 1000 loops each)
    
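    As a sanity check, the three approaches agree on this input; with l holding the shuffled values 0..99999 and a = 50000, each returns 49999:

    assert max_less_than(a, l) \
        == max([x for x in l if x < a]) \
        == max(filter(lambda x: x < a, l)) \
        == 49999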

    It seems that the list comprehension beats the other two in CPU time. But let's check memory as well:

    %memit max_less_than(a,l)
    # peak memory: 53.86 MiB, increment: 0.13 MiB
    %memit max(filter(lambda x: x < a, l))
    # peak memory: 53.75 MiB, increment: 0.12 MiB
    %memit max([x for x in l if x < a])
    # peak memory: 54.23 MiB, increment: 0.48 MiB
    

    The reason the filter-based approach uses less additional memory is its generator-like nature: the elements of a filter object are not stored in memory, but are evaluated one by one as some operation consumes the object. So, in summary, if your main concern is memory efficiency (for example, cases where your solution has to create long sequences over and over again), you may be better off with the generator-based built-ins.
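    Along the same lines, a plain generator expression keeps the same lazy, one-at-a-time evaluation without the lambda, and combines naturally with the default keyword mentioned above; a minimal sketch:

    # Lazy like filter: no intermediate list is materialized
    result = max((x for x in l if x < a), default=None)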