Tags: python, performance, generator, memory-efficient

Does max() on a generator build a list-like object, or does it work more efficiently?


Let's say I have a directory containing files named 'filename_1', 'filename_2', etc., and a generator models_paths that I use to find the newest number:

from pathlib import Path

mypath = 'my/path/filename'
models_paths = Path(mypath).parent.glob(Path(mypath).name + '*')
number_newest = max(int(str(file_path).split('_')[-1]) for file_path in models_paths)

I would like to know whether max builds a list-like data structure internally or whether it uses an algorithm like:

number_newest = None
for file_path in models_paths:
    number_current = int(str(file_path).split('_')[-1])
    number_newest = number_current if number_newest is None else max(number_current, number_newest)

In other words: Am I losing processing efficiency and/or memory efficiency if I write

mypath = 'my/path/filename'
models_paths = Path(mypath).parent.glob(Path(mypath).name + '*')
models_paths = list(models_paths)
number_newest = max(int(str(file_path).split('_')[-1]) for file_path in models_paths)

?


Solution

  • max does not build a list.

    This can be demonstrated clearly in this example with a custom object:

    class Thing:
        
        def __init__(self, x):
            self.x = x
            print(f'creating {x}')
            
        def __lt__(self, other):
            return self.x < other.x
    
        def __del__(self):
            print(f'destroying {self.x}')
    
        def __str__(self):
            return f'<{self.x}>'
            
    
    print(max(Thing(i) for i in range(5)))
    

    which gives:

    creating 0
    creating 1
    destroying 0
    creating 2
    destroying 1
    creating 3
    destroying 2
    creating 4
    destroying 3
    <4>
    destroying 4
    

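    As you can see, the __del__ method is called on each object as soon as it is no longer the current maximum. (CPython frees an object as soon as its reference count drops to zero, which is why the messages appear immediately.) This would not happen if the objects were being appended to a list, because the list would keep a reference to every one of them until max had finished.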

    Contrast with:

    print(max([Thing(i) for i in range(5)]))
    

    which gives:

    creating 0
    creating 1
    creating 2
    creating 3
    creating 4
    destroying 3
    destroying 2
    destroying 1
    destroying 0
    <4>
    destroying 4
    

    You can write a (less efficient) equivalent function and show that it does the same thing:

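    def mymax(things):
        empty = True
        for thing in things:
            # `empty or` short-circuits on the first item, so `maximum`
            # is never read before it has been assigned.
            if empty or (thing > maximum):  # parentheses for clarity only
                maximum = thing
                empty = False
        if empty:
            # like max(), refuse to answer for an empty iterable
            raise ValueError('mymax() arg is an empty iterable')
        return maximum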
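
As for the second part of the question: calling list(models_paths) first does not change how max works, but the list itself has to hold every path at once. Here is a rough sketch that measures this with tracemalloc. The names helper, the newest function, and the value of N are hypothetical stand-ins for the glob in the question, not a measurement of pathlib itself, and the exact byte counts will vary by Python version.

```python
import tracemalloc


def names(n):
    """Yield hypothetical file names lazily (a stand-in for Path.glob())."""
    for i in range(n):
        yield f"filename_{i}"


def newest(paths):
    """Same expression as in the question, applied to plain strings."""
    return max(int(p.split("_")[-1]) for p in paths)


def peak_bytes(func):
    """Run func() and return (result, peak bytes allocated while it ran)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


N = 100_000  # assumed number of files, for illustration only

# Lazy: each name is produced, parsed, compared and then dropped.
result_lazy, peak_lazy = peak_bytes(lambda: newest(names(N)))

# Eager: every name is stored in a list before max() sees the first one.
result_listed, peak_listed = peak_bytes(lambda: newest(list(names(N))))

print(f"lazy:   newest={result_lazy}, peak={peak_lazy} bytes")
print(f"listed: newest={result_listed}, peak={peak_listed} bytes")
```

Both variants return the same number, but the list variant should show a peak that grows with N while the lazy one stays roughly constant. For a directory with a few hundred files the difference is unlikely to matter in practice. The list is mainly worth it if you need to iterate over the paths more than once.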