python, processing-efficiency, memory-efficient

Efficiency of passing in variables versus redefining them for a function called many times


I have a question about efficiency, by which I mean both processing speed and memory use.

I'm writing code that calls a function many times. Some of its arguments need to remain variables, because the function is called with different values for them each time. Others, however, are constants: they are the same not only every time the function is called, but also every time the code is run, and even across runs. Still, I would rather write them as named variables than hardcode them, so that I can refer to them by descriptive names.

In the following examples, I expect the first two to be equally efficient and the third to be less efficient. However, I don't have a clear picture of why. Could anyone help explain that?

EDIT: In the examples below, I use only 100 iterations. In reality, I would expect to call functions in this way thousands or millions of times.

EDIT 2: For those just telling me to profile it: note that I asked why, not just whether, some ways are more efficient than others. Profiling would tell me whether, but not why.

import numpy as np

def myfunc(a):
    # the constant 3 is hardcoded directly in the expression
    return (a**4) + np.sqrt(3)*a - a/3

y = np.empty(100)
for a in np.arange(100):
    y[a] = myfunc(a)

versus

import numpy as np
myb = 3

def myfunc(a):
    # myb is a module-level (global) variable
    return (a**4) + np.sqrt(myb)*a - a/myb

y = np.empty(100)
for a in np.arange(100):
    y[a] = myfunc(a)

versus

import numpy as np
myb = 3

def myfunc(a, b):
    # the constant is passed in as an argument on every call
    return (a**4) + np.sqrt(b)*a - a/b

y = np.empty(100)
for a in np.arange(100):
    y[a] = myfunc(a, myb)

Solution

  • Efficiency at the level you're talking about doesn't matter. You should worry about the readability of your code rather than the cost of moving a few values around. If the code is more readable when you pass values to a function each time through a loop, even though they don't change, then pass them to the function. Putting things in globals, for example, usually makes it much harder to see what the code is doing.

    Here's an example I threw together:

    import random
    
    def foo(iter, a, b, c, d, e, f, g, h, i, j):
        x = a + b + c + d + e + f + g + h + i + j
        if iter % 100000 == 0:
            print(x)
    
    for i in range(1000000):
        foo(i, random.random() * 100,
            random.random() * 100,
            random.random() * 100,
            random.random() * 100,
            random.random() * 100,
            random.random() * 100,
            random.random() * 100,
            random.random() * 100,
            random.random() * 100,
            random.random() * 100)
    

    Result:

    658.9874644541911
    643.4372986147371
    636.6218502753122
    475.3660640474451
    648.4789890659888
    466.2721794578193
    595.3755252194462
    583.45879143973
    498.04278700281304
    283.2047039562956
    

    This code does a million iterations of creating 10 random values, multiplying each by 100, passing them individually into a function, and summing them up in the function. Every 100,000 iterations, I print the sum value just as a bit of a sanity check.

    This runs in 2-3 seconds on my MacBook Pro. Our computers these days are really, really fast and capable. So much so that it is almost never worth worrying about the kinds of optimizations you're talking about.

    UPDATE: To push the point further, and because I was curious, I tried taking out the random number generation and running this instead:

    for i in range(1000000):
        foo(i, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    

    This runs basically instantaneously, printing 55 ten times in the blink of an eye. So most of the time in the first example goes to generating the 10 million random numbers. I'll note further that with these constants involved, the compiler and processor are probably both optimizing up the wazoo, since nothing changes in this case. But that only pushes the point further: there's no reason to worry about passing around constant values, partly because compilers and processors these days will recognize and optimize away such patterns for you. Avoiding those optimizations is why I used random() in the first example.
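    If you do want a picture of where the (tiny) differences between the question's three variants come from, one way to look is to disassemble them with the dis module. This is just a sketch assuming CPython, and the exact opcode names can differ between interpreter versions:

    import dis
    import numpy as np

    myb = 3

    def f_literal(a):
        return (a**4) + np.sqrt(3)*a - a/3       # 3 is baked in at compile time

    def f_global(a):
        return (a**4) + np.sqrt(myb)*a - a/myb   # myb is looked up by name on each call

    def f_param(a, b):
        return (a**4) + np.sqrt(b)*a - a/b       # b is a local of the call

    dis.dis(f_literal)  # the 3 shows up as LOAD_CONST
    dis.dis(f_global)   # myb shows up as LOAD_GLOBAL (a dictionary lookup per call)
    dis.dis(f_param)    # b shows up as LOAD_FAST (an indexed local slot)

    Roughly speaking, the constant is free, the global costs a name lookup on every call, and the parameter is a cheap local load plus the overhead of passing one more argument. Those differences are real, but far too small to matter next to readability.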

    Memory is a different issue, but usually that's decided by the problem itself and not by exactly how you do it. There are certainly occasions when memory becomes a real constraint and you need to be clever (do things in batches, process with streams, etc.), as sketched below. The memory question is where it would be nice to know what kinds of numbers we're talking about, and what the data looks like in general.
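    As a sketch of what being clever can look like when memory does become the constraint (the batch size and the data source here are made-up placeholders), you can process a stream in fixed-size chunks with a generator instead of materializing everything at once:

    def batches(stream, batch_size=10000):
        # Yield fixed-size lists from any iterable without loading it all into memory.
        batch = []
        for item in stream:
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch

    # Hypothetical usage: only one batch is ever held in memory at a time.
    total = 0.0
    for chunk in batches(range(10_000_000)):
        total += sum(chunk)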