Tags: python, numpy, optimization, refactoring, numexpr

numexpr: temporary variables or repeated sub-expressions?


If the same sub-expression appears in multiple places within one numexpr expression, will it be recalculated multiple times (or is numexpr clever enough to detect this and reuse the result)?

Is there any way to declare temporary variables within a numexpr expression? This would have two aims:

  1. encourage numexpr to consider caching and re-using, rather than re-calculating, the result;
  2. simplify the expression (making the source code easier to read and maintain).

I am trying to calculate f(g(x)) where f and g are themselves both complicated expressions (e.g. for pixel-based thematic classification, f is a nested decision tree involving multiple thresholds, g is a set of normalised difference ratios, and x is a multi-band raster image).


Solution

  • No, it will not be recalculated: if a sub-expression is repeated within a numexpr expression, numexpr detects the duplicate and evaluates it only once.

    This can be verified by replacing numexpr.evaluate(expr) with numexpr.disassemble(numexpr.NumExpr(expr)).

    For example, the expression "where(x**2 > 0.5, 0, x**2 + 10)" is compiled into a program roughly equivalent to this pseudocode:

    y = x*x
    t = y>0.5
    y = y+10
    y[t] = 0
    

    (Note the multiplication only appears once, not twice.)
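The check described above can be sketched as follows. This is a minimal example, assuming numexpr is installed with its default optimisation settings; the helper op_name and the variable names are illustrative, and the exact opcode names printed may differ between numexpr versions.

```python
import numexpr as ne

expr = "where(x**2 > 0.5, 0, x**2 + 10)"

# Compile without evaluating. No signature is given, so x should be
# treated as a double-precision variable by default.
compiled = ne.NumExpr(expr)

# disassemble() returns one list per instruction, starting with the opcode name.
program = ne.disassemble(compiled)

def op_name(op):
    # Opcode names are bytes in current releases; str is accepted as well
    # (an assumption made here to cover older versions).
    return op.decode() if isinstance(op, bytes) else str(op)

for instruction in program:
    print(op_name(instruction[0]), instruction[1:])

# Count the instructions that compute the square
# (x**2 is normally rewritten as a multiplication x*x).
n_square_ops = sum(op_name(ins[0]).startswith(("mul", "pow")) for ins in program)
print("instructions computing x**2:", n_square_ops)
```

If the duplicate were not eliminated, the count printed on the last line would be two rather than one.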

    For this reason, it is best to pass the entire computation to numexpr as a single expression. Avoid performing sub-calculations in Python (assigning intermediate results or temporary variables to numpy arrays): this only increases memory usage and undermines numexpr's speedups, which come from running the full sequence of operations on CPU-cache-sized chunks so that intermediate values never have to make a round trip through main memory.
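To make the contrast concrete, here is a small sketch of the two approaches. The array size and the names staged and fused are arbitrary choices for illustration; local_dict is passed explicitly so the example does not depend on numexpr looking up variables in the calling frame.

```python
import numpy as np
import numexpr as ne

rng = np.random.default_rng(0)
x = rng.random(1_000_000)

# Approach to avoid: the intermediate g is materialised as a full-size
# numpy array and then read back in by a second evaluate() call.
g = ne.evaluate("x**2", local_dict={"x": x})
staged = ne.evaluate("where(g > 0.5, 0, g + 10)", local_dict={"g": g})

# Preferred: one expression, so the intermediate only ever exists
# chunk by chunk inside numexpr's virtual machine.
fused = ne.evaluate("where(x**2 > 0.5, 0, x**2 + 10)", local_dict={"x": x})

# Both routes compute the same values; only memory traffic differs.
same = np.allclose(staged, fused)
print(same)
```

The two results agree; the difference is that the staged version allocates and traverses an extra array the size of x.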

    The source code can still be kept readable by assembling the single expression with string substitution:

    f = """where({g} > 0.5,
                 0,
                 {g} + 10)"""
    g = "x**2"
    expr = f.format(g=g)
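The same pattern extends to the f(g(x)) case from the question. The sketch below is only an illustration: the band names nir and red, the thresholds, and the class codes are hypothetical and not taken from the original post. Note that the substituted fragment is wrapped in parentheses, so that operator precedence cannot change its meaning once it is pasted into the outer expression.

```python
import numpy as np
import numexpr as ne

# g: a normalised difference ratio, parenthesised so it is safe to substitute.
g = "((nir - red) / (nir + red))"

# f: a small nested decision tree over g (thresholds are illustrative).
f = """where({g} > 0.5,
             2,
             where({g} > 0.2, 1, 0))"""

expr = f.format(g=g)

# Hypothetical two-band input; real data would be full raster bands.
nir = np.array([0.8, 0.5, 0.3])
red = np.array([0.1, 0.3, 0.3])

# g appears twice in expr, but numexpr should evaluate it only once.
classes = ne.evaluate(expr, local_dict={"nir": nir, "red": red})
print(classes)
```

Because the final string is still one numexpr expression, the repeated {g} fragment costs nothing extra at run time while the Python source stays short.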