
Numpy Einsum Path Differences and the Optimize Argument


I have the following tensor contraction,

np.einsum('k,pjqk,yzjqk,yzk,ipqt->it', A, B, C, D, E)

I noticed that when the 'z' or 'q' dimensions grew, the execution time suffered badly, although my intuition was that it probably shouldn't be that bad - maybe my input form could be improved by manual tensor contractions.

After a little digging I saw that the optimize argument has two path-finding modes: 'optimal' and 'greedy'. Evaluating my contraction against both modes with np.einsum_path gives, respectively:

(['einsum_path', (0, 3), (1, 3), (0, 2), (0, 1)],
'  Complete contraction:  k,pjqk,yzjqk,yzk,ipqt->it\n'
'         Naive scaling:  8\n'
'     Optimized scaling:  5\n'
'      Naive FLOP count:  5.530e+04\n'
'  Optimized FLOP count:  7.930e+02\n'
'   Theoretical speedup:  69.730\n'
'  Largest intermediate:  2.400e+01 elements\n'
'--------------------------------------------------------------------------\n'
'scaling                  current                                remaining\n'
'--------------------------------------------------------------------------\n'
'   3                  yzk,k->yzk                  pjqk,yzjqk,ipqt,yzk->it\n'
'   5              yzk,yzjqk->jqk                        pjqk,ipqt,jqk->it\n'
'   4                jqk,pjqk->qp                              ipqt,qp->it\n'
'   4                 qp,ipqt->it                                   it->it')

and

(['einsum_path', (2, 3), (1, 3), (1, 2), (0, 1)],
'  Complete contraction:  k,pjqk,yzjqk,yzk,ipqt->it\n'
'         Naive scaling:  8\n'
'     Optimized scaling:  5\n'
'      Naive FLOP count:  5.530e+04\n'
'  Optimized FLOP count:  1.729e+03\n'
'   Theoretical speedup:  31.981\n'
'  Largest intermediate:  4.800e+01 elements\n'
'--------------------------------------------------------------------------\n'
'scaling                  current                                remaining\n'
'--------------------------------------------------------------------------\n'
'   5              yzk,yzjqk->jqk                      k,pjqk,ipqt,jqk->it\n'
'   4               jqk,pjqk->qkp                           k,ipqt,qkp->it\n'
'   5               qkp,ipqt->tik                                k,tik->it\n'
'   3                   tik,k->it                                   it->it')
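For reference, reports like the two above can be reproduced with np.einsum_path. A minimal sketch, using small hypothetical dimensions (the original post does not give the actual array shapes):

```python
import numpy as np

# Hypothetical sizes for the indices i, j, k, p, q, t, y, z (assumed, not from the post)
i, j, k, p, q, t, y, z = 2, 3, 4, 2, 3, 2, 2, 3

A = np.random.rand(k)
B = np.random.rand(p, j, q, k)
C = np.random.rand(y, z, j, q, k)
D = np.random.rand(y, z, k)
E = np.random.rand(i, p, q, t)

subscripts = 'k,pjqk,yzjqk,yzk,ipqt->it'

# einsum_path returns a (path, report) pair: the path is a list starting with
# 'einsum_path' followed by pairwise contraction indices; the report is a string
for mode in ('optimal', 'greedy'):
    path, report = np.einsum_path(subscripts, A, B, C, D, E, optimize=mode)
    print(mode, path)
    print(report)
```

The chosen path (and hence the FLOP counts) depends on the actual dimension sizes, so with different shapes the two modes may or may not agree.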

In my tests, 'optimal' is indeed far faster, as the reports above indicate.

Question

Can anyone explain in simple terms what the difference is and why 'greedy' is the default?

What is the downside to always using 'optimal'?

If my einsum calculation is going to be run thousands of times (which it is, as part of an optimisation loop), should I restructure the execution so it automatically benefits from the 'optimal' path without having to re-calculate it (or a 'greedy' path) every time?


Solution

  • A little more reading revealed the following, for anyone who finds this:

    'greedy' is generally quite performant, yielding the 'optimal' solution in most common cases, and is faster to compute. For the general user, who may inadvertently call einsum inside iterative loops, leaving 'greedy' as the default will probably suffice.

    Otherwise, for one-off calculations, the minimal additional overhead of 'optimal' means it can be employed effectively (except, perhaps, for very large numbers of indices), and it may provide a big boost, as in my case.

    In loops, the best thing to do is to pre-calculate the path (or calculate it in the first iteration and cache it in a nonlocal variable) and supply it as an argument:

    # Compute the contraction path once, up front
    path, display = np.einsum_path('k,pjqk,yzjqk,yzk,ipqt->it', A, B, C, D, E, optimize='optimal')
    for i in range(BIG_INT):
        # other things
        # Note: np.einsum, not np.einsum_path - reuse the precomputed path
        calculation = np.einsum('k,pjqk,yzjqk,yzk,ipqt->it', A, B, C, D, E, optimize=path)
        # more things
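    As a sanity check, a precomputed path only changes the contraction order, not the result. A small sketch with assumed dimensions (not from the original post):

```python
import numpy as np

# Hypothetical index sizes, assumed for illustration
k, p, j, q, y, z, i, t = 4, 2, 3, 3, 2, 3, 2, 2
A = np.random.rand(k)
B = np.random.rand(p, j, q, k)
C = np.random.rand(y, z, j, q, k)
D = np.random.rand(y, z, k)
E = np.random.rand(i, p, q, t)
sub = 'k,pjqk,yzjqk,yzk,ipqt->it'

# Compute the path once, then reuse it
path, _ = np.einsum_path(sub, A, B, C, D, E, optimize='optimal')

ref = np.einsum(sub, A, B, C, D, E, optimize=False)   # unoptimized baseline
fast = np.einsum(sub, A, B, C, D, E, optimize=path)   # precomputed path
assert np.allclose(ref, fast)
```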