Tags: python, numpy-einsum

Additional information on numpy.einsum()


I am trying to understand the numpy.einsum() function, but the documentation, as well as this answer from Stack Overflow, still leaves me with some questions.

Let's take the einstein sum and the matrices defined in the answer.

A = np.array([0, 1, 2])

B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

np.einsum('i,ij->i', A, B)

So, based on my understanding of the Einstein sum, I would translate this function to the notation (A_i*B_ij), so I would obtain:

j = 1 : A_1*B_11 + A_2*B_21 + A_3*B_31

j = 2 : A_1*B_12 + A_2*B_22 + A_3*B_32

and so on until j = 4. This gives

j = 1 : 0 + 4 + 16

j = 2 : 0 + 5 + 18

which would be the Einstein summation according to my understanding. Instead, the function does not perform the overall sum but stores the separate terms in a matrix, where we can spot the results of the (A_i * B_ij) products:

0   0   0   0
4   5   6   7
16  18  20  22

How is this actually controlled by the function? I feel this is controlled by the output labels, as mentioned in the documentation:

The output can be controlled by specifying output subscript labels as well. This specifies the label order, and allows summing to be disallowed or forced when desired

so somehow I assume that putting ->i disables summing of the inner sums. But how does this work exactly? This is not clear to me. Putting ->j produces the actual Einstein sum as expected.
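For concreteness, the two reductions can be compared side by side. This is just a small check using the same A and B as above:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

# '->j' sums over the repeated label i: these are the column sums
# of the elementwise products, i.e. the sums written out above.
print(np.einsum('i,ij->j', A, B))   # [20 23 26 29]

# '->i' keeps i and sums over j instead: the row sums of the products.
print(np.einsum('i,ij->i', A, B))   # [ 0 22 76]
```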


Solution

  • It seems your understanding of the Einstein summation is not correct. The subscript operations you've written out have the multiplication correct, but the summation is over the wrong axis.

    Think about what this means: np.einsum('i,ij->i', A, B).

    1. A has shape (i,) and B has shape (i, j).
    2. Multiply every column of B by A.
    3. Sum over the second axis of B, i.e., over the axis labeled j.

    This gives an output of shape (i,) == (3,), while your subscript notation gives an output of shape (j,) == (4,). You're summing over the wrong axis.
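A quick shape check makes the difference between the two subscript strings visible. This is a minimal sketch using the same A and B (np.arange(12).reshape(3, 4) reproduces the B from the question):

```python
import numpy as np

A = np.array([0, 1, 2])              # shape (3,)   -> label i
B = np.arange(12).reshape(3, 4)      # shape (3, 4) -> labels i, j

out_i = np.einsum('i,ij->i', A, B)   # keeps i, sums over j
out_j = np.einsum('i,ij->j', A, B)   # keeps j, sums over i

print(out_i.shape)  # (3,)
print(out_j.shape)  # (4,)
```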

    More detail:

    Remember that the multiplication always happens first. The left-hand subscripts tell the np.einsum function which rows/columns/etc of the input arrays are to be multiplied with one another. The output of this step always has the same shape as the highest-dimensional input array. I.e., at this point, a hypothetical "intermediate" array has shape (3, 4) == B.shape.

After multiplication, there is summation. This is controlled by which subscripts are omitted from the right-hand side. In this case, j is omitted, which means summing along the axis labeled j, i.e., axis 1 of the intermediate array. (Your version sums along axis 0.)

    If you instead wrote: np.einsum('i,ij->ij', A, B), there would be no summation, as no subscripts are omitted. Thus you'd get the array you've got at the end of your question.

    Here are a couple of examples:

    Ex 1:

    No omitted subscripts, so no summation. Just multiply columns of B by A. This is the last array you've written out.

    >>> (np.einsum('i,ij->ij', A, B) == (A[:, None] * B)).all()
    True
    

    Ex 2:

    Same as the example. Multiply columns, then sum across the output's columns.

    >>> (np.einsum('i,ij->i', A, B) == (A[:, None] * B).sum(axis=-1)).all()
    True
    

    Ex 3:

    The sum as you've written it above. Multiply columns, then sum across the output's rows.

    >>> (np.einsum('i,ij->j', A, B) == (A[:, None] * B).sum(axis=0)).all()
    True
    

    Ex 4:

    Note that we can omit all axes at the end, to just get the total sum across the whole array.

    >>> np.einsum('i,ij->', A, B)
    98
    

    Ex 5:

    Note that the summation really happens because we repeated the input label 'i'. If we instead use different labels for each axis of the input arrays, we can compute things similar to Kronecker products:

    >>> np.einsum('i,jk', A, B).shape
    (3, 3, 4)
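Since no labels are repeated in 'i,jk', nothing is summed, and the result is the full outer product. One way to see this (a small equivalence check, not part of the original answer) is to build the same array with broadcasting:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)

# With all labels distinct, einsum forms every product A[i] * B[j, k]:
outer = np.einsum('i,jk', A, B)                 # shape (3, 3, 4)
same = A[:, None, None] * B[None, :, :]          # broadcasting equivalent
print(np.array_equal(outer, same))  # True
```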
    

    EDIT

    The NumPy implementation of the Einstein sum differs a bit from the traditional definition. Technically, the Einstein sum doesn't have the idea of "output labels". Those are always implied by the repeated input labels.

From the docs: "Whenever a label is repeated, it is summed." So, traditionally, we'd write something like np.einsum('i,ij', A, B). This is equivalent to np.einsum('i,ij->j', A, B). The i is repeated, so it is summed, leaving only the axis labeled j. You can think of a call with no output labels as equivalent to specifying only the labels that are not repeated in the input. That is, 'i,ij' is the same as 'i,ij->j'.

    The output labels are an extension or augmentation implemented in NumPy, which allow the caller to force summation or to enforce no summation on an axis. From the docs: "The output can be controlled by specifying output subscript labels as well. This specifies the label order, and allows summing to be disallowed or forced when desired."
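The implicit and explicit forms can be verified to agree directly:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)

implicit = np.einsum('i,ij', A, B)      # repeated i is summed; j survives
explicit = np.einsum('i,ij->j', A, B)   # the same reduction, spelled out
print(np.array_equal(implicit, explicit))  # True
```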