I'm writing a C++ custom operator in MXNet and am having trouble finding documentation on when `kAddTo` is set in an operator invocation. As a minimal example, let's say that my new operator is called `foo()` and I want to perform the following calculation:
```python
A = mx.sym.Variable('A')
B = mx.sym.Variable('B')
T = mx.sym.foo(A)
T += mx.sym.foo(B)
```
In general, how do I ensure that the fourth line above accumulates into `T` instead of creating new temporary storage for the result of `mx.sym.foo(B)` and then performing the `T = T + temp` calculation?
(Using the Kernighan-Ritchie debugger, aka print statements, I found that `kWriteTo` is set on both lines three and four. The enum value `kAddTo` is never set.)
A bit more detail about my specific problem: in my current implementation, `foo()` zeroes out the output memory before performing a calculation that populates it with the appropriate values. I only want to perform this zeroing when creating a new output location, not when accumulating into an existing one.
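For the zeroing problem described above, one way an operator can obey `req` is to let the requested mode, not the operator, decide whether the output starts from zero. The sketch below assumes (hypothetically) that `foo()`'s population step is purely additive, e.g. a scatter-add of input values into output bins. `OpReqType` is a local stand-in that mirrors the names of MXNet's enum so the snippet compiles without MXNet, and `foo_forward` and its signature are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the names of MXNet's OpReqType enum;
// a real operator would use mxnet::OpReqType instead.
enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// Hypothetical foo(): scatter-adds in[i] into out[bins[i]].
// Because the population step only ever adds, the output is zeroed
// only when an overwrite was requested; under kAddTo the existing
// contents are kept and accumulated into.
// (Assumes foo registers no in-place option, so under kWriteInplace
// zeroing cannot clobber an aliased input.)
void foo_forward(const std::vector<float>& in,
                 const std::vector<std::size_t>& bins,
                 std::vector<float>& out,
                 OpReqType req) {
  assert(bins.size() == in.size());
  if (req == kNullOp) return;  // output not needed: touch nothing
  if (req == kWriteTo || req == kWriteInplace) {
    for (float& v : out) v = 0.0f;  // fresh output: start from zero
  }
  for (std::size_t i = 0; i < in.size(); ++i) {
    assert(bins[i] < out.size());
    out[bins[i]] += in[i];  // additive population step
  }
}
```

Under `kAddTo` the existing contents of `out` are left alone and simply accumulated into, which is the behavior wanted when `T += foo(B)` runs in place; whether that mode is ever requested is still up to MXNet.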
Update
Offline, a colleague suggested using `mx.sym.elemwise_add(lhs=T, rhs=mx.sym.foo(B), out=T)` in place of line 4 above. However, I still saw that `kWriteTo` was being set in both lines of computation. I then received the following response:
“Memory planning and inplace operations are automatic. It will be done automatically. Users don’t need to worry about it.”, which probably means that `req[0]` is not an accurate indicator in this case. If you want to verify whether it’s an in-place addTo, you can print out the values of `outputs[0].dptr_` and `lhs.dptr_` to see whether they are equal.
I haven't checked this yet.
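The pointer check suggested in that response can be wrapped in a tiny debugging helper. `report_inplace` is a hypothetical name, not an MXNet function. Inside an operator's forward pass one might call it as `report_inplace(outputs[0].dptr_, inputs[0].dptr_)`, assuming the left-hand operand is `inputs[0]` (`TBlob` exposes its raw pointer as `dptr_`); the test uses plain arrays instead of `TBlob`s.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical debugging helper (not part of MXNet). Logs both addresses
// and returns true when the output buffer is the same memory as the
// left-hand input, i.e. the result was written in place rather than
// into a separate temporary.
inline bool report_inplace(const void* out_ptr, const void* lhs_ptr) {
  const bool same = (out_ptr == lhs_ptr);
  std::cout << "out=" << out_ptr << " lhs=" << lhs_ptr
            << " in-place=" << (same ? "yes" : "no") << '\n';
  return same;
}
```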
An operator cannot control the mode in which it is executed. Only the graph optimizer knows the context in which the operator is used, so only it can decide whether the operator needs to be executed in `kWriteTo` or `kAddTo` mode. More precisely, this happens in the method `DetectInplaceAddTo`. And even if in some cases the operator has been executed in `kAddTo` mode, this behavior might change in the future due to changes in the logic that optimizes the computational graph.
“Memory planning and inplace operations are automatic. It will be done automatically. Users don’t need to worry about it.”
This means that the operator cannot control the mode in which it is executed; however, the operator MUST strictly obey the mode that has been requested (`kWriteTo` or `kAddTo`). For example, if the mode is `kWriteTo` and the operator adds its results to the outputs instead of overwriting what is in them, the results are unpredictable, since the outputs might be populated with garbage. On the other hand, if the mode is `kAddTo` but the operator does not support it, the outcome might be even worse: instead of adding its results to the outputs, the operator just overwrites them (cases like this are usually very hard to debug). From time to time this leads to bugs like this one.
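One way to make every element-wise write obey the requested mode is to route it through a single helper that switches on `req`; MXNet's own operator sources use a comparable macro (`KERNEL_ASSIGN`). The sketch below is self-contained: `OpReqType` is a local stand-in mirroring MXNet's enum names, and `assign` and `scale_forward` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the names of MXNet's OpReqType enum.
enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// Apply one computed value to an output slot according to req.
// Overwriting under kAddTo would silently drop the accumulated value;
// adding under kWriteTo would read whatever garbage the buffer held.
template <typename T>
inline void assign(T& out, OpReqType req, T val) {
  switch (req) {
    case kNullOp:
      break;       // caller does not need this output
    case kWriteTo:
    case kWriteInplace:
      out = val;   // overwrite, never read the old contents
      break;
    case kAddTo:
      out += val;  // accumulate into the existing contents
      break;
  }
}

// Hypothetical element-wise operator: out = 2 * in, honoring req.
void scale_forward(const std::vector<float>& in, std::vector<float>& out,
                   OpReqType req) {
  for (std::size_t i = 0; i < in.size(); ++i) {
    assign(out[i], req, 2.0f * in[i]);
  }
}
```

Centralizing the switch means an operator cannot accidentally handle one mode in one loop and forget it in another.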
So, in short:
In general, how do I ensure that the fourth line above accumulates into T as opposed to creating a new temporary storage for the result of mx.sym.foo(B) and then performing the T = T + temp calculation?
You cannot; it's not the operator's decision which mode it is executed in. Even if your current configuration happens to run in `kAddTo` mode, that may not hold with future versions of MXNet. In the future it might become possible to send the graph optimizer a hint (or suggestion) to use a particular mode through a new API, but I'm not aware of any such development.
Now the question: "In which particular cases will MXNet 0.10/0.11 use `kAddTo`?"
This is tricky. Look at the following code:
```cpp
for (uint32_t nid = 0; nid < idx.num_nodes(); ++nid) {
  const auto& inode = idx[nid];
  if (inode.source->op() != ewise_plus_op) continue;  // <= HERE
  int sid = storage_id[idx.entry_id(inode.inputs[0])];
  // ... (rest of the loop body omitted)
}
```
It looks like `kAddTo` is used only for `_grad_add`, which is sad. This might also be a bug, since perhaps instead of:
```cpp
static const Op* ewise_plus_op = Op::Get("_grad_add");
```
the actual intention was:
```cpp
static const Op* ewise_plus_op = Op::Get("elemwise_add");
```
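The effect of that single comparison can be illustrated with a deliberately simplified toy model. It is not MXNet's `DetectInplaceAddTo`: it reproduces only the op-name filter and ignores everything after it, such as the storage-id logic the excerpt begins to compute. `Node` and `addto_candidates` are hypothetical, and the toy graph assumes the `+=` from the question is lowered to an `elemwise_add` node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy graph node: only the operator name matters for this illustration.
struct Node {
  std::string op;
};

// Returns the ids of nodes that survive the op-name filter, mirroring
// `if (inode.source->op() != ewise_plus_op) continue;` and nothing else.
std::vector<std::size_t> addto_candidates(const std::vector<Node>& graph,
                                          const std::string& plus_op_name) {
  std::vector<std::size_t> ids;
  for (std::size_t nid = 0; nid < graph.size(); ++nid) {
    if (graph[nid].op != plus_op_name) continue;
    ids.push_back(nid);
  }
  return ids;
}
```

With `_grad_add` as the filter, the forward-pass addition never becomes a candidate, which is consistent with `kAddTo` never showing up in the experiment from the question.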