Tags: python, symbolic-math

Get computational graph / expression of symbolic matrix differentiation


I want to write some custom CUDA kernels for neural networks to speed up computations, but I don't want to spend time differentiating tensor expressions by hand if there are packages that can do it automatically.

Is there a python package that can show expression for symbolic matrix differentiation?

I know sympy can do it for non-matrix expressions like this:

import inspect
import sympy as sp

def func(x):
    return 1 / x

# build sympy symbols named after the function's arguments
arg_symbols = sp.symbols(inspect.getfullargspec(func).args)
sym_func = func(*arg_symbols)
s = ''
for arg in arg_symbols:
    s += '{}\n'.format(sp.Lambda(arg_symbols, sym_func.diff(arg)))
# this is what I need:
print(s)
>>> Lambda(x, -1/x**2)

I know the autograd package can compute the derivatives of matrix expressions:

After the function is evaluated, autograd has a list of all operations that were performed and which nodes they depended on. This is the computational graph of the function evaluation. To compute the derivative, we simply apply the rules of differentiation to each node in the graph.

But is there a way to get this differentiation computational graph from it or some similar package?
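
For example, here is roughly what I can already get numerically from autograd (a small sketch, assuming autograd and numpy are installed); what I am missing is the symbolic expression / graph behind this gradient:

import autograd.numpy as np
from autograd import grad

A = np.array([[1.0, 2.0], [3.0, 4.0]])

def quad(x):
    return np.dot(x, np.dot(A, x))  # scalar matrix expression x^T A x

dquad = grad(quad)                  # numeric gradient, no symbolic graph exposed
print(dquad(np.array([1.0, 2.0])))
# [12. 21.]  (== (A + A.T) @ x)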


Solution

  • There are some significant differences between the packages you cited. These differences are the reason why (AFAIK) you cannot get the computational graph directly from an automatic differentiation library, but you can get it from a symbolic one.

    In short:

    • numerical differentiation: numpy is enough
    • symbolic differentiation: sympy
    • automatic differentiation: autograd is one example

    There are three kinds of differentiation methods:

    • numerical differentiation: this evaluates Delta(f(x)) / Delta(x), where Delta(x) is a small step that represents a variation of x while staying in the domain of f. This is not what you need, and you don't need a package for it.
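
      For example, a minimal central-difference sketch (step size h chosen ad hoc) is all it takes:

      import numpy as np

      def num_diff(f, x, h=1e-6):
          # central difference approximation of f'(x)
          return (f(x + h) - f(x - h)) / (2 * h)

      print(num_diff(lambda x: 1 / x, np.array([1.0, 2.0, 3.0])))
      # approximately [-1.0, -0.25, -0.111], i.e. -1/x**2
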
    • symbolic differentiation: this is based upon the construction of a graph that represents the symbolic application of functions (I have an article on the implementation of a symbolic engine in Ruby here). In this case the differentiation is performed through a recursive application of the chain rule:

      f(g(x))' = f'(g(x)) * g'(x)
      

      When this rule is applied to the whole symbolic graph, the result is a new symbolic graph containing the derivative. The advantage is that the derivative is exact, but for very complex graphs the final derivative graph may become intractable (it exceeds memory limits, or the recursion goes too deep and hits stack limits). In Python, sympy implements this kind of differentiation. On the other hand, once you have the graph of the derivative you can perform further operations on it, such as simplifications or substitutions.

      from sympy import *
      import numpy as np
      
      x = Symbol('x')
      f = 1 / x
      df = diff(f, x)
      print(df)
      # -1/x**2
      
      ldf = lambdify((x), df)
      
      # Now ldf is a lambda
      x_ary = np.array([
        [[1, 2, 3], [1, 2, 3]], 
        [[1, 2, 3], [1, 2, 3]]
      ])
      y_ary = ldf(x_ary)
      
      print(x_ary.shape)
      # (2, 2, 3)
      print(y_ary)
      # array([[[-1.        , -0.25      , -0.11111111],
      #         [-1.        , -0.25      , -0.11111111]],
      #        [[-1.        , -0.25      , -0.11111111],
      #         [-1.        , -0.25      , -0.11111111]]])
      

      As you can see it works with numpy, but it only covers the basic cases, not everything; for matrix expressions sympy.Matrix together with sympy.MatrixSymbol should be used, and even then some graphs are out of reach (for example, I don't think it can handle diff(x.T * A * x, x) = (A + A.T) * x directly).
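
      As a workaround for such matrix cases, you can expand the expression into explicit scalar entries with sympy.Matrix and differentiate entry-wise. A minimal sketch for a fixed 2x2 case (the symbol names a11, ..., x2 are just placeholders I chose for the illustration):

      import sympy as sp

      a11, a12, a21, a22 = sp.symbols('a11 a12 a21 a22')
      x1, x2 = sp.symbols('x1 x2')

      A = sp.Matrix([[a11, a12], [a21, a22]])
      x = sp.Matrix([x1, x2])

      quad = (x.T * A * x)[0, 0]                    # scalar x^T A x
      grad = sp.Matrix([quad.diff(v) for v in x])   # entry-wise gradient

      print(sp.simplify(grad - (A + A.T) * x))      # Matrix([[0], [0]])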

      It is also possible to export the graph as C code, but the capabilities are limited and for your application you will almost certainly have to modify the result:

      from sympy.utilities.codegen import codegen
      
      [(cf, cc), (hf, hc)] = codegen(("df", df), "C", "diff")
      
      print(hc, cc)
      

      Prints out:

    /*****************************************************
     *      Code generated with sympy 1.1.1              *
     *  See http://www.sympy.org/ for more information.  *
     *      This file is part of 'project'               *
     *****************************************************/
    
    #ifndef PROJECT__DIFF__H
    #define PROJECT__DIFF__H
    
    double df(double x);
    
    #endif
    
    /*****************************************************
     *      Code generated with sympy 1.1.1              *
     *  See http://www.sympy.org/ for more information.  *
     *      This file is part of 'project'               *
     *****************************************************/
    #include "diff.h"
    #include <math.h>
    
    double df(double x) {   
      double df_result;
      df_result = -1/pow(x, 2);
      return df_result;   
    }
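
      If you want to actually call the generated code, one possible route (assuming a C compiler such as gcc is available) is to write the generated sources to disk, build a shared library, and load it through ctypes:

      import ctypes
      import subprocess

      with open(cf, 'w') as f:    # cf == 'diff.c'
          f.write(cc)
      with open(hf, 'w') as f:    # hf == 'diff.h'
          f.write(hc)

      subprocess.run(['gcc', '-shared', '-fPIC', cf, '-o', 'libdiff.so'], check=True)

      libdiff = ctypes.CDLL('./libdiff.so')
      libdiff.df.restype = ctypes.c_double
      libdiff.df.argtypes = [ctypes.c_double]
      print(libdiff.df(2.0))      # -0.25, i.e. -1/x**2 at x = 2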
    
    • automatic differentiation: this is what is done through autograd. In this case the best of both worlds is combined: on the one hand it is not necessary to explicitly build and evaluate a derivative graph, on the other hand you cannot perform further operations on the derivative, while it still remains exact. This is done (usually) by augmenting the float definition with an additional field (something like float[2] or more), where the additional field contains the derivative. For example, in an automatic differentiation environment the sin function may be overloaded with:

      import math

      def sin(x):  # x = [value, derivative]
          return [math.sin(x[0]), x[1] * math.cos(x[0])]
      

      but as you can see, this way no computational graph is available: you directly get the value of the exact derivative with respect to x (and every function must be overloaded). I have a more complete example (in C, using only macros) here. Notice that internally TensorFlow uses automatic differentiation rather than symbolic differentiation, but it suggests that users directly provide an "explicit version" of the gradient that handles numerical instabilities! Automatic differentiation usually does not handle numerical instability.
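
      To make the "augmented float" idea concrete, here is a minimal forward-mode sketch with a toy Dual class (my own illustration, not how autograd is actually implemented):

      import math

      class Dual:
          def __init__(self, value, deriv):
              self.value = value   # f(x)
              self.deriv = deriv   # f'(x), carried alongside the value

          def __mul__(self, other):
              # product rule: (f * g)' = f' * g + f * g'
              return Dual(self.value * other.value,
                          self.deriv * other.value + self.value * other.deriv)

      def sin(d):
          # chain rule: sin(f)' = cos(f) * f'
          return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

      x = Dual(2.0, 1.0)       # seed the derivative: dx/dx = 1
      y = sin(x * x)
      print(y.value, y.deriv)  # sin(4) and 4*cos(4) == d/dx sin(x**2) at x = 2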