Tags: python, optimization, tensorflow, gradient-descent

Finding the implementation of methods in TensorFlow


I want to make small changes to minimization optimizers such as TensorFlow's AdadeltaOptimizer. The license allows this, but the library I have contains no implementation code, only the reference, so how can I find the actual implementation? Here is the Adadelta API as an example:

    @tf_export("train.AdadeltaOptimizer")
    class AdadeltaOptimizer(optimizer.Optimizer):
      """Optimizer that implements the Adadelta algorithm.

      See [M. D. Zeiler](http://arxiv.org/abs/1212.5701)
      ([pdf](http://arxiv.org/pdf/1212.5701v1.pdf))
      """

Solution

  • The first entry point is python/training/adadelta.py in the main TensorFlow repo. You may notice it is only a Python wrapper: the ops themselves are implemented in native C++ and loaded into Python (this is the usual practice in TensorFlow; see for instance this question: Where is the code for gradient descent?). A rough sketch of how that wrapper dispatches to the native op is given at the end of this answer.

    For example, in core/kernels/training_ops.cc you can find the CPU implementation of the ApplyAdadelta op. The GPU implementation of the same op is in core/kernels/training_ops_gpu.cu.cc:

    template <typename T>
    struct ApplyAdadelta<GPUDevice, T> {
      void operator()(const GPUDevice& d, typename TTypes<T>::Flat var,
                      typename TTypes<T>::Flat accum,
                      typename TTypes<T>::Flat accum_update,
                      typename TTypes<T>::ConstScalar lr,
                      typename TTypes<T>::ConstScalar rho,
                      typename TTypes<T>::ConstScalar epsilon,
                      typename TTypes<T>::ConstFlat grad) {
        Eigen::array<typename TTypes<T>::Tensor::Index, 1> bcast;
        bcast[0] = grad.dimension(0);
        Eigen::Sizes<1> single;
    
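        // accum = rho * accum + (1 - rho) * grad^2: running average of squared gradients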
        accum.device(d) = accum * rho.reshape(single).broadcast(bcast) +
                          grad.square() * (grad.constant(T(1)) -
                                           rho.reshape(single).broadcast(bcast));
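        // update = sqrt(accum_update + eps) / sqrt(accum + eps) * grad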
        const auto update =
            (accum_update + epsilon.reshape(single).broadcast(bcast)).sqrt() *
            (accum + epsilon.reshape(single).broadcast(bcast)).rsqrt() * grad;
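        // var -= lr * update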
        var.device(d) -= update * lr.reshape(single).broadcast(bcast);
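        // accum_update = rho * accum_update + (1 - rho) * update^2: running average of squared updates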
        accum_update.device(d) =
            accum_update * rho.reshape(single).broadcast(bcast) +
            update.square() *
                (grad.constant(T(1)) - rho.reshape(single).broadcast(bcast));
      }
    };
    

    If you'd like to patch the C++ code, you'll have to rebuild the .so library. To be able to run your modified optimizer on both CPU and GPU, you'll have to change and rebuild both kernels.
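    For reference, the dispatch in python/training/adadelta.py looks roughly like the sketch below (based on the 1.x layout; the exact names and signatures depend on the revision you check out, so treat this as an approximation rather than a verbatim copy):

    from tensorflow.python.ops import math_ops
    from tensorflow.python.training import optimizer
    from tensorflow.python.training import training_ops

    class AdadeltaOptimizer(optimizer.Optimizer):
        # ... slot creation and hyperparameter handling omitted ...

        def _apply_dense(self, grad, var):
            # The accumulators live in "slots" attached to each variable.
            accum = self.get_slot(var, "accum")
            accum_update = self.get_slot(var, "accum_update")
            # This call dispatches to the registered ApplyAdadelta kernel
            # (the C++ code shown above) for the device that holds `var`.
            return training_ops.apply_adadelta(
                var, accum, accum_update,
                math_ops.cast(self._lr_t, var.dtype.base_dtype),
                math_ops.cast(self._rho_t, var.dtype.base_dtype),
                math_ops.cast(self._epsilon_t, var.dtype.base_dtype),
                grad, use_locking=self._use_locking)

    So if your change only rearranges existing ops (a different decay schedule, say), you can often stay in the Python wrapper; if you change the update rule itself, you need to touch the kernel.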
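    If you do patch the kernel, it also helps to keep a plain NumPy reference of the same update rule to compare against. A minimal sketch mirroring the ApplyAdadelta code above (the function name and default hyperparameters here are mine, not TensorFlow's):

    import numpy as np

    def adadelta_step(var, accum, accum_update, grad, lr=0.001, rho=0.95, eps=1e-8):
        """One Adadelta step, following the kernel shown above."""
        accum = rho * accum + (1.0 - rho) * grad ** 2                    # running average of grad^2
        update = np.sqrt(accum_update + eps) / np.sqrt(accum + eps) * grad
        var = var - lr * update                                          # apply scaled update
        accum_update = rho * accum_update + (1.0 - rho) * update ** 2    # running average of update^2
        return var, accum, accum_update

    Running a few steps of this against the output of your rebuilt op on the same inputs is a quick way to confirm the patched kernel does what you intended.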