I am trying to wrap my head around C-optimized code in python. I have read a couple of times now that python achieves high-speed computing through C-extensions. In other words, whenever I work with libraries such as numpy, it basically calls a C-extension that calculates the result and returns it.
Say I want to add two numbers using np.add(x,y). If I understand it correctly, libraries such as numpy do not compile the python code but instead already come with executables that will simply take the values x and y and return the result. Is that correct?
In particular, I am wondering if this is also true for deep learning libraries. According to the official documentation of Theano, it requires g++ and gcc (at least they are highly recommended). Does this mean that Theano will compile C (or C++) code at runtime of the python script? If so, is it the same for PyTorch and Tensorflow?
I hope that someone can solve my confusion here! Thanks a lot!
C extensions in python
numpy uses C-extensions a lot. For instance, you can take a look at the C implementation of the sort() function [1] here [2].
[1] https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html
[2] https://github.com/numpy/numpy/blob/master/numpy/core/src/npysort/quicksort.c.src
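One way to see this from inside Python is to check whether a callable is an ordinary Python function or something provided by compiled code. The snippet below is only a minimal sketch: the helper name implemented_in_python is made up for illustration, and it uses the standard library's math module as a stand-in for a C-extension so that it runs even without numpy installed. If numpy is available, it also prints the type of np.add.

```python
import inspect
import json
import math


def implemented_in_python(func):
    """Return True if func is a plain Python function (hypothetical helper)."""
    # inspect.isfunction is True only for functions defined with def/lambda,
    # i.e. objects that carry Python bytecode rather than compiled C code.
    return inspect.isfunction(func)


# json.dumps is written in Python; math.sqrt is provided by compiled C code.
print(implemented_in_python(json.dumps))
print(implemented_in_python(math.sqrt))

# numpy is optional here; the check is skipped if it is not installed.
try:
    import numpy as np
except ImportError:
    np = None

if np is not None:
    # np.add is a numpy.ufunc object whose loops are precompiled,
    # so nothing is compiled when you call it.
    print(type(np.add))
    print(implemented_in_python(np.add))
```

In CPython, the first check should report that json.dumps is a Python function and the second that math.sqrt is not. With numpy installed, np.add shows up as a ufunc object rather than a Python function, which matches the picture in the question: the compiled code ships with the library and is simply called with x and y.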
Deep learning libraries
Deep learning libraries use C-extensions for a large part of their backend, as well as CUDA and cuDNN. Code can also be compiled at runtime, as described for Theano [3], TensorFlow (XLA JIT) [4] and PyTorch [5]:
[3] http://deeplearning.net/software/theano/extending/pipeline.html#compilation-of-the-computation-graph
[4] https://www.tensorflow.org/xla/jit
[5] https://pytorch.org/blog/the-road-to-1_0/#production--pain-for-researchers
To answer your question: yes, theano will compile C/C++ code at runtime of the python script. This graph compilation step is extremely slow for theano, so I advise you to focus on pytorch or tensorflow rather than theano.
If you're new to deep learning, you may take a quick look at [6] too.