Search code examples
pythonnumpyscientific-computingblas

multithreaded blas in python/numpy


I am trying to implement a large number of matrix-matrix multiplications in Python. Initially, I assumed that NumPy would use automatically my threaded BLAS libraries since I built it against those libraries. However, when I look at top or something else it seems like the code does not use threading at all.

Any ideas what is wrong or what I can do to easily use BLAS performance?


Solution

  • Not all of NumPy uses BLAS, only some functions -- specifically dot(), vdot(), and innerproduct() and several functions from the numpy.linalg module. Also note that many NumPy operations are limited by memory bandwidth for large arrays, so an optimised implementation is unlikely to give any improvement. Whether multi-threading can give better performance if you are limited by memory bandwidth heavily depends on your hardware.